Weblogs and “selective uptake”

As most of my regular readers (all three of you) know, I’m obsessed with what the US media chooses to cover (and not cover). As I’ve accused friends in the media of overfocusing on events that concern wealthy nations at the expense of African, Central Asian and other developing nations, I’ve gotten a consistent response: “That’s what people want to read about.”

Compulsive researcher that I am, I’m interested in figuring out if that’s true. It seems to me that blogs offer one way of figuring out what people are actually interested in – if someone chooses to write about a topic, it reflects a certain degree of interest. Given the large number of people who use blogs to feature news stories they’ve found interesting, to some extent, blogs represent a selective filter of what readers found interesting in the news.

(There are at least two valid objections to the previous statement. First, I suspect most bloggers write primarily about personal matters, not about the news, so the filtering effect is probably dampened by the small percentage of people who use blogs this way. Second, there’s a small number of people who use blogs to do original reporting – rather than filtering what other journalists are creating, they’re creating their own journalism.)

So how does this selective filter work? Are bloggers more interested in some topics than others? I’m starting to think about experiments to answer that question. The good folks at Intelliseek have given me access to their Blogpulse engine, which lets me see how many mentions a given set of keywords has received on the blogs Blogpulse tracks within a set timeframe. The database also includes information on how many blog posts occurred in that timeframe, so I can make reasonable guesses about what percentage of blog posts mentioned a phrase in a given time period.

I’m not able to get nearly as rich data from Google News. (Google’s API still covers only the search engine proper. Grr.) But past experience suggests that the vast majority of searches return information from the past 14 days, allowing me to align timeframes with a Blogpulse search. While I can’t guess at what percentage of all Google News stories these results represent, I’m able to do a side-by-side comparison of hits. Here are the results of a couple of searches I’ve run recently…

Term	Subject	Google hits	Blogpulse hits	Blogpulse %	G vs. B
Wassef Hassoun	Current events	8030	542	0.04%	14.82
Sudan	Current events	11300	1453	0.11%	7.78
Darfur	Current events	6250	583	0.04%	10.72
George Bush	Politics	47100	37323	2.71%	1.26
Dick Cheney	Politics	14800	7120	0.52%	2.08
John Kerry	Politics	45200	16251	1.18%	2.78
John Edwards	Politics	12400	4621	0.34%	2.68
Michael Phelps	Sports	1430	177	0.01%	8.08
MPLS	Tech/Sci	723	51	0.00%	14.18
Cassini	Tech/Sci	4020	926	0.07%	4.34
Firefox	Tech/Sci	384	2676	0.19%	0.14
Michael Jackson	Entertainment/Media	9320	4649	0.34%	2.00
Michael Moore	Entertainment/Media	12400	17298	1.26%	0.72
John Negroponte	Current events	3940	165	0.01%	23.88
Lance Armstrong	Sports	7590	1133	0.08%	6.70
Euro 2004	Sports	35200	3941	0.29%	8.93
Sharapova	Sports	7020	898	0.07%	7.82

Total blog posts in period			1377764

The final column, Google versus Blogpulse (Google hits divided by Blogpulse hits), is the interesting one, I think. The number is large for items that got a lot of attention in mainstream media but very little in the blogosphere: very few bloggers seem interested in John Negroponte, the US’s new ambassador to Iraq, while lots of newspapers, especially in the Middle East, are asking interesting questions about his past. When the number is low, bloggers are paying relatively more attention to the issue: while there are only a handful of news stories talking about the new Firefox browser, 0.19% of blog posts in the last two weeks mention the software.
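For anyone who wants to check my arithmetic, here is a rough Python sketch of how the two derived columns fall out of the raw counts. The function names are my own invention for illustration (they aren't part of Blogpulse or Google News), and the three rows are just copied from the table.

```python
# Rows copied from the table: (term, Google News hits, Blogpulse hits).
ROWS = [
    ("John Negroponte", 3940, 165),
    ("George Bush", 47100, 37323),
    ("Firefox", 384, 2676),
]

# Period total reported under the table (blog posts Blogpulse saw in the window).
TOTAL_BLOG_POSTS = 1377764


def blogpulse_share(blog_hits, total_posts=TOTAL_BLOG_POSTS):
    """Percentage of tracked blog posts that mention the term."""
    return 100.0 * blog_hits / total_posts


def google_vs_blogpulse(google_hits, blog_hits):
    """Raw ratio of Google News hits to Blogpulse hits (the last column)."""
    return google_hits / blog_hits


for term, google_hits, blog_hits in ROWS:
    share = blogpulse_share(blog_hits)
    ratio = google_vs_blogpulse(google_hits, blog_hits)
    print(f"{term}\t{share:.2f}%\t{ratio:.2f}")
```

Note that the ratio compares raw hit counts, so it only tells you how terms rank against each other, not whether a term is over- or under-covered in absolute terms.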

(It would be interesting to know what the ‘equilibrium point’ is between Google and Blogpulse – i.e., at what ratio is a story equally popular in the aggregate news media and in the blogosphere – but to calculate that, I’d need to know the number of entries Google News is tracking, or have a keyword guaranteed to have the same percentage representation across the two sites…)
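To make that equilibrium idea concrete: a term is equally popular in both places when its share of Google News stories equals its share of blog posts, which means the equilibrium ratio is just the Google News total divided by the Blogpulse total. The sketch below assumes I knew the Google News total, which I don't; the `HYPOTHETICAL_GOOGLE_TOTAL` value is a made-up placeholder, and the function names are mine, not anyone's API.

```python
BLOG_TOTAL = 1377764  # blog posts Blogpulse tracked in the period

# Placeholder only: Google News does not publish this number, so this
# value is invented purely to make the example run.
HYPOTHETICAL_GOOGLE_TOTAL = 2 * BLOG_TOTAL


def equilibrium_ratio(google_total, blog_total):
    """Google/Blogpulse hit ratio at which a term has the same share of both."""
    return google_total / blog_total


def equilibrium_from_reference(ref_google_hits, ref_blog_hits):
    """Alternative estimate: the ratio of a reference keyword assumed
    (hypothetically) to have equal percentage representation on both sites."""
    return ref_google_hits / ref_blog_hits


def relative_popularity(google_hits, blog_hits, equilibrium):
    """Above 1: over-represented in news relative to blogs. Below 1: the reverse."""
    return (google_hits / blog_hits) / equilibrium


eq = equilibrium_ratio(HYPOTHETICAL_GOOGLE_TOTAL, BLOG_TOTAL)
print(relative_popularity(3940, 165, eq))  # John Negroponte
print(relative_popularity(384, 2676, eq))  # Firefox
```

With a real total (or a trustworthy reference keyword) plugged in, any term scoring above 1 would be getting more than its share of news coverage compared with blog coverage.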

I’d love to have a list of “top 100 news stories” I could run through this process every day, tracking uptake by bloggers from the mainstream media – anyone have good thoughts on generating this list? It’s kinda the mainstream media version of the Daypop 40… I’d also be grateful for suggestions of interesting, timely (i.e., breaking in the last week) stories to check out and see how Google and Blogpulse cover them.

(This is part of my new “open research” philosophy, where when I don’t know what to do next, I post it on my blog and beg for help. My next post explains why I have at least a modicum of belief that this method actually works…)