This evening, Google News tells me that I have my choice of 5,053 articles on conflicts between Congressional Republicans and Democrats over healthcare reform. (Oh goody.) How many of those stories contain original reporting? In a world with thousands of professional media outlets at our fingertips – as well as hundreds of thousands of amateurs – how much original material do we really have access to?
Pew’s annual State of the News Media report made one pass at answering this question in their 2006 edition. They did an exhaustive study across media of May 11, 2005 and concluded that, of the 14,000 stories posted on Google News that day, only 24 unique “news events” were represented. Here’s the quote: “The level of repetition in the 24-hour news cycle is one of the most striking features one finds in examining a day of news. Google News, for instance, offers consumers access to some 14,000 stories from its front page, yet on this day they were actually accounts of the same 24 news events. On cable, just half of the stories monitored across the 12 hours were new. The concept of news cycle is not really obsolete, and the notion of news 24-7 is something of an exaggeration.”
It’s a striking pair of numbers – 14,000 stories, but only 24 actual news events? – but I suspect it’s a bit deceptive. Grab snapshots of an unmodified Google News page today and you’re only going to get a few dozen story-clusters, each containing hundreds or thousands of similar stories. There are many, many more story clusters accessible within Google News… you’re just not going to get them unless you read beyond the front page or customize your front page to track different topics. The figure of 24 news events is an artifact of the front page model, the decision by Google News to present a certain subset of possible stories during the day. It’s a relevant number, because it means that the average user probably won’t see stories outside of that narrow set, but it’s not a fair commentary on the depth of coverage accessible through the site.
What’s interesting in those numbers is the 14,000/24 ratio, implying 583 versions of each story. (That ratio is probably much higher today, with Google News following more news sources.) Jonathan Stray did a very smart analysis for Nieman Journalism Lab, looking at a universe of 800 stories about the alleged involvement of two Chinese universities in hacking attacks on Google. His findings were striking:
800 stories = 121 non-identical stories = 13 stories with original quotes = 7 fully independent stories
Stray coded the 121 non-identical stories that had been clustered together by Google (the clustering algorithms are good, but not perfect – nine stories were unrelated to the specific case of these two universities) and looked for the appearance of novel quotes, which he considered the “bare minimum” standard for original reporting. (Interesting – it’s the same logic that led Jure Leskovec to follow quotes as a way of tracking media flow in MemeTracker.) Only 13 of the stories contained quotes not taken from another media source’s report.
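For anyone who wants to check the arithmetic behind these figures, here is a minimal sketch. The numbers are the ones quoted above from Pew and from Stray; the variable names and the `shares` helper are my own, invented for illustration, and aren't part of either analysis.

```python
# Figures quoted in the post; nothing here is new data.
pew_stories, pew_events = 14_000, 24

# Stray's funnel, from the full Google News cluster down to independent reporting.
funnel = [
    ("stories in the cluster", 800),
    ("non-identical stories", 121),
    ("stories with original quotes", 13),
    ("fully independent stories", 7),
]

# Average number of versions per "news event" in the Pew snapshot.
versions_per_event = pew_stories / pew_events


def shares(stages):
    """Return (label, count, fraction of the first stage) for each stage."""
    total = stages[0][1]
    return [(label, count, count / total) for label, count in stages]


# Non-identical stories that carried no original quote: the "other 100 reporters".
rewrites = funnel[1][1] - funnel[2][1]

print(f"{versions_per_event:.0f} versions per news event")
for label, count, share in shares(funnel):
    print(f"{count:>4}  {share:6.1%}  {label}")
print(f"{rewrites} non-identical stories without original quotes")
```

Run against the full cluster of 800, the 13 stories with original quotes come to about 1.6 percent, and the 7 fully independent ones to under 1 percent.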
The essence of Stray’s piece is the question, “What were those other 100 reporters doing?” The answer, unfortunately, is that they were rewriting everyone else’s stories. Given the current shortfalls in American journalism, this seems like an almost criminal waste of time. Jeff Jarvis offered the advice, “Cover what you do best, link to the rest”, and Stray’s finding suggests that many outlets haven’t yet embraced this particular piece of wisdom.
I was more struck by Stray’s closing point – that even the mighty New York Times got the story wrong. Two schools are mentioned in the Times’s report, the well-known Shanghai Jiaotong University and the obscure Lanxiang Vocational School. As the Qilu Evening News reported, Lanxiang Vocational School is a school that primarily teaches motor vehicle repair and certifies operators of earth moving equipment – it’s an extremely unlikely hotbed of hacking activity. (Though it’s possible that freelance, nationalist hackers were based out of Lanxiang’s computer lab, Qilu’s account casts serious doubt on reports that a Ukrainian professor was teaching specific hacking courses… in part because no serious computer training is offered at the school.) Perhaps it’s a bit much to expect the Times – though it does have a Shanghai bureau – to be reading a Chinese language newspaper… but as Stray points out, the story had been generously translated by the indispensable Roland Soong and was available on his prominent English-language site.
I’d love to see Google remove or deprioritize those big numbers that run under every story cluster. Yep, they’re useful for visualizing media attention – Newsmap does a beautiful job of portraying what stories Google knows about in visual form using these cluster numbers. But they give an illusion of abundance, where there’s often scarcity. If we knew there were 13 stories, not 800, on the Chinese universities and the Google hacks, perhaps we’d be demanding more access to original reporting. Maybe we’d ask for translation and inclusion of journalism in other languages in these clusters. Maybe we’d become more acutely aware that – in the case of this particular story – the original reporting was done almost exclusively by large newspapers, entities whose ability to do this reporting is increasingly imperiled.