Today, the Berkman Center is relaunching Media Cloud, a platform designed to let scholars, journalists and anyone interested in the world of media ask and answer quantitative questions about media attention. For more than a year, we’ve been collecting roughly 50,000 English-language stories a day from 17,000 media sources, including major mainstream media outlets, left- and right-leaning American political blogs, and 1,000 popular general-interest blogs. (For much more about what Media Cloud does and how it does it, please see this post on the system from our lead architect, Hal Roberts.)
We’ve used what we’ve discovered from this data to analyze the differences in coverage of international crises in professional and citizen media and to study the rapid shifts in media attention that have accompanied the flood of breaking news that’s characterized early 2011. In the next weeks, we’ll be publishing some new research that uses Media Cloud to help us understand the structure of professional and citizen media in Russia and in Egypt.
With our relaunch of the site, many of our most powerful tools are now available for your use. We’re hoping Media Cloud proves useful to anyone interested in asking questions about what bloggers and journalists are paying attention to, ignoring, celebrating or condemning.
We hope the tools we’re providing are a complement to amazing efforts like Project for Excellence in Journalism’s News Coverage and New Media indices – we consider their tools the gold standard for understanding what topics are discussed in American media. PEJ works their magic using talented teams of coders, who sample different corners of the media ecosystem to find out what’s being discussed. We use huge data sets, algorithms and automation to give a different picture, one focused on language instead of topic.
At its most basic, Media Cloud gives a picture of what journalists and bloggers are writing about by counting the words used in recent stories. Above is a cloud of language used in our set of political blogs during the week starting on Monday, May 2nd. We can see language about the US raid on Osama bin Laden’s compound, including obvious words like Abbottabad, Bin Laden and raid, as well as words that suggest particular interests within those stories: helicopter, SEALs, intelligence, interrogation, Pakistan. Even with a major story dominating discussion, we see glimpses of other issues, like the US Congress Caregiver’s Act and speculation that Indiana governor Mitch Daniels will enter the Presidential race. You can click each word in the cloud and see what sentences in different blogs contained the term in question, how often it was used, and how that source compared to others.
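For readers curious about the mechanics, here’s a minimal sketch of the counting step behind a word cloud. It isn’t Media Cloud’s code: the tokenizer, the tiny stopword list, the function names (`word_counts`, `cloud_terms`) and the sample sentences are all hypothetical, and a real system would also need to handle stemming, multi-word names like “Bin Laden” and a much larger stopword list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "was", "is", "that", "for", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(stories):
    """Count non-stopword tokens across a list of story texts."""
    counts = Counter()
    for story in stories:
        counts.update(w for w in tokenize(story) if w not in STOPWORDS)
    return counts


def cloud_terms(stories, n=10):
    """Return the n most frequent terms: the raw material for a word cloud."""
    return word_counts(stories).most_common(n)


stories = [
    "The raid on the compound in Abbottabad was carried out by SEALs.",
    "Pakistan questions the raid and the helicopter crash at the compound.",
    "SEALs, intelligence and the raid: what we know.",
]
print(cloud_terms(stories, 3))  # [('raid', 3), ('compound', 2), ('seals', 2)]
```

In a cloud, each term’s font size would then be scaled by its count; ties are listed in the order the words were first seen.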
Comparison is where our tool is most powerful. The cloud above shows the differences between words used in left- and right-wing blogs during the same time period. We start to see differences in which aspects of the Bin Laden story bloggers focused on. Bloggers on the left used the words “torture” and “waterboarding” while bloggers on the right used “interrogation” and “terrorist”. Other comparisons are less obvious – bloggers on the right discussed the debate over releasing photos from the raid more than those on the left, while bloggers on the left discussed expanding the Hyde Amendment (which restricts federal funding for abortion).
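One way to surface contrasts like “torture” versus “interrogation” is to compare each word’s relative frequency in the two sets of blogs. The sketch below ranks words by a smoothed log-ratio of those frequencies. It’s an illustrative assumption, not a description of how Media Cloud builds its comparison clouds, and the function names, stopword list and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list so function words don't dominate the comparison.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on"}


def counts(texts):
    """Count lowercase, non-stopword tokens across a list of texts."""
    c = Counter()
    for t in texts:
        c.update(w for w in re.findall(r"[a-z]+", t.lower()) if w not in STOPWORDS)
    return c


def distinctive_terms(set_a, set_b, n=5, alpha=1.0):
    """Rank words by smoothed log-ratio of relative frequency in set_a vs. set_b.

    Positive scores lean toward set_a, negative toward set_b. alpha adds
    pseudo-counts so a word missing from one side never causes log(0).
    Ties are broken alphabetically so the output is deterministic.
    """
    ca, cb = counts(set_a), counts(set_b)
    vocab = set(ca) | set(cb)
    total_a = sum(ca.values()) + alpha * len(vocab)
    total_b = sum(cb.values()) + alpha * len(vocab)
    scores = {
        w: math.log((ca[w] + alpha) / total_a) - math.log((cb[w] + alpha) / total_b)
        for w in vocab
    }
    lean_a = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    lean_b = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))[:n]
    return lean_a, lean_b


left = ["Torture and waterboarding after the raid."]
right = ["Interrogation of the terrorist after the raid."]
lean_left, lean_right = distinctive_terms(left, right, n=2)
print([w for w, _ in lean_left], [w for w, _ in lean_right])
# ['torture', 'waterboarding'] ['interrogation', 'terrorist']
```

Words both sides use equally often, like “raid” here, score zero and drop to the middle of the ranking, which is why a shared headline story can vanish from a comparison cloud.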
We’re also able to make general statements about the similarity or difference in word usage in these comparisons. While the left and right may both be focused on the raid in Pakistan, the similarity score (near the bottom of the word cloud, towards the right) suggests a larger disparity in agendas than we saw looking at these two sets of media a year ago, when both sides were talking primarily about Arizona’s tightened immigration laws. I’ve been taking an in-depth look at similarity scores to understand how media attention can shift at moments of international crisis, and how the recent, internationally-focused media cycle may differ from the news we often get in the US.
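The post doesn’t say exactly how the similarity score is computed. A common choice for this kind of comparison is cosine similarity between word-count vectors, sketched below as an assumption; the function names are hypothetical. A score near 1 means two sets of media use words in similar proportions, and a score near 0 means they share little vocabulary.

```python
import math
import re
from collections import Counter


def term_vector(texts):
    """Build a sparse word-count vector (a Counter) from a list of texts."""
    c = Counter()
    for t in texts:
        c.update(re.findall(r"[a-z]+", t.lower()))
    return c


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors, in [0, 1]."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty set of stories shares nothing with anything
    return dot / (norm_a * norm_b)


a = term_vector(["raid raid pakistan"])
b = term_vector(["raid pakistan photos"])
print(round(cosine_similarity(a, b), 3))  # 0.775
```

Raw counts let very frequent words dominate the score; weighting terms (for example with TF-IDF) before comparing is a common refinement.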
What our tools let you do with Media Cloud is really just the tip of the iceberg. The code behind our system is published under an open source license, so other researchers can build systems to monitor media in other countries and other languages. (We’ve got a system monitoring Russian media and blogs that you’ll hear more about soon.) We are publishing huge sets of data that include information on word frequencies in different stories for researchers who want to analyze American media without collecting their own data. And we’re hoping to collaborate with researchers around the world who’d like to use our tools and data to ask and answer pressing questions about what’s covered and how.
This new release is thanks to the hard work of Hal Roberts, architect of the project; David Larochelle, developer extraordinaire; and Zoe Fraade-Blanar, whose skill at interface design has made our work vastly more usable as well as more attractive. Thanks to them and everyone else involved with the Media Cloud project. Hope you’ll check our work out and let us know what you discover.
An amazing tool Ethan, thank you so much! It’s wonderful to be able to consider such a cross-section of media and perspectives.
Ethan, I have two very different reactions, as a geek and as an activist.
1) Geek – This is incredibly cool. It’s what stuff like Wikia Search should have been. It could be the start of a true Open Search platform, for research into search engines. Thank you all for such an amazing effort.
2) Activist – Do you really need not just a weatherman, but a supercomputer climate model, to know which way the wind is blowing? Sadly, it won’t make a practical difference. Over and over, for example, research finds that attention is dominated by an oligarchical A-list that everyone else echoes, which is professionally manipulated, etc. Establishing this to the nth decimal place hardly changes anything. Has the constant proof of power laws in blogs and Twitter made any impact at all on the evangelism and snow-jobs and hucksters around them?
But again, great technical work.
Ethan
Very interested to see this emerge and will track it.
I wonder if it is possible that what Seth F describes as an “oligarchical A-list” is a simple fact of news: people are interested in what other people are interested in. There is a reason a “big story” resonates where tens of thousands of small stories stay local or of interest to a much smaller group of people.
We need to be careful about confusing a “natural” pattern of interest and behaviour with something we believe could be directed by other forces. There lies paranoia instead of a simple understanding of herd behaviour and maybe the wisdom of crowds.
The Berkman initiative has some technical and conceptual parallels in the UK-based Media Standards Trust site http://www.churnalism.com, which allows anyone to drop in the content of a story and see how frequently its key elements are recycled or derived from common source material like press releases.
Here is how it is described on the MST site:
On churnalism.com you can:
Compare a press release with over 3 million articles published by national newspaper websites, the BBC or Sky News since 2008
See the percentage of a press release cut and pasted into news articles, and the number of characters that overlap
See a press release side-by-side with an image of the article, showing which bits have been copied
Search examples of “churn” saved by other people as well as collected automatically by churnalism.com
Share examples of churn via Twitter and Facebook
Churnalism.com aims to raise public awareness about the amount of PR (public relations) material in the press. The site was inspired by Nick Davies’ book Flat Earth News, in which Davies reported that PR material now finds its way into 54% of news stories. Yet in most cases the connection between journalism and PR is hidden from the public. This is despite the opportunities that now exist to make the connection transparent.
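The overlap measures in that list (characters copied, percentage of the press release reused) can be approximated with Python’s standard `difflib` module. This is a stand-in for illustration only: churnalism.com’s own matching method isn’t described here, and the function name `overlap_stats` and the `min_len` threshold (used to ignore short, incidental matches) are hypothetical choices.

```python
from difflib import SequenceMatcher


def overlap_stats(press_release, article, min_len=20):
    """Return (chars of the press release found verbatim in the article, % of the release).

    Only matching blocks of at least min_len characters count, so common short
    phrases don't register as copying. min_len=20 is an arbitrary assumption.
    """
    # autojunk=False stops difflib from ignoring frequent characters in long texts.
    matcher = SequenceMatcher(None, press_release, article, autojunk=False)
    overlap = sum(
        block.size
        for block in matcher.get_matching_blocks()  # the final block is a size-0 sentinel
        if block.size >= min_len
    )
    pct = 100.0 * overlap / len(press_release) if press_release else 0.0
    return overlap, pct


release = "Acme Corp today announced a revolutionary new widget that saves time."
story = "Reporters learned that " + release + " Critics are unconvinced."
print(overlap_stats(release, story))
```

Pairwise diffs like this are fine for one article, but checking a release against millions of stories would likely need an index (for example, hashed word n-grams) to find candidate matches first.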
Regards
Peter