Two weekends ago, I wrote a couple of scripts designed to let me (and anyone else who was interested) study the emergence of memes on Twitter over the course of days or weeks. I built the tools to study use of the #pman tag during the Chisinau protests in Moldova, but colleagues immediately pointed towards other stories they wanted to track, like the #amazonfail campaign. I’ve got high hopes that we’ll be able to say something coherent about how ideas spread on Twitter at some point in the future.
This weekend, I’ve been inundated with emails from friends warning me about precautions I should be taking to protect myself from swine flu. (There are some pretty good wikis emerging, for those who are interested.) And though I’m not especially planning on going out of my way to avoid human (or porcine, for that matter) contact, it’s been pretty amazing to watch Twitter get flooded with flu posts. I searched for “flu” on Twitter, walked away from my machine to get a beer, and came back to the message “5670 results since you started searching”.
It’ll be worth studying the spread of swine flu on Twitter – Evgeny Morozov is already worried that Twitter is spreading panic and misinformation, and it would be interesting to see if we can find correlations between Twitter chatter and the actual incidence of the disease, or discover whether media hype has a cycle independent of disease cycles. But who can wait for real data? Isn’t it worth figuring out just precisely how much people are freaking out, right now?
So I wrote a cute little script that quickly calculates what percentage of current Twitter traffic includes a particular keyword or tag. It takes advantage of the fact that Twitter sequentially numbers its posts, and includes this information in search results. This means you can retrieve a page of 100 search results and calculate how many tweets it took to get 100 results. That, in turn, lets you calculate what percentage of tweets, recently, contained the term you’re searching for.
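The arithmetic described above can be sketched in a few lines of Python. This is an illustration, not the Perl script mentioned in this post: the function name `keyword_share` and the sample IDs are made up, fetching the search results is left out, and the estimate only holds as long as tweet IDs really are assigned sequentially.

```python
def keyword_share(ids):
    """Estimate what percentage of recent tweets matched a search term.

    ids: tweet IDs taken from one page of search results.
    Assumes IDs are handed out sequentially, so the gap between the
    oldest and newest match counts every tweet posted in that window.
    """
    if not ids:
        return 0.0
    # Total tweets posted between the oldest and newest match, inclusive.
    span = max(ids) - min(ids) + 1
    return 100.0 * len(ids) / span


# Hypothetical example: 2 matching tweets spread across 200 IDs.
print(keyword_share([1, 200]))  # prints 1.0
```

A full page of 100 results works the same way: the wider the range of IDs needed to collect 100 matches, the smaller the share.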
Earlier today, I saw levels as high as 1.5% of all tweets mentioning the word or string “flu”. It’s quieted down by this point in the evening. Here’s a recent comparison of flu terms:
1.003 % flu
0.794 % swine
0.171 % swineflu
0.143 % #swineflu
0.055 % #influenza
0.005 % #flu
0.004 % gripa
(#influenza is in there because it’s been the dominant term in Spanish-language flu posts. gripa is there because my friend David Sasaki wondered why people weren’t tweeting about “gripacochina”.)
Just for comparison’s sake, “redsox” shows up in 0.12% of posts, and we’re in the 9th inning of a very good Red Sox game.
Some interesting data in there – looks like I can safely ignore the #flu tag, in favor of #swineflu. And I’d love to figure out the typical ratio of people using a phrase in plain text to people using it as a hashtag. But it’s hard to generalize anything from single data points – the fun is probably running this tool once an hour or so and watching how it trends over time – perhaps I’ll do that tonight.
I’ve got a cute little Perl script that will take an arbitrary number of terms to search for as command line arguments – if anyone wants to turn this into a CGI program, let me know and I’ll send you my code. Too tired to write the CGI tonight…
Early detection of pandemics is really, really important in treating them. People talking about the flu might reflect media attention to a new flu but it might also reflect people having the flu.
In February 2009, Google published a letter in Nature titled “Detecting influenza epidemics using search engine query data”.
From the abstract:
“One way to improve early detection is to monitor health-seeking behaviour in the form of queries to online search engines, which are submitted by millions of users around the world each day. Here we present a method of analysing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.”
I would love for your technique and their technique to get together and rhumba. Importance of data redundancy and all that.
I’m guessing that you are already familiar with this article, but I thought that I would post it here anyway, just in case.
Their technique is orders of magnitude more rigorous than mine, Jonathan. From what I understand, they’re looking at people searching for symptoms – mine just looks for chatter about the flu. Theirs has a good chance of early detection of epidemics in places where the flu isn’t a major news story – mine is simply showing how quickly discussion and worry about swine flu is currently spreading.
The hashtag #gripacochina was a joke in poor taste playing off the doble sentido (double meaning) of cochina. From my observations, what #swineflu reveals is that 1.) worry is more viral than viruses, and 2.) Americans worry more than Mexicans.
For those who are curious as to what your average Mexican Twitter user is tweeting about these days, some of the most popular are: andresb, lion05, danysaadia, agkamai, zolliker, and RodrigoMx.
RodrigoMx is a bit of an outlier as he considers himself a source of breaking news, but in general what you’ll see is that Mexican twitterers aren’t terribly concerned with swine flu. On the other hand, if you look at the twitter accounts of expats in Mexico, 90% of their tweets are about swine flu.
Watching the #influenza stream and having just come back from Mexico City, I agree with @oso about Mexicans being rather witty, acerbic, and a lot less concerned about the swine flu. It was explained to me this way: We have had so many crises in this country, what’s another one? We just deal – and get on with it.
I do appreciate all of the work (and Perl coding) you are putting into this and am looking forward to the next iteration of Media Cloud to include Twitter, etc.
I did a bit of analysis on the twitterfall yesterday. I looked for occurrences of #swineflu, then built a directed graph from all the “@” connections. I got about 11,000 tweets in that period.
Some interesting stuff came up. It’s definitely more of a look inside the #swineflu channel than a Twitter-wide search for patterns, but I was able to see who the major authorities and hubs were as far as what gets replied to and retweeted, plus made some nice (big) GraphViz files.
Anyway, the major authority was “CDC Emergency”–everyone retweeted tweets coming from that account. Evgeny Morozov did a great piece on the Swine Flu, but I think he missed how often info like this gets retweeted, even if the original poster only does so occasionally. Both bad and good info gets echoed around, but CDC Emergency was by far the most highly-ranked authority, if that says anything.
On the flip side, the major hub was tweeter AndrewPWilson, who works for HHS in their social media group and did a bang-up job retweeting sane and informative URLs and replying to the public. I think he really helped keep a lid on the crazy.
I haven’t posted the Python code I used for this yet, but I’ll do so soon, if it’s of interest to you or others.
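Until that code is posted, here is a minimal sketch of the general idea, and not the commenter's actual code. It assumes tweets arrive as (author, text) pairs and scores accounts with a HITS-style iteration, which the terms “authorities” and “hubs” suggest but the comment does not confirm. The function names and sample usernames are hypothetical.

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Yield (author, mentioned_user) edges from (author, text) pairs."""
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # skip self-mentions
                yield author.lower(), target.lower()


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a directed edge list."""
    out_links, in_links, nodes = defaultdict(set), defaultdict(set), set()
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
        nodes.update((src, dst))
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: sum of hub scores of accounts pointing at this one.
        auth = {n: sum(hub[s] for s in in_links.get(n, ())) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of accounts this one points at.
        hub = {n: sum(auth[d] for d in out_links.get(n, ())) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical sample data, not real tweets.
sample = [
    ("alice", "RT @agency wash your hands #swineflu"),
    ("bob", "RT @agency stay home if sick #swineflu"),
    ("alice", "@bob thanks for the link"),
]
hub, auth = hits(mention_edges(sample))
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In the sample, the account everyone retweets comes out as the top authority, and the account doing the most retweeting and replying comes out as the top hub.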
“Gripe” is flu in a lot of Latin America, and “gripe porcina” is swine flu.
Well, it’s widespread now! Information travels so fast on Twitter.