Idea Festival: Dollar bills and epidemiology

I’m here at the Idea Festival in Louisville, Kentucky for the next couple of days. Idea Festival was good enough to invite me to speak last year, and invited me back this year to give a talk alongside my Global Voices colleagues, Amira Al Hussaini and Georgia Popplewell. In the meantime, I’m hanging out with Idea Festival blogger Wayne Hall and blogging as much of the conference as I can.

Dirk Brockmann is a physicist at the Department of Non-Linear Dynamics (or maybe the Dynamics and Self Organization… or maybe the Institute for Fluid Dynamics) at the Max Planck Institute in Göttingen, Germany. No one is doing fluid dynamics at Max Planck anymore, but the basic philosophy of the institute is the same – take the methods of theoretical physics and apply them to a very different field. In Brockmann’s case, this means looking at currency circulation as a proxy for human travel, which ends up being a way to study epidemiology.

The data Brockmann uses is from WhereIsGeorge.com, a long-running web experiment that tracks currency. Studying this data statistically gives insight into ways that people travel around the world, which is critical to understand in figuring out how SARS and other diseases spread around the world.

When you think of space, time and disease, Brockmann begins, the first disease most people study is the Black Death pandemic of the 14th century. It started in southern Europe, spread in waves, at a speed of a few kilometers a day, eventually wiping out a quarter of the European population. Waves are familiar to physicists – perhaps the wave equations we understand from light can help us understand this spread? Unfortunately, other diseases – the spread of measles in the UK, for instance – are much more complicated. Complex phenomena are usually the product of multiple forces. To study them, you can build complex models that incorporate the key causes… but these generally fall short, and don’t fully model the systems under consideration.

One technique for building better models is to look at different examples and look for common features – Brockmann shows us a chart of pandemics over the past millennia. For the past few centuries, the major pandemics are influenza pandemics. He invites us to look at H5N1 – bird flu – because it’s so scary: while the disease doesn’t currently affect humans, if it crossed over from an avian population, it would be devastating. HIV/AIDS and tuberculosis are two other pandemics we should be worried about, but tend to ignore because TB, in particular, strikes primarily in developing nations. SARS killed as many people as HIV/AIDS and TB kill every two hours, despite the fact that media attention focused so heavily on SARS.

Estimates for the impact of a pandemic like bird flu were between 2 million and over a hundred million – that’s two orders of magnitude, which implies that a) the press does a lousy job of reporting science numbers and b) there’s great uncertainty about what a real-world pandemic would look like. Brockmann shows us what an outbreak looks like in a small population – a school or a town: it reaches a peak quickly, then drops off gradually. There’s a model – SIR dynamics – in which patients are susceptible, then infected, then recovered – which goes a long way towards predicting epidemic spread in an isolated group. Key factors in building these models are the mean transmission rate and the mean disease duration. If you take the product of these two, you get a single factor, the “basic reproductive rate”, that characterizes the spread of an epidemic quite neatly.
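
To make the SIR idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the model Brockmann showed: the parameter values (a transmission rate of 0.5 per day, a four-day illness, a town of 1,000) and all function names are made up, and the equations are integrated with plain Euler steps.

```python
def basic_reproduction_number(transmission_rate, duration_days):
    """Product of the mean transmission rate and the mean disease duration."""
    return transmission_rate * duration_days


def simulate_sir(transmission_rate, duration_days, population=1000,
                 initial_infected=1, days=160, steps_per_day=10):
    """Integrate the SIR equations with simple Euler steps.

    Returns a list of (day, susceptible, infected, recovered) tuples.
    """
    dt = 1.0 / steps_per_day
    recovery_rate = 1.0 / duration_days
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0.0, s, i, r)]
    for step in range(1, days * steps_per_day + 1):
        # Infections need contact between a susceptible and an infected person.
        new_infections = transmission_rate * s * i / population * dt
        new_recoveries = recovery_rate * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative (assumed) values, not figures from the talk.
    r0 = basic_reproduction_number(0.5, 4)
    history = simulate_sir(0.5, 4)
    peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
    print(f"R0 = {r0:.1f}; peak of {peak_infected:.0f} infected on day {peak_day:.0f}")
```

When the product of the two factors is below 1 (say a transmission rate of 0.2 with the same four-day duration), the infected count in this sketch only shrinks, which is why that single number is such a handy summary.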

While we can predict the spread of “compartmentalized” diseases quite neatly, it’s lots harder to predict the transmission of bird flu, because transmission of disease is directly related to our transportation networks. If you want to understand how to build a better model, you have to understand how we travel in space and time. Brockmann shows us a 3D model of air traffic networks, showing links between nodes and the volume of travel. This travel, of course, is very different from travel in the 14th century – we can cross the world in a day, and this means that disease dynamics can be very, very different. While Black Death moved at a couple of kilometers a day, SARS crossed the Pacific Ocean within a few days.

To model SARS, Brockmann began by combining a local disease model with a global travel system model – the results predicted were quite close to what actually happened in the spread of SARS. This is a pretty good indicator that following travel is a great clue to understanding disease spread. But air travel is only one factor – people travel through lots of other methods, including cars, trains, buses, on “all length scales”.
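
A rough sketch of what "local disease model plus travel model" can mean in code: each city runs its own SIR dynamics, and at every step a small share of people moves along the links of a travel network. Everything here is an assumption for illustration (the three placeholder cities, the travel rates, the disease parameters, the function name); it is not the model Brockmann actually used for SARS.

```python
def simulate_metapopulation(populations, travel_rate, seed_city,
                            transmission_rate=0.5, recovery_rate=0.25,
                            days=200, steps_per_day=10):
    """Couple a local SIR model in each city to a travel network.

    populations: {city: number of residents}
    travel_rate: {origin: {destination: fraction of people moving per day}}
    Returns {city: day on which that city's infected count peaked}.
    """
    dt = 1.0 / steps_per_day
    # state[city] = [susceptible, infected, recovered]
    state = {city: [float(n), 0.0, 0.0] for city, n in populations.items()}
    state[seed_city][0] -= 1.0
    state[seed_city][1] += 1.0
    peak_infected = {city: state[city][1] for city in state}
    peak_day = {city: 0.0 for city in state}

    for step in range(1, days * steps_per_day + 1):
        # 1. Local disease dynamics inside each city.
        for compartments in state.values():
            s, i, r = compartments
            new_infections = transmission_rate * s * i / (s + i + r) * dt
            new_recoveries = recovery_rate * i * dt
            compartments[0] = s - new_infections
            compartments[1] = i + new_infections - new_recoveries
            compartments[2] = r + new_recoveries

        # 2. Travel: move a small share of every compartment along each link.
        change = {city: [0.0, 0.0, 0.0] for city in state}
        for origin, links in travel_rate.items():
            for destination, rate in links.items():
                for k in range(3):
                    moved = state[origin][k] * rate * dt
                    change[origin][k] -= moved
                    change[destination][k] += moved
        for city in state:
            for k in range(3):
                state[city][k] += change[city][k]
            if state[city][1] > peak_infected[city]:
                peak_infected[city] = state[city][1]
                peak_day[city] = step * dt
    return peak_day


if __name__ == "__main__":
    # Hypothetical network: "hub" has a busy link to "near" and a thin one to "far".
    populations = {"hub": 100_000, "near": 100_000, "far": 100_000}
    travel_rate = {
        "hub": {"near": 0.01, "far": 0.0001},
        "near": {"hub": 0.01},
        "far": {"hub": 0.0001},
    }
    print(simulate_metapopulation(populations, travel_rate, seed_city="hub"))
```

In this toy setup the order in which cities peak follows the strength of their travel links rather than any notion of geographic distance, which is the point of tying the two models together.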

Once Brockmann realized that an accurate disease model would require a much more robust travel model, he found himself somewhat depressed. He mentioned the problem to his friend, cabinetmaker Dennis Derryberry, who suggested WhereIsGeorge, which allows people to track the movement of dollar bills, as a possible proxy for actual US travel. Because the WIG data set is so large, it’s possible to do a great deal of statistical extrapolation from it.

A quick check of report density on WIG against population density shows a pretty clear correlation – people seem to participate in the game from all over the country. Marked bills seem to get injected into the system from all over the US as well. After bills are injected, most are next sighted near their injection point… but a small set are discovered a long way from home, which is consistent with the nature of long-distance travel. The distribution follows an inverse power law – which is a mathematical distribution that physicists know well. Power law distributions are characteristic of data sets that are “scale-free”. To explain “scale-free”, Brockmann suggests we all guess the mean height of an adult human being – our estimates will tend to cluster pretty tightly, probably in a bell curve distribution, because there are no humans one inch tall, and none a mile tall. But if we estimate mean human income, we’ll be all over the map, because there are no scale constraints for the equation.
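
One way to see what "scale-free" means numerically: for a power law, the share of values above x that are also above 2x is the same whatever x you pick, while for a bell curve that share depends heavily on where you look. The sketch below uses synthetic samples only; the exponent of 1.5 and the height parameters are illustrative assumptions, not values from the WIG data.

```python
import random


def tail_ratio(samples, x):
    """Share of the samples above x that are also above 2x."""
    above_x = sum(1 for value in samples if value > x)
    above_2x = sum(1 for value in samples if value > 2 * x)
    return above_2x / above_x


rng = random.Random(0)  # fixed seed so the run is repeatable

# Bell curve: adult heights in cm (mean and spread are rough assumptions).
heights = [rng.gauss(170, 10) for _ in range(100_000)]

# Power law: Pareto samples with an illustrative (assumed) exponent.
ALPHA = 1.5
distances = [rng.paretovariate(ALPHA) for _ in range(100_000)]

if __name__ == "__main__":
    # For the power law both ratios should sit near 2 ** -ALPHA.
    print("power law :", tail_ratio(distances, 2), tail_ratio(distances, 8))
    # For the bell curve the ratio collapses as x moves past the typical height.
    print("bell curve:", tail_ratio(heights, 85), tail_ratio(heights, 100))
```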

A dollar bill moving around the world is a “random walk” in mathematical terms. There’s a variety of mathematics that helps us understand random walks and scale-free phenomena that might inform how we build models on the spatial spread of disease.
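
For a feel of what a random walk with scale-free steps looks like, here is a small sketch: each step goes in a random direction, with a length drawn from a power law, so most hops are short and a few are enormous. The exponent, the seed and the function names are assumptions chosen for illustration, not parameters fitted to bill movements.

```python
import math
import random


def levy_walk(n_steps, alpha=1.5, min_step=1.0, seed=0):
    """2D random walk whose step lengths follow a power law.

    Each step picks a uniformly random direction and a Pareto-distributed
    length (never shorter than min_step). Returns the list of (x, y) positions.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        length = min_step * rng.paretovariate(alpha)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        path.append((x, y))
    return path


def step_lengths(path):
    """Distance covered by each individual step of the walk."""
    return [math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(path, path[1:])]


if __name__ == "__main__":
    lengths = sorted(step_lengths(levy_walk(5000)))
    print("median step :", lengths[len(lengths) // 2])
    print("longest step:", lengths[-1])
```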

Brockmann shows us a map of Kentucky, with each county represented as a node, and links that show the strength of the ties in the network – how much currency flows from one to another. We can compute the transportation network based on the currency flows in Kentucky, and in the US as a whole. Something that becomes immediately apparent is that the short-distance connections are more relevant than the long-distance ones, confirming the intuition that air travel data alone is insufficient to build a model.
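
Turning bill sightings into a weighted network can be sketched in a few lines: every time a bill is reported in one county and next in another, the link between those two counties gets one unit heavier. The input format below (a bill ID with a time-ordered list of counties) is a hypothetical simplification, not the actual WhereIsGeorge record layout, and the sample sightings are invented.

```python
from collections import Counter


def flow_network(sightings):
    """Build link weights from bill sightings.

    sightings: iterable of (bill_id, [county, county, ...]) in time order.
    Returns a Counter mapping an unordered county pair to the number of
    times a bill moved between the two counties.
    """
    weights = Counter()
    for _bill_id, counties in sightings:
        for origin, destination in zip(counties, counties[1:]):
            if origin != destination:  # a re-sighting in place is not a move
                weights[frozenset((origin, destination))] += 1
    return weights


if __name__ == "__main__":
    # Invented sightings, for illustration only.
    sample = [
        ("bill-1", ["Jefferson", "Oldham", "Jefferson"]),
        ("bill-2", ["Jefferson", "Fayette"]),
        ("bill-3", ["Fayette", "Fayette"]),
    ]
    for pair, weight in flow_network(sample).items():
        print(sorted(pair), weight)
```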

Currently in vogue in network theory is identifying “communities” in complex networks. Using a modularity function, you can detect communities within the graph of a network, areas where connections within the community are strong and connections to other communities are weak. Doing this in the WIG/transportation network gives us information that can be quite counterintuitive. A map of Europe shows us one type of communities – nations. But those communities may not map neatly to real-world communities. The national boundaries have evolved over time, but they aren’t necessarily the “effective communities” of Europe. The community analysis of a transport network shows us what the effective divisions might be – lots of people transit from New York to Los Angeles, so perhaps they are functionally the same community.
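
As a sketch of what a modularity function measures, the code below scores a proposed split of a small weighted network using the standard Newman-Girvan modularity Q: high when links fall mostly inside the groups, near zero when the grouping is no better than chance. The toy graph (two triangles joined by one bridge) is made up, and this only scores a given partition; a real community-detection run would search for the partition with the highest score, and the talk did not say which algorithm was used.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of a weighted, undirected graph.

    edges: {(node_a, node_b): weight}, each undirected edge listed once
    community_of: {node: community label}
    """
    total_weight = sum(edges.values())
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed node strength of each community
    for (a, b), weight in edges.items():
        degree[community_of[a]] += weight
        degree[community_of[b]] += weight
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += weight
    return sum(internal[c] / total_weight - (degree[c] / (2 * total_weight)) ** 2
               for c in degree)


if __name__ == "__main__":
    # Toy graph: two tightly knit triangles joined by a single bridge.
    edges = {
        ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
        ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
        ("c", "d"): 1,
    }
    two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    one_group = {node: 0 for node in two_groups}
    print("two groups:", modularity(edges, two_groups))
    print("one group :", modularity(edges, one_group))
```

Splitting the toy graph at the bridge scores well above lumping everything together, which matches the intuition of strong ties inside a community and weak ties between communities.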

Doing this analysis on the WIG data divides the US into ten segments. One covers almost all of Texas (except El Paso, which is part of a Southwest cluster); another covers all of New England and the Atlantic states, down to Virginia. They’re very different from the ways we’re used to breaking our nation into regions… but these are the communities that a mathematical analysis of data tells us are created by transportation and trade networks.

The new, richer model built with the WIG data predicts the spread of a disease through the US in a very different fashion than a pure wavefront model, the sort of model we would use to model the Black Death. This model moves very quickly, and it’s fractal-like – a disease spreads from one major city to another and then spreads from those urban centers into rural areas.

The final message of Brockmann’s talk: the creator of Where Is George, who is in the audience for the talk, had no intention of creating a tool for scientific research. But it’s possible that this data may be a key set in predicting the spread of disease in the future.