Berkman fellow Dave Rand conducts social science experiments online, using labor markets like Mechanical Turk. His lunch presentation, titled “The Online Laboratory: Taking Experimental Social Science onto the Internet”, largely reports on results from a paper he recently published with John Horton and Richard Zeckhauser. The paper argues that some experiments can be conducted just as well online as offline, with great savings in time and cost.
To explain why social science experiments are important, Rand invokes cartoonist Randall Munroe. It’s pretty easy to show correlation in the social sciences, but much harder to determine causation. Well-structured experiments allow us to test causation by manipulating a single variable in the environment and seeing what changes. These experiments can be expensive and time consuming to set up – you need to recruit participants and give them incentives. And you need to consider design carefully, to ensure that the subjects of the experiment take the process sufficiently seriously to give valid results. In social psychology, experiment design often tricks people into giving valid results, suggesting one variable is being studied and actually studying something else. Experimental economics tries to ensure compliance through differential incentives – do well on the task and you get more money. Field studies take natural phenomena in the world, then manipulate a variable – as a result, the subjects don’t know they’re in an experiment.
The internet changes the landscape for these sorts of experiments. It’s much easier to recruit, and to find many subjects with little effort. Online studies aren’t new to psychologists, Rand tells us, but are relatively new to economists, because earlier online experiments offered no financial incentives. Online labor markets fix this problem – they allow differential rewards without sacrificing cheap and easy recruitment.
Rand’s platform of choice is Mechanical Turk, an online labor marketplace run by Amazon. The name comes from a chess-playing automaton popular at the turn of the 19th century. The Turk showed incredible skill at chess, and appeared to be the most sophisticated machine intelligence of the time. In truth, there was a small man hiding in the cabinet, manipulating levers and making the decisions to win the chess game. The point: sometimes a person is the right way to solve a problem.
For an experimenter, using Mechanical Turk can feel a lot like sending your experimental questions to a robot and receiving the results. Behind the scenes, you’ve got to follow a pretty well established sequence:
– Put money in an account
– Create a HIT (“Human Intelligence Task”) description
– Redirect to your favorite survey site
– Come back to Turk with a confirmation code
– Match the Turk file with the survey file and calculate the bonus
– Upload CSV with approve/reject and bonus info and let Amazon make the payments
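The last two steps are the ones an experimenter has to script. Here is a minimal Python sketch of how the matching might work, assuming each Turk submission and each survey response carries the same confirmation code. The field names (`worker_id`, `code`, `points`), the bonus rule and the output columns are all hypothetical, chosen for illustration; they are not Amazon’s actual file format or Rand’s own scripts.

```python
import csv
import io


def build_payment_rows(turk_rows, survey_rows, bonus_per_point=0.10, max_bonus=1.00):
    """Match Turk submissions to survey responses by confirmation code.

    All field names and the bonus rule are illustrative assumptions.
    """
    # Index survey responses by the confirmation code shown at the end of the survey.
    survey_by_code = {row["code"].strip(): row for row in survey_rows}
    seen_workers = set()
    payments = []
    for submission in turk_rows:
        worker = submission["worker_id"]
        response = survey_by_code.get(submission["code"].strip())
        if response is None or worker in seen_workers:
            # Unknown code, or a second submission from the same worker:
            # reject and pay no bonus.
            payments.append({"worker_id": worker, "decision": "reject", "bonus": "0.00"})
            continue
        seen_workers.add(worker)
        # Hypothetical bonus rule: a fixed amount per point earned, capped at max_bonus.
        bonus = min(max_bonus, int(response["points"]) * bonus_per_point)
        payments.append({"worker_id": worker, "decision": "approve", "bonus": f"{bonus:.2f}"})
    return payments


def to_csv(payments):
    """Serialize the approve/reject and bonus decisions as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["worker_id", "decision", "bonus"])
    writer.writeheader()
    writer.writerows(payments)
    return buffer.getvalue()


if __name__ == "__main__":
    turk = [
        {"worker_id": "W1", "code": "abc"},
        {"worker_id": "W2", "code": "zzz"},  # code not found in the survey file
        {"worker_id": "W1", "code": "abc"},  # repeat submission
    ]
    survey = [{"code": "abc", "points": "7"}]
    print(to_csv(build_payment_rows(turk, survey)))
```

Rejecting repeat submissions in the script is one way to enforce the “one task per worker” rule after the fact; it is a design choice in this sketch, not a description of how Turk itself enforces it.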
While Mechanical Turk is usually used for tasks like image labeling, text and audio transcription and collecting information on websites, it works extremely well for many social science experiments, including surveys. Rand makes it clear that he’s not affiliated with Amazon (though he notes he wouldn’t mind an affiliation…) but that he’s simply a huge enthusiast for this new method. The low cost, Amazon’s modest cut (10%) and some careful system setups (ensuring tasks can be completed only once by a given worker) make the system remarkably helpful for social science experimenters.
One remarkable aspect of Mechanical Turk is how modest the payment is. One of Rand’s research partners has conducted experiments to calculate what Turkers will accept as pay for a given task. They calculate a “reservation wage” – the low point of acceptable pay – with a median value of $1.38 per hour. Rand suggests that the main variable associated with experimenting on MechTurk is your willingness to wait for people to undertake your task – pay more, and the results come in faster. For a five-minute task, Rand says he usually pays $0.20 to $0.40 as a baseline, with bonuses of up to $1. At that rate, he got 1500 participants in 2.5 days. That’s a remarkable pace of collecting observations.
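To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It converts the baseline pay for a five-minute task into an hourly rate for comparison with the $1.38 reservation wage, and bounds the cost of a 1500-person study. The function names are hypothetical, and the sketch assumes Amazon’s 10% cut is charged on top of both base pay and bonuses, which the talk did not spell out.

```python
def hourly_rate(payment, minutes):
    """Convert a per-task payment into an equivalent hourly rate."""
    return payment * 60 / minutes


def study_cost(participants, base_pay, bonus, fee_rate=0.10):
    """Total cost if every participant earns base_pay plus bonus.

    Assumption: the platform fee applies to base pay and bonus alike.
    """
    return participants * (base_pay + bonus) * (1 + fee_rate)


if __name__ == "__main__":
    reservation_wage = 1.38  # median, dollars per hour, from the talk
    for base in (0.20, 0.40):
        rate = hourly_rate(base, minutes=5)
        print(f"${base:.2f} per 5-minute task is ${rate:.2f}/hour "
              f"({rate / reservation_wage:.1f}x the reservation wage)")

    # Bounds for 1500 participants: no bonuses at the low base rate,
    # versus every participant earning the full $1 bonus at the high base rate.
    low = study_cost(1500, base_pay=0.20, bonus=0.00)
    high = study_cost(1500, base_pay=0.40, bonus=1.00)
    print(f"Study cost between ${low:.2f} and ${high:.2f}")
```

Even before bonuses, the baseline works out to $2.40 to $4.80 per hour, comfortably above the median reservation wage, which is consistent with the fast pace of responses Rand describes.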
To answer the question, “Who are the Turkers”, Rand conducted a study and asked Turkers to self-report demographic data. The mode age is around 30, with a higher median – there are some participants over 60. Roughly 50% of his respondents were in India, 35% in the US, and 15% elsewhere. The biasing factor there is English, as the system and task both require English comprehension.
A study by Rand’s colleague John Horton asked about the motivation of workers. Money is the main motivator for most participants, and carries identical weight for Indian and US workers. A modest number of Americans list “fun” as their motivator, while few Indians do; conversely, a modest number of Indians list “learning”, while few Americans do. That money is a primary motivator is useful for economists, as they operate on the assumption that people will behave in a way that maximizes their ability to access incentives – as such, a population of people who want to get paid is better for economists than a bunch of folks who are in it for the lulz.
The graph of education levels is a bit depressing – the majority of US participants report some college or a bachelor’s degree. Outside the US, the largest group holds bachelor’s degrees and almost 30% self-report as having graduate degrees. Income distribution also shows a significant difference between US and non-US participants – 25% of US participants reported themselves as having income under $15,000, while 55% of non-US respondents reported that income level.
Rand notes that he’s frequently asked about sampling bias – how can we draw conclusions about human behavior from the people who participate in Mechanical Turk studies? He responds by pointing out that the vast majority of social science research is conducted on US undergraduates, a deeply atypical population. The participants on Mechanical Turk have more variation than the average college student population and are less WEIRD (using a term suggested by Joe Henrich, who worries that much of the social science based on observations of wealthy Westerners draws inappropriate “universal” conclusions from a biased sample set) than the subjects of most social science research. For some experiments – estimating a level of human generosity, for instance – we need representative samples. If the goal is to compare changes across different treatments, a representative sample might be less critical. In both cases, Rand suggests there’s some value in using Mechanical Turk populations, and that Mechanical Turk makes cross-cultural work far easier to conduct than in traditional experimental setups.
Another concern is whether the modest payments offered to participants are sufficient to influence economic decision making. In an unpublished paper, he and colleagues compare experiments with dollar stakes and no stakes via Mechanical Turk. They averaged 616 traditional (offline) studies to get a sense for behavior we’d expect to see in these experiments. In one experiment – the Dictator game – the subject is given a sum of money and can choose to give a percentage to another participant. Offline studies suggest that subjects will typically give 35% of the sum – in a Mechanical Turk setting, participants gave 33% of the dollar they would have received as a bonus. When the incentive of real money was removed, and subjects assigned “points”, receiving only a base payment, they were substantially more generous. In other words, the $1 stake was sufficient to differentiate from uncompensated behavior, and online