Luis von Ahn: CAPTCHAs? My fault…

I’m blogging from Camden, Maine, at the wonderful Pop!Tech conference. This year’s a special treat. My wife, the lovely Velveteen Rabbi, and I are team-blogging, trading off posts. You can read her posts on her website, or just read all of ours on the Pop!Tech site, where Michelle Riggen-Ransom has been doing brilliant work thus far. There are lots of bloggers in the crowd and on Twitter – follow the #poptech tag for lots of different perspectives.

Luis von Ahn, professor of computer science at Carnegie Mellon, believes we can make computers better by harnessing the intelligence of humans. Inventor of the CAPTCHA and the reCAPTCHA, a technology that both helps eliminate spam and digitizes books, Luis was recently recognized with a MacArthur fellowship for his groundbreaking and creative work.

“Have you seen those distorted squiggly characters on web forms? Do they annoy you? I invented that.”

Luis explains why we need CAPTCHAs on the internet. They prevent scalpers from buying millions of tickets at once through online systems like Ticketmaster, and they slow down spammers trying to obtain a near infinity of email accounts. They don’t always work, he explains – an automated system at Yahoo builds its distorted images from random letters. When the system prompted a user with the word “WAIT”, she waited for twenty minutes before giving up. He’s just grateful it didn’t say “RESTART”.

Luis von Ahn. photo by Kris Krüg

Without CAPTCHAs, bad things can happen. He tells us about a Slashdot poll that asked users to vote on the best CS grad school in the US. Students at CMU wrote a program to game the poll. MIT fought back, and within a couple of days, the poll needed to be taken down as both schools had more than six million votes.

Spammers need lots of email accounts, since each account tends to be limited to sending a few hundred emails a day. So spammers write programs to harness lots of accounts. CAPTCHAs slow this down. So spammers are now building CAPTCHA sweatshops, hiring humans to type CAPTCHAs in countries where the minimum wage is very low. Luis observes that, well, at least it’s costing spammers something. And it’s creating jobs in the developing world. Pornographers who want email accounts have found out how to do this for free – they ask people to solve the CAPTCHAs they’re confronted with in exchange for free porn.

CAPTCHAs have now been tried in different ways in different countries. Russian CAPTCHAs frequently ask users to solve complex math problems – Luis is astounded that Russian CAPTCHA authors assume that a random user can calculate a limit in a complex algebraic equation. Indian CAPTCHAs sometimes feature circuit designs and ask a user to calculate resistance. And, of course, in the US, blog CAPTCHAs feature tough problems like “What is 1+1?” He points out that all these CAPTCHAs fail, because they’re all pretty easily solvable by computers.

While he’s proud that 200 million CAPTCHAs are typed online every day, von Ahn mourns the waste of human time and energy, up to 500 thousand hours a day. So Luis invented the reCAPTCHA, which is used to help in the scanning of books. Scanning books involves taking photos of book pages, then using OCR (optical character recognition) to figure out what the words are. In older texts, OCR is quite inexact – he tells us that for books written before 1900, OCR misses roughly 30% of the words.
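For a sense of scale, here's the back-of-the-envelope arithmetic behind those two figures. The variable names are mine, and the only inputs are the numbers quoted in the talk, so treat the result as a rough average rather than a measurement.

```python
# Figures quoted in the talk.
captchas_per_day = 200_000_000   # CAPTCHAs typed per day
hours_per_day = 500_000          # human hours spent per day (upper figure)

# Average time one CAPTCHA takes, in seconds.
seconds_per_captcha = hours_per_day * 3600 / captchas_per_day

print(seconds_per_captcha)  # 9.0
```

In other words, the two numbers are consistent with each CAPTCHA costing a person about nine seconds.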

reCAPTCHAs present users with two distorted words. The system knows what one is – if you identify it correctly, it assumes you’re probably answering the second one (the order is randomized) to the best of your ability. When a dozen users identify an unknown word the same way, it’s very likely that the recognition is an accurate one. The system now digitizes 45 million words a day, the equivalent of 4 million books a year.
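Here's a minimal sketch of that logic as I understood it from the talk. It's my own illustration, not the actual reCAPTCHA code: the class name, method names and the exact threshold of twelve matching answers are assumptions, and the real system surely handles disagreement between users in more sophisticated ways.

```python
from collections import Counter, defaultdict

# "A dozen users" comes from the talk; the real cutoff is an assumption.
AGREEMENT_THRESHOLD = 12


class ReCaptchaSketch:
    """Hypothetical model of the two-word scheme: one control word, one unknown."""

    def __init__(self, threshold=AGREEMENT_THRESHOLD):
        self.threshold = threshold
        # unknown word id -> tally of the answers users have typed for it
        self.votes = defaultdict(Counter)
        # unknown word id -> accepted transcription
        self.resolved = {}

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes the CAPTCHA.

        The user's answer for the unknown word is only counted when the
        control word was typed correctly.
        """
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False  # failed the known word, so discard the other answer

        guess = unknown_guess.strip().lower()
        self.votes[unknown_id][guess] += 1

        # Once enough users agree on the same reading, accept it.
        if self.votes[unknown_id][guess] >= self.threshold:
            self.resolved[unknown_id] = guess
        return True
```

The point of the control word is that the system never needs to know the answer to the second word in advance. It only needs evidence that the person typing is a human making an honest effort.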

The two-word reCAPTCHAs are as efficient as entering random strings of 6 to 8 characters, so von Ahn isn’t making us work harder. The texts are coming from the New York Times archives and from the Google book scanning project. Google likes the technology so much that they just acquired reCAPTCHA.

In the question and answer session, von Ahn explains that he’s hoping to use these methods for language translation and image tagging in the near future.

Andrew closes our first session with the announcement of a new initiative. He explains that Pop!Tech has focused on three areas: innovation, social change and science. The new initiative focuses on helping young, working scientists become visible leaders, learning communication and leadership skills to complement their scientific skills. Starting next year, 15 to 20 scientists will be involved in a year-long training program which will include appearances on the Pop!Tech stage. Andrew acknowledges that, in the academy, there’s something of an anti-popularization bias – the role of Pop!Tech is to ensure that scientists continue doing excellent academic work but are simply more skilled at communicating their exploration, knowledge and discovery. This work is supported in a big way by Microsoft, along with National Geographic, and is endorsed by the National Science Foundation.