Craig Nevill-Manning is Google’s engineering director and senior research scientist. He’s also a very good computer science teacher. In a talk at Idea Festival about what Google has learned about innovation, he offers an excellent introduction to computer science for non-programmers.
His first lesson is “Think Broadly”. He introduces this concept by telling us “computer science is not just programming.” (He writes it as “computer science != programming”, and “computer science <> programming”, just to satisfy the geeks in the crowd.) Then he invites five volunteers to the stage and gives each a card showing 1, 2, 4, 8 or 16 dots on one face, blank on the other. He invites them to sort themselves, then to start displaying numbers in binary. As the group counts from 0 to 31, Craig invites us to notice patterns – the ones bit flips every time, the twos bit every other time, et cetera. He calls this approach “Computer Science Unplugged”, which Google is helping develop to teach computer science as early as pre-school.
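For readers who want to check the card pattern at a keyboard, here is a minimal Python sketch of the exercise. It assumes five cards worth 1, 2, 4, 8 and 16 dots; the function names are my own, not anything from Craig's talk.

```python
def show_cards(n: int, num_cards: int = 5) -> str:
    """Return the cards as a string of 1s (dots showing) and 0s (blank side)."""
    return format(n, f"0{num_cards}b")


def count_bit_flips(num_cards: int = 5) -> list[int]:
    """Count how often each card is turned over while counting from 0 upward.

    Index 0 is the 1-dot card, index 1 the 2-dot card, and so on.
    """
    flips = [0] * num_cards
    for n in range(1, 2 ** num_cards):
        changed = n ^ (n - 1)  # bits that differ from the previous number
        for bit in range(num_cards):
            if changed & (1 << bit):
                flips[bit] += 1
    return flips


if __name__ == "__main__":
    for n in range(8):
        print(n, show_cards(n))
    print(count_bit_flips())
```

Counting from 0 to 31, the 1-dot card turns over 31 times, the 2-dot card 15 times, then 7, 3 and 1: each card flips about half as often as the one to its right, which is the pattern the volunteers act out.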
Lesson three is “use deep technology”. Craig explains how Google does spellchecking, looking at massive sets of data to make intelligent recommendations. A request for information on “kofee” is pretty ambiguous, but a request for “kofee cup” is probably a request for “coffee”, while “kofee annan” is probably a search for Kofi Annan. Showing the numerous misspellings for Britney Spears, Craig remarks on how bad people are at spelling, and on the challenges of clustering two or three terms to make search recommendations.
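The idea of letting neighboring words settle an ambiguous misspelling can be sketched in a few lines of Python. This is a toy under loud assumptions: the query counts are invented, the `suggest` function is hypothetical, and string similarity from the standard library's `difflib` stands in for whatever Google actually does.

```python
from collections import Counter
from difflib import get_close_matches
from itertools import product

# Hypothetical aggregate query counts, invented for this illustration.
QUERY_COUNTS = Counter({"coffee cup": 900, "kofi annan": 700})

# Every word seen in the (toy) query log.
VOCABULARY = sorted({word for query in QUERY_COUNTS for word in query.split()})


def suggest(query: str) -> str:
    """Pick the respelling whose whole phrase is most common in the log."""
    options = []
    for word in query.lower().split():
        close = get_close_matches(word, VOCABULARY, n=3, cutoff=0.5)
        options.append(close or [word])
    # Try every combination of candidate words and keep the most frequent phrase.
    best = max(product(*options), key=lambda words: QUERY_COUNTS[" ".join(words)])
    candidate = " ".join(best)
    # If no combination was ever seen, leave the query alone.
    return candidate if QUERY_COUNTS[candidate] > 0 else query


if __name__ == "__main__":
    for q in ("kofee cup", "kofee annan", "kofee"):
        print(q, "->", suggest(q))
```

On its own, “kofee” is close to both “coffee” and “kofi”, so the sketch leaves it untouched; the second word is what tips the balance.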
To make these recommendations, you need to “build for scale”, Craig’s lesson #4. He references Rodney Brooks’s advice on robotics – build systems that are “fast, cheap and out of control”, which can have lots of robots fail, but still accomplish the mission. Craig suggests that computing systems should be “dumb, unreliable, massively parallel, working on lots of data”.
To build interesting systems, you need to build systems many people will use, on a very large scale. These projects aren’t incremental improvements – they’re big leaps into the future. He explains how Google builds these systems using cheap, unreliable PCs, assuming there will be massive hardware failures and correcting for them in software, using “reliability through replication”. The goal is that “multiple failures don’t hurt, they only reduce capacity.” And this redundancy is needed for scaling, anyway. (Craig shows us a backup of his slide, assuring us his presentation is fully redundant.)
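“Reliability through replication” is easy to illustrate with a toy. The Python sketch below is my own, not Google's design: a hypothetical `ReplicatedStore` writes every value to all live replicas and reads from whichever one is still up, so losing machines reduces capacity without losing data.

```python
class ReplicatedStore:
    """Toy key-value store that keeps a full copy of the data on every replica."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [{} for _ in range(num_replicas)]
        self.alive = [True] * num_replicas

    def put(self, key: str, value: str) -> None:
        # Write to every replica that is still running.
        for replica, up in zip(self.replicas, self.alive):
            if up:
                replica[key] = value

    def get(self, key: str) -> str:
        # Read from the first live replica that has the key.
        for replica, up in zip(self.replicas, self.alive):
            if up and key in replica:
                return replica[key]
        raise KeyError(key)

    def fail(self, index: int) -> None:
        # Simulate a machine dying: its copy of the data is gone.
        self.alive[index] = False
        self.replicas[index] = {}

    def capacity(self) -> int:
        # Number of replicas still able to serve requests.
        return sum(self.alive)


if __name__ == "__main__":
    store = ReplicatedStore(num_replicas=3)
    store.put("slide", "build for scale")
    store.fail(0)
    store.fail(1)
    print(store.get("slide"), store.capacity())
```

Two of the three machines can die and the read still succeeds; only the number of machines left to answer requests goes down.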
How does Google maintain these huge racks of servers? The key is Velcro – the machines aren’t screwed together, merely held together by Velcro, so a dead hard drive can be torn out and replaced easily. He introduces us to the New Zealand version of duct tape – number 8 wire – and shows us “number 8 wire, Silicon Valley style”, an early version of Google’s servers. They were aluminum trays, lined with cork, each with four motherboards attached to four hard drives. It’s pretty amazing that these systems – as well as disk drives in enclosures made from Lego knockoffs – managed to scale into today’s systems.
Craig’s final lesson is “detect trends” – he points to trends.google.com, and offers his assurance that Google doesn’t create profiles of individual users, but looks at queries in aggregate. He tells the story of a Who Wants to Be a Millionaire contestant who made it to the final question, and used his lifeline to make a Google query – “carol brady maiden name”. He shows us the rank of that query on Google – it spiked when the episode aired on the East Coast, again when it aired on the West Coast, and then showed a tiny spike when it aired in Hawaii. Google’s usage curve shows different patterns in different parts of the world – it’s a smooth workday curve in the US, but there’s a lunchtime siesta in France and Spain.
Craig gets a wide range of audience questions – it’s clear that the inner workings of Google are pretty fascinating to end users. He offers the “Star Trek” scenario – verbal computing, with natural language processing – as his “next big thing” for Google, on a long timeframe. He answers a question on “how search really works for a non-technical person” with an explanation of spidering, catalog builds, intersecting search terms and relevancy rankings. And he addresses the secrets of Google corporate culture, mentioning that free food and free massages don’t hurt, but noting that the real benefit comes from giving engineers 20% of their time to work on their own independent projects, a process that’s led to products like Gmail and Froogle. He also suggests that Google’s management shortage and flat structure are a benefit, forcing engineers to communicate with each other because their managers are too busy to help!
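Craig's four-step answer on how search works (spider the pages, build a catalog, intersect the search terms, rank the results) maps onto a small Python sketch. Everything here is an assumption for illustration: the pages are made up and handed in directly rather than spidered, and the “ranking” simply counts how often the query words appear, which is nothing like Google's real relevancy ranking.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Catalog build: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], pages: dict[str, str], query: str) -> list[str]:
    """Intersect the page sets for every query term, then rank the survivors."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))

    def score(url: str) -> int:
        # Toy relevance: how many times the query terms occur on the page.
        words = pages[url].lower().split()
        return sum(words.count(term) for term in terms)

    # Highest score first; ties broken alphabetically by URL.
    return sorted(matches, key=lambda url: (-score(url), url))


if __name__ == "__main__":
    # Hypothetical "spidered" pages.
    pages = {
        "a.example": "coffee cup coffee",
        "b.example": "coffee beans",
        "c.example": "tea cup and coffee",
    }
    index = build_index(pages)
    print(search(index, pages, "coffee cup"))
```

The page about coffee beans drops out because it never mentions “cup”; of the two pages left, the one that mentions the query words more often comes first.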