My friend and colleague Hal Roberts is taking on one of the hardest research questions in the space of internet and society – the surveillance of the internet. If surveillance is done in certain ways, it should be invisible to users – a government might be able to read all email by tapping into internet service providers, and this behavior could be entirely invisible. (We’d know about it only through a whistleblower – see Mark Klein and the information he shared about US government wiretaps at AT&T.)
In the absence of the sort of information we might obtain from whistleblowers, studying surveillance means studying what it’s possible for someone to watch, not what watching actually takes place. Colleagues of ours have looked closely at internet architectures to determine what’s possible in terms of surveilling the internet. Where are the points where a government or corporation could monitor large amounts of data? How much data can we monitor using current technology? Is it technically feasible to monitor every packet of data coming into a country like China?
For Hal’s talk at Berkman yesterday, the focus was less on speculative surveillance and more on a familiar entity: Google. Hal points out that Google owns two advertising systems – AdWords and DoubleClick – which each account for roughly 35% of the online ad market. On the 70% of ad-carrying websites served by one system or the other – possibly as much as 50% of all web traffic – Google is able to watch users’ behavior, which puts the vast majority of users within its view. The question behind Hal’s talk: “Should you care about this?”
When we consider surveillance, Hal tells us we’re usually thinking about Orwellian Big Brother scenarios, shadowy entities like the NSA, the boss reading our work email, or our insurance companies watching how we drive or whether we’re clandestine smokers. These are entities that have power over you, which can control aspects of your life, ranging from your physical liberty to your employment or finances. Google can’t shoot you, fire you or take away your health insurance – why should we care that they’re watching us?
Hal offers two straightforward reasons, and one much more subtle, complicated one:
– Google could lose our data. Credit card companies, government bureaus and other organizations that should know better have lost large amounts of personal data – what if Google’s data protection policies aren’t up to snuff, or someone simply makes a mistake and releases data that should have been kept under lock and key?
– Governments could access the data Google is storing via subpoenas, national security letters or other mechanisms. Hal points out that Viacom was able to access data from Google regarding YouTube videos –
But Hal’s interests focus on a more complex way of considering Google, surveillance and watching. He points out that, in Orwell’s 1984, Big Brother’s Ministry of Love watched you while you watched it, through screens located in every home and office. This, Hal suggests, is an especially sinister vision of a network public sphere, a space where you and your neighbors are all watching each other. This form of surveillance helps explain how slavery persisted in the US – neighbors watched each other to make sure that no one was assisting escaped slaves, and received rewards for returning stolen “property”.
An even more sophisticated model is the panopticon, as conceived by Jeremy Bentham and memorably analyzed by Michel Foucault. Bentham proposed building a prison where prisoners could be watched at all times by guards. Foucault observed that when you place individuals in a position where they can constantly be watched, they will police their own behavior, acting as if they’re under observation at all times. This means that surveillance can exert a force over society without direct violence or physical impact. Hal wonders what surveillance via CCTV is doing to society in Great Britain, where CCTV cameras in public spaces are very common. And he wonders what it means that Google watches what we do online.
Referencing Canadian sociologist David Lyon, Hal talks about social sorting, the ways in which dataspheres change to present different people with different opportunities. If someone managing a datasphere is aware that you have good credit, you might be presented with more opportunities to obtain a credit card or a mortgage. Lyon refers to the profiles that exist in dataspheres as “data doubles” – data doppelgangers which may or may not accurately resemble you, and which govern what opportunities are made available to you. Google, Hal argues, is in an especially powerful position to construct data doubles, and shouldn’t be sharing this data with banks or other entities.
A particular worry with these data doubles is the possibility of the loss of context. For social relationships to flourish, we need context for individual relationships so that one fact doesn’t dominate our understanding of a person. Hal references an email sent by Larry Lessig to an executive at Netscape which Microsoft used to try to demonstrate that he was unfairly biased against the company and could not serve as a special master in the DOJ’s case against Microsoft. It’s not under dispute that Lessig sent the email, but it’s unfair to build a picture of Lessig’s opinions regarding Microsoft from that single fact. Our concerns about privacy and bureaucracy center, at least in part, on the idea that facts about our actions end up separated from context by systems that analyze individual points of data, not the picture of a whole, complete, complicated human being.
In Foucault’s understanding of the panopticon, a situation where everyone is surveilling themselves presents an interesting challenge to the prison warden – there’s the ability to control more people, but a loss of control over the mechanism of surveillance. With Google, we face a complicated system where four entities – content publishers, advertisers, search engine users and Google itself – are watching each other, all attempting certain degrees of control and all involved in a complex dance of watching, controlling and reacting. It’s a mistake to conclude that Google is fully in control of the situation – in fact, Google works so well because control has devolved to such a large group of people. These relationships might be surveillance, might be other forms of watching, but bear consideration as we try to understand how these systems work.
Hal talks about the Google Brain – a close integration of Google with everyday activities which enables real-time feedback loops. Searching on Google is fundamentally different from searching in a card catalog, he contends. The rapid iteration and shaping of our search behavior based on feedback from Google turns online search into a different activity. At the same time, Google is watching a user search and giving results back based in part on the links the user follows or ignores. Google changes how we search, and we change how Google indexes and presents information through what Hal calls, “the mysterious mechanization of meaning in the google brain.”
We shape how Google organizes information not just by following and not following links Google presents as search results. We also shape information as content providers. The basic logic of PageRank – the algorithm Sergey Brin and Larry Page pioneered as graduate students at Stanford – is that a search engine should extrapolate from links on webpages to construct a model of user behavior. If lots of users link to the Sumotalk website with links named “sumo”, the search engine should extrapolate that users looking for sumo want to find this page. Google works (in part) by watching how we structure the content of the internet.
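To make the “links named sumo” idea concrete, here is a toy sketch in Python. It is not Google’s algorithm, and certainly not full PageRank: it just counts how many inbound links use a given phrase as their link text and ranks pages by that count. The site names, the `build_anchor_index` and `search` functions, and the sample data are all made up for illustration.

```python
from collections import Counter, defaultdict

def build_anchor_index(links):
    """Map each link phrase to a count of inbound links per target page.

    `links` is an iterable of (source_page, target_page, link_text) tuples.
    """
    index = defaultdict(Counter)
    for _source, target, link_text in links:
        index[link_text.lower()][target] += 1
    return index

def search(index, query):
    """Return the pages linked with this phrase, most-linked first."""
    counts = index.get(query.lower(), Counter())
    return [page for page, _count in counts.most_common()]

# Hypothetical sample data: two bloggers link to one sumo site, one to another.
example_links = [
    ("blog-a.example", "sumotalk.example", "sumo"),
    ("blog-b.example", "sumotalk.example", "Sumo"),
    ("blog-c.example", "other-sumo.example", "sumo"),
]
example_index = build_anchor_index(example_links)
top_results = search(example_index, "sumo")
```

Here `top_results` puts `sumotalk.example` first, because more people chose to link to it using the word “sumo”. The point is only that the ranking comes from watching what content authors do, not from anything the search engine decided on its own.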
But we watch how Google works as well, and sometimes we’re able to take advantage of this behavior in interesting ways. For roughly two years, a search for “miserable failure” would return George W. Bush’s biography. This was the result of a “google bomb”, a coordinated attempt to associate a page with a particular phrase. Knowing that Google looks for phrases linked to a specific page, bloggers made a point of linking “miserable failure” to the destination page. Google was aware of the technique and could have tweaked their results to eliminate the result – instead, they allowed the results to stand for many months, until a change in their “model” meant that other pages ranked higher for a search for the phrase.
(Here’s something I suggested in conversation with Hal afterwards: To the extent that Google’s model is transparent, it gives us some confidence that we understand – at least in general terms – why we’re given one result and not another. But transparency means that it can be gamed, through something like googlebombing. In their ad server, there’s a great incentive to game – anyone who purchases an ad on Google would really like that ad to be the first one a user sees. Google has an incentive to be transparent, as it makes people more likely to use their system; at the same time, a truly transparent system is so easily gamed that it’s not useful. Google has a tough balancing act to play here…)
There’s another complex feedback loop involved with Google’s ad engine. Ads are placed via multiple criteria. Advertisers participate in a community auction for terms – if you’re willing to pay $2 for a click on your ad for the term “sumo”, your ad will get priority over mine. But better ads – as determined by what users click on – rise to the top. Google watches user behavior and re-ranks ads based on their success or failure, a simple feedback system that means money alone isn’t enough to dominate the ad sweepstakes. Hal points out that there’s a third criterion – compliance with Google’s guidelines. These guidelines are so strict, requiring ads to be short, simple and, more or less, honest, that Hal contends they’re changing the way advertising works. A perfume ad would generally try to associate perfume with beauty or sex. That doesn’t work on Google – perfume ads focus on facts, offering perfumes at a 60% discount from retail, for instance. The marketing function moves from the ad to the content. He’s particularly fascinated that Google tries hard to convince people to follow these guidelines voluntarily, arguing that ad results will be better if the users play by the rules.
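A rough sketch of how those three criteria might combine, again in Python. The talk didn’t give a formula, so the scoring rule here (bid multiplied by observed click-through rate, with non-compliant ads dropped entirely) is my assumption, and the `Ad` class, the `compliant` flag and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    name: str
    bid: float         # most the advertiser will pay per click, in dollars
    clicks: int        # clicks observed so far
    impressions: int   # times the ad has been shown
    compliant: bool    # hypothetical flag: does the ad follow the guidelines?

def click_through_rate(ad):
    """Fraction of impressions that led to a click (0.0 if never shown)."""
    return ad.clicks / ad.impressions if ad.impressions else 0.0

def rank_ads(ads):
    """Drop non-compliant ads, then sort by bid * click-through rate (assumed rule)."""
    eligible = [ad for ad in ads if ad.compliant]
    return sorted(
        eligible,
        key=lambda ad: ad.bid * click_through_rate(ad),
        reverse=True,
    )

# Hypothetical ads competing for the term "sumo".
example_ads = [
    Ad("high_bid", bid=2.00, clicks=10, impressions=1000, compliant=True),
    Ad("better_ad", bid=1.00, clicks=50, impressions=1000, compliant=True),
    Ad("flashy", bid=5.00, clicks=90, impressions=1000, compliant=False),
]
ranking = [ad.name for ad in rank_ads(example_ads)]
```

In this made-up example the $1 ad outranks the $2 ad because users click on it five times as often, and the $5 ad never appears because it breaks the rules. Whatever the real weighting is, that is the shape of the feedback loop: user behavior keeps adjusting what the auction alone would have decided.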
Hal’s analysis, as presented yesterday, focuses on watching and control. He’s fascinated by Google in part because so many parties share control. Even in a system like AdWords, where Google has very strict control over content standards, advertisers have more control than they do in traditional print advertising. They’re invited to watch Google’s performance via a constant stream of data, which lets them evaluate the performance of their ads and the utility of their spending at a much finer grain of control than with virtually any other form of advertising.
The discussion that followed (interrupted?) the talk focused in no small part on the word “surveillance”. Hal uses the term to explain systems like Google because it allows him access to a set of insights from Foucault and others regarding the alienation individuals feel when confronted by systems that watch them, control them and aren’t entirely understandable. I find that the term “surveillance” brings me directly to ideas about direct physical control – the ruling party’s police watching you vote and taking you out for a beating if you vote the wrong way. A term that includes everything from tapping the lines of human rights activists to arrest them for treason through watching whether I click result two or three when searching for “sumo” seems like a badly overloaded term.
Harry Lewis has a helpful response to this complaint – he posits a form of watching that would be surveillance if governments did it. This includes some pretty simple commercial behaviors – grocery stores track what we purchase and offer us coupons and promotions based on our behaviors. But when the FBI gets the clever idea of tracking potential terrorists by looking at the sale of falafel and hummus, we get concerned… shortly after we stop laughing. We might choose to worry about Google watching us because it’s easy to posit situations in which Google would be forced to give that information to the US or other governments. You might respond to this by putting up very high legal walls to prevent this information from leaking, or by forbidding businesses from keeping this data.
But there’s a cost to the latter strategy – this data is what allows corporations to learn and to provide better information. If I can’t monitor the weblogs on Global Voices, I don’t know what stories are driving people to the website. I’m reluctant to fly blind without this information – which might help me produce a better product – because there’s the possibility I could be subpoenaed for my records.
The problem with surveillance as an intersection of watching and control in the digital age is that you can argue that Google has control over so many aspects of online life that it’s hard to feel comfortable about situations in which they’re simply watching and responding to feedback. In discussions after the talk, Hal expands his criteria and explains that he sees surveillance as a particular intersection of watching with control, consent and context – I look forward to hearing more about these other two axes.
Hal’s still thinking through his work on the topic – as someone giving a lot of talks about a set of issues I haven’t entirely worked through my thinking on, I have great respect for the willingness to put ideas out there and get the (sometimes fierce) feedback offered from Berkfolk. Looking forward to seeing where Hal’s thinking goes on this interesting and important topic.