Back in the days when I used to run “member services” for Tripod, I got interested in steganography, the art and science of concealing messages within other messages – hiding a text file within a JPEG image, for instance, or an image file within an audio file.
Basically, Tripod had an interesting problem which we’d solved surprisingly well. We provided free webhosting space for about 15 million users. Those users each had about a megabyte’s worth of storage space where they could post images to enhance their homepages. (Hey, this was a long time ago. We walked to school uphill in the snow both ways, had only 1MB of online storage and we liked it.) As you’ve probably guessed, many users decided to enhance their homepages with pornography, which was prohibited under our terms of service.
It became a major technical and business challenge for Tripod to eliminate porn. Porn sites generated a great deal of traffic, which increased our bandwidth costs. And they made our advertisers unhappy, which reduced our revenues. However, we couldn’t just delete the accounts of people we thought might be pornographers – the innocent members who’d been inadvertently deleted would have been up in arms.
So we solved the problem with a combination of automatic and human tools. We used a couple of algorithms to determine which directories likely contained porn (several large image files, heavy traffic, sequentially named files, etc.), then scanned all the images in the selected directories, looking at the color tables of their JPEGs. If a certain threshold percentage of pixels fit into a range we characterized as “flesh tones”, it was probably either a human face or a naked torso. Directories tagged as “probably porn” were manually reviewed by poorly paid interns, who used a tool that displayed all the images in a directory – if those images were pornographic, an intern pressed a button and the user was deleted.
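To make the flesh-tone test concrete, here’s a minimal sketch in Python. The RGB rule and the 30% threshold are my assumptions for illustration (a commonly used crude skin-color heuristic), not the range or cutoff Tripod actually used, and the function names are hypothetical. Pixels are plain (r, g, b) tuples so the sketch doesn’t depend on any imaging library.

```python
def is_flesh_tone(r, g, b):
    """Crude RGB skin-tone rule. An assumed heuristic, not Tripod's real range."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def flesh_fraction(pixels):
    """Return the fraction of (r, g, b) pixels that fall in the flesh-tone range."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(1 for r, g, b in pixels if is_flesh_tone(r, g, b))
    return hits / len(pixels)


def flag_for_review(pixels, threshold=0.3):
    """True if enough pixels look like skin to send the image to a human.

    The 0.3 threshold is a hypothetical value chosen for the example.
    """
    return flesh_fraction(pixels) >= threshold
```

Note that a check like this only queues an image for a human to look at. As described above, it can’t tell a face from a torso, which is why the interns made the final call.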
This system got so good that the mean lifetime of a pornographic image on Tripod was about 3 hours. Our bandwidth bills went down, our advertisers didn’t complain, but still the porn kept coming. I started wondering, “Given that we’re going to delete it within six hours, why would anyone bother posting porn on our site?”
Which led me to wonder whether someone was posting these files as a form of communication rather than titillation. If you were trying to communicate through a secret channel with other criminals/terrorists/bored computer science geeks, perhaps embedding text files in nude pictures of Alyssa Milano would be an effective way to do it. Maybe we were hosting dozens of secret conversations on our site, not just lots of random porn!
Unfortunately for my exciting spy fantasies, steganography researchers were discovering that they couldn’t find evidence that anyone was actually using steganography. Elegant little tools like “stegdetect” could make a pretty good guess at whether a JPEG file had a message appended to or hidden within it. Scanning hundreds of thousands of images from eBay, Usenet newsgroups and the web at large, researchers finally found the first known example of steganography discovered “in the wild” – an image put together by a geek at ABC television to illustrate a story on steganography. (Hardly the evidence we’d all been hoping for that a North Korean-backed terrorism organ-smuggling ring was using stego.)
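The “appended to” case is the easy one to picture: a JPEG ends with an end-of-image marker, and most viewers ignore anything after it. Here’s a rough Python sketch that returns whatever follows that marker. This is not a description of how stegdetect works, and it says nothing about messages hidden within the image data itself. It assumes a simple single-scan (baseline) file, and the function name is hypothetical.

```python
def bytes_after_eoi(data: bytes) -> bytes:
    """Return whatever follows the JPEG end-of-image marker (b"" if nothing)."""
    if data[:2] != b"\xff\xd8":  # SOI: start of image
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    # Walk the header segments: 2-byte marker, 2-byte big-endian length
    # (which counts itself), then the payload.
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker should be")
        marker = data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + length
        if marker == 0xDA:  # SOS: compressed image data follows
            break
    else:
        raise ValueError("no start-of-scan segment found")
    # Assumption: in a single-scan file the first FF D9 after the scan
    # header is the real end-of-image marker.
    eoi = data.find(b"\xff\xd9", pos)
    if eoi == -1:
        raise ValueError("no EOI marker found")
    return data[eoi + 2:]
```

A non-empty result isn’t proof of a hidden message (some cameras and editors leave junk at the end of a file), just a reason to look more closely.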
Now Keith McDuffee has an application for steganography that just might find a couple of users – backing up text files on Flickr. Keith observes that Flickr gives its paid users unlimited storage, only constraining them with a 2GB per month limit on uploads. Rather than burning some DVDs, buying a new USB key or otherwise figuring out a sane way to back up his documents, Keith demonstrates that he can back up his files by embedding them within digital pictures using “steghide”, uploading the pictures to Flickr, and extracting the documents after the fact.
While this is pretty freaking cool, I’m not planning on backing up my hard drive to my collection of photos of Ghana any time soon. But I do think it’s just a matter of time before we start reading alarmist articles about terrorists using Flickr as a tool for secret communication.
There’s a use of steganography for good that I’ve been thinking about trying to implement. Internet users in some nations have their access to the web filtered by national firewalls. A popular and effective way around these firewalls involves using proxy servers – a user trying to reach a blocked website reaches an unblocked proxy server and asks the proxy server to retrieve the blocked website.
This technique works just fine so long as the proxy isn’t blocked by the censoring government in question. In countries like China where surfing via proxy is very popular, friends like Rebecca tell me that they need to change proxy servers every half-hour or so, as ISPs, under government pressure, block the proxies they’re using.
A number of ideas have been suggested for providing proxy users with the addresses of current, unblocked proxies. Most involve creating new pieces of software that would automatically download information about newly available, unblocked proxy servers. Which is all well and good, but in especially repressive nations, it might be dangerous to have a piece of software on one’s computer that had no purpose other than evading a firewall.
My proposed solution: use the comments field in the JPEG header to store the IP address of a new, open, unblocked proxy, as well as a timestamp of when that proxy was valid. We’d autogenerate these headers with an Apache module (possibly a version of mod_negotiation) which would download this information from a central server and embed it into each JPEG served.
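Here’s a rough Python sketch of the embedding step, standing in for the Apache module. JPEG comments live in a COM segment: the marker bytes FF FE, a two-byte big-endian length that counts itself, then the comment text. The “proxy=…;valid=…” format, the function names, and the example address are all made up for illustration; nothing here is a settled format.

```python
SOI = b"\xff\xd8"  # start of image
COM = 0xFE         # comment segment marker
SOS = 0xDA         # start of scan: header segments end here


def _header_segments(data):
    """Yield (marker, start, end) for each header segment before SOS."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == SOS:
            return
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        yield marker, pos, pos + 2 + length
        pos += 2 + length


def add_proxy_comment(data, proxy, valid_at):
    """Return a copy of the JPEG bytes with a COM segment naming the proxy."""
    # Hypothetical key=value format, not an agreed standard.
    text = f"proxy={proxy};valid={valid_at}".encode("ascii")
    if len(text) > 65533:
        raise ValueError("comment too long for one COM segment")
    segment = b"\xff\xfe" + (len(text) + 2).to_bytes(2, "big") + text
    # Insert after any leading APPn segments so JFIF/Exif headers stay first.
    insert_at = 2
    for marker, _start, end in _header_segments(data):
        if 0xE0 <= marker <= 0xEF:
            insert_at = end
        else:
            break
    return data[:insert_at] + segment + data[insert_at:]


def read_comments(data):
    """Return the text of every COM segment in the JPEG header."""
    return [
        data[start + 4:end].decode("ascii", "replace")
        for marker, start, end in _header_segments(data)
        if marker == COM
    ]
```

Calling add_proxy_comment(jpeg_bytes, "203.0.113.7:8080", "2005-06-01T12:00Z") on a file’s bytes gives back new bytes with the comment in place. Because the comment is stored as plain ASCII, it stays readable in any text editor.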
Users in the know wouldn’t need anything more complicated than Windows Notepad to open a JPEG and find a valid proxy inside. If enough sites participated in the project, we could end up with a substantial portion of all images on the web carrying this data, and possessing an image that contained information about a useful proxy server could hardly be evidence of intent to evade firewalls (unlike, say, a Tor setup on your laptop… :-) ). A government wanting to block this information would need to block JPEGs (politically unviable) or edit comments out of JPEGs (a huge technical task, currently beyond even the Chinese firewall).
(Of course, what they’d more likely do is open the headers, register the proxy servers and block them, which might invalidate the whole project.)
My ambivalence over whether or not this is a good idea is reflected in my current name for the idea: Really Stupid Steganography. I’d be interested to hear from people who are thinking about filters, firewalls and evasion whether they think it’s really stupid as well, or worth some more thought.
Or perhaps we’ll just have to wait for someone to release an independent film with steganographically hidden digital watermarks in snippets around the web before we see people regularly talking and thinking about steganography… :-)