Wikimania: Brewster Kahle's Big Goal

Brewster Kahle starts his talk by declaring his goal: “Universal Access to All Knowledge”. He’s stolen a technique from Minsky – choose a big goal so you can always keep moving towards it. It’s good to have a goal too big to ever achieve.

Universal access involves challenging a late 20th century idea – “Information is Property”. Brewster suggests that this is one of the worst ideas of the 20th century, possibly surpassed only by the “domino theory” as bad thinking.

He sees the 1976 expansion of copyright to the author’s life plus 70 years as the key moment for the rise of this idea. The first casualty of this rise was software – code being written at MIT as part of the LISP machine project was unsigned (as was the practice at the time), so it was technically owned by MIT, and was sold to Symbolics. Richard Stallman tried to fork the code to retain a version… and went a little nuts in the process. The response was open source licensing, but also the establishment of new institutions and structures, like the Free Software Foundation.

Lots of people discovered that for-profit efforts have a fatal flaw – they get bought. DejaNews, IMDB, CDDB, WAIS, Netmanage and so on all were companies based on communities, and they’ve been absorbed into large companies and largely dissipated. Now we’re seeing the rise of the technology non-profit – organizations like Apache, which have no full-time employees. Some of these organizations – like the Mozilla Foundation – are making real money. The rise of the technical nonprofit may signal that something has gone really wrong with the over-corporatization of software.

Now open hardware is coming onto the scene. The Internet Archive needed a low cost machine capable of putting a petabyte of spinning storage on the network – ultimately they designed their own and put the architecture and code out, so anyone can build their own for about $60,000. OLPC suggests the radical model that the next major laptop could be a nonprofit project – governments may well trust OLPC more because it’s a nonprofit than they would if it were for-profit.

So what would be required for a technical non-profit to actually store all knowledge? The largest print library in the US is the Library of Congress, which contains about 26 million books. The text alone in a book is about a megabyte – that means about 26 terabytes of data. Storage isn’t the cost – scanning is. Costs are now at about $10 a book in India, $30 in the US. People don’t like shipping books, so scanning probably needs to happen at home. Brewster’s team has designed and is building high-quality scanners that can be installed in libraries.

The issue’s not the money – Brewster calculates that all the books in the Library of Congress could be scanned and put online for 1.5 times the library’s annual budget – about $750m in total. The real issue is a legal one – the copyright ownership of many books is unclear. Brewster, learning from Jack Valenti, has tried to frame the problem, declaring these to be “orphan works” – “What do we want to do with orphans? Give ’em a home!” Brewster says that, “the way you ask a question in the US is to file a lawsuit.” So Lessig is helping Brewster sue the attorney general to ask the question of whether libraries can scan and distribute orphan works.
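As a rough check on the arithmetic in the last two paragraphs, here is a minimal sketch in Python. It uses only the figures quoted in the talk (26 million books, about a megabyte of text each, $10 or $30 per scanned book). The function names are made up for illustration, and the decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) are an assumption, since the talk doesn’t say which convention is meant.

```python
# Figures as quoted in the talk; everything else here is illustrative.
BOOKS = 26_000_000          # Library of Congress holdings, approximate
BYTES_PER_BOOK = 1_000_000  # "about a megabyte" of text (decimal MB assumed)
COST_PER_BOOK_USD = {"India": 10, "US": 30}


def text_storage_terabytes(books: int, bytes_per_book: int) -> float:
    """Total size of the plain text in decimal terabytes (10**12 bytes)."""
    return books * bytes_per_book / 10**12


def scanning_cost_usd(books: int, cost_per_book: int) -> int:
    """Total scanning cost in dollars at a flat per-book rate."""
    return books * cost_per_book


if __name__ == "__main__":
    print(f"text: {text_storage_terabytes(BOOKS, BYTES_PER_BOOK):.0f} TB")
    for place, rate in COST_PER_BOOK_USD.items():
        total = scanning_cost_usd(BOOKS, rate)
        print(f"scanning at {place} rates: ${total / 1e6:.0f}M")
```

At US rates this comes to about $780 million, in the same range as the $750m total quoted above; at the India rate it drops to about $260 million.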

(Brewster also talks briefly about Google’s efforts to scan books – he sees this as a competitive effort, and not necessarily as healthy competition.)

What would it take to digitize all published audio works? There are about 2-3 million of them, on LP, 78 and CD. We know it’s troublesome to rip all of them and put them online – though we believe we could do it for about $20 to $30 million. So Brewster is beginning with a small group who are willing to have their works shared – rock bands like the Grateful Dead who allow taping of their shows. They’ve put online about 30,000 concert recordings, including 2800 Dead shows. The next frontier is classical recordings in Europe.

How about moving images? In terms of theatrical releases, there have only been about 100,000-200,000 ever made. But the licensing issues are very, very difficult – the focus so far has been on niches like black and white training films, and stop-motion films made with Legos, where the authors are very grateful for Brewster’s storage and bandwidth. Television is a larger challenge – Brewster has built a really, really big Tivo which has been recording 20 TV channels from around the world – BBC, CNN, Russian, Chinese, al Jazeera, Fox News – for several years. So far, they’ve generated a petabyte of data. They’ve released only a week’s worth – the week after 9/11 – which is particularly interesting. Were people dancing in the streets of Palestine after the WTC fell? Check out TV from the Middle East and see – it’s very clear evidence that news comes with a point of view.

Software is pretty easy in storage terms – there are only about 50,000 applications, most of them rotting on disk. Brewster’s gotten a DMCA exception, which makes it possible to break copy protection in order to back software up. He’s also capturing the web, taking bimonthly snapshots since 1996. Now the goal is to make copies: one is stored in Alexandria (which now has 1.5 terabytes online), and Amsterdam is planned next. “If we had six or seven of these, maybe I could sleep at night,” Brewster says.

(Brewster brags that the Wayback Machine does 100 queries a second against a petabyte database, which is a pretty serious DB load. And he mentions that most people don’t use the machine to try to access content they’d otherwise have to pay for – most people use it to find their own old, non-backed up content.)

Brewster ends with some challenges for what other problems he thinks need to be solved:

– Non-profit open networks, free wireless connectivity, which is made possible by owning our own computers, routers and networks

– Open and transparent web search – it’s okay for Google and others to handle the advertising, but other people need to be designing and offering search engines. A project called “Recall” was built by one woman, working alone, using the Internet Archive data – there was no business model, so she moved to Google…

– Privacy and anonymity – we need to protect the long American tradition of anonymous publishing. TOR and others are helping.

– Defensive patenting – we need a Defensive Patent License (as the GPL was for copyright) to ensure certain technologies will be accessible.

– An open textbook system, that’s better than Wikibooks – textbooks are the number one request of people around the world.

– We need an open library, where people can annotate the book catalog

Brewster also asks for Wikipedia to add attribution, which gets just a smattering of applause. People should know where the facts in Wikipedia come from. He asks us to look at Ted Nelson’s idea of “transclusion” – a pointer structure into the world of knowledge future and past, that he believes Wikipedia could be.

1 thought on “Wikimania: Brewster Kahle’s Big Goal”

Whirl August 9, 2006 at 8:20 pm

Brewster Kahle’s talk reminds me of the article
“Scan this book” by Kevin Kelly:
http://www.nytimes.com/2006/05/14/magazine/14publishing.html?ex=1305259200&en=c07443d368771bb8&ei=5090
(the article was much commented on at the time)