I’m heading to Tanzania in a few weeks for the TED global conference, and I’d like to improve my Swahili before I go. (This wouldn’t be hard, as I only know half a dozen words.) Search for Swahili resources online and you’re bound to find the Kamusi Project, a remarkable online Swahili-English dictionary that’s been built by paid staff and volunteer contributors over the past dozen years. Dr. Martin Benjamin, the chief editor of the project, sees Kamusi as a possible model for “living dictionary” projects to document all African languages. Kamusi is open to contributions from Swahili speakers and scholars throughout the world, but these contributions are compiled and edited into a fact-checked, well-indexed resource that’s become indispensable for Swahili scholars.
There’s one major problem: Kamusi is broke, and development of the project has slowed as a result. Benjamin and his compatriots are trying to raise money through text ads on the site and sales of a “Swahili clock”, which tells the time in terms of hours after dawn, rather than hours after midnight. But to serve as a comprehensive Swahili resource, and to expand to document thousands of other African languages – or even twenty, which is their intermediate target – Kamusi would require substantial foundation, corporate or academic funding.
It’s an uphill battle to bring African languages onto the Internet. While there are lively communities on Wikipedia preserving European languages like Welsh or Frisian, most of the speakers of minority African languages, like Ewe or Bambara, have little net access and less net expertise. There’s the very real concern that some of these languages may die out before their native speakers start writing online.
Duane Bailey’s work on Translate.org.za helps explain why it’s important to bring languages online. In its post-apartheid constitution, the Republic of South Africa enshrined 11 official languages. Duane has been working to ensure that South Africans have software, including applications and operating systems, in their native languages.
Why? Imagine learning how to use a computer in your second or third language. A native Setswana speaker, learning to use Microsoft Office, has the challenge of learning new software compounded by having to read dialogs and menus in a less familiar language. Educators believe that people learn to read more quickly when learning in their native language – it’s reasonable to believe that new users learning computers would benefit from computers with interfaces in their native tongues. Bailey has had great success localizing Open Office and other open source products into many South African languages, and is now approaching the larger question of building a framework to localize software for as many African languages as possible.
Rich dictionaries are a critical ingredient in building localized software. To write a spellchecker, you need a word list for the language with definitive, proper spellings. To localize the interface and dialogs of a program, translators may need to create new words for concepts that don’t otherwise exist in the language. (It’s certainly possible that concepts like “menu” or “icon” won’t translate neatly into Wolof, for instance.) Creating this new vocabulary requires close study of the existing language to create terms that are sensible, pronounceable and not confusing within the language – a rich dictionary goes a long way towards making that work possible.
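To make the spellchecker point concrete, here is a minimal Python sketch of a checker driven entirely by a word list. It is an illustration under stated assumptions, not anything Kamusi or Translate.org.za actually ship: the names `WORD_LIST` and `check_word` are hypothetical, the six Swahili words are a toy sample rather than data from the dictionary, and the suggestions come from the standard library’s `difflib.get_close_matches` rather than any language-aware method.

```python
import difflib

# Toy sample for illustration only; a real checker would load tens of
# thousands of vetted spellings from a dictionary project.
WORD_LIST = {"asante", "jambo", "kamusi", "rafiki", "saa", "safari"}


def check_word(word, word_list=WORD_LIST):
    """Return (is_known, suggestions) for a single word.

    A word counts as correctly spelled only if it appears in the word
    list, which is why the quality of the list matters so much.
    """
    normalized = word.lower()
    if normalized in word_list:
        return True, []
    # Offer up to three near-matches from the list, based on simple
    # character-sequence similarity (no knowledge of the language).
    suggestions = difflib.get_close_matches(
        normalized, sorted(word_list), n=3, cutoff=0.7
    )
    return False, suggestions


if __name__ == "__main__":
    for candidate in ["Kamusi", "kamuzi", "safari"]:
        known, hints = check_word(candidate)
        print(candidate, "ok" if known else f"unknown, maybe: {hints}")
```

The sketch can only be as good as the list behind it: a correct word missing from the list gets flagged as an error, and a misspelling that slipped into the list gets accepted. That is the sense in which a vetted dictionary is a prerequisite rather than a nice-to-have.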
There’s a tendency, I think, to believe that the spread of the Internet and the desktop computer is inherently connected to the global spread of the English language. (That was certainly my assumption fifteen years ago as I played with early internet systems.) But we’re starting to discover that this is a fallacy. There are now more blog posts per day in Japanese than in English, and there may be even more Chinese bloggers. (While Technorati does a great job of counting blogs that contact ping servers to announce their updates, many Chinese blogs don’t use these services and tend to get undercounted.) As I wrote last week, when a large number of users who speak a particular language come online, they seem to start talking to each other in their native tongue, rather than in a second tongue.
But the slow spread of the Internet in many African nations suggests that it may be a while before Wolof speakers are writing in that language instead of in French. And the smaller the language, the longer it takes to establish a community online… and, generally speaking, the higher the chance that most speakers of the language don’t have regular internet access. Some African languages will not survive in a digital era.
E.O. Wilson’s Encyclopedia of Life project invites the world to help in documenting the rich variety of species in the natural world. The idea behind Benjamin’s work is a bit less audacious, but still incredibly ambitious – document every language on the African continent before it dies out. Species can be lost forever, and with them possible cures for disease, insights into the history of evolution, and critical members of ecosystems. But something is lost when languages die as well – the knowledge held by that community of speakers, much of which may not exist, and sometimes cannot even be expressed, in other languages.
Maybe it’s too much to ask for a global, participatory encyclopedia of language. But a good start would be helping find people and funders to support projects like Kamusi, which are working hard to make sure that Swahili is a language of the future as well as of the past and present.
David Sasaki’s got a great post, in part in response to this post, which points out that Swahili will almost certainly survive in the digital age, given the rich community of Swahili authors online, and that Swahili is likely squashing other, smaller languages…