Learning about Arabization

I’m dumb about the Middle East.

Despite the ludicrous overfocus of the American media on Israel/Palestine and Iraq, it’s pretty easy as an American to be dumb about the Arab world.

(The perpetually insightful Rebecca MacKinnon brings this up as a critique of my work on media attention, which suggests that media underfocus on certain regions is a major problem. She points out that North Korea, one of her pet topics, is not so much undercovered as poorly covered. More stories on a subject don’t mean better coverage, unless that coverage is more substantial, nuanced and multifaceted, which American coverage of North Korea – or the Middle East – rarely is.)

A major reason for my personal stupidity is the fact that I can’t read Arabic. There’s a major, and well-known, translation gap between Arabic and European languages. The 2002 Arab Human Development report points out that five times as many English language books are translated into Greek (spoken by fewer than 15 million people) every year as into Arabic (spoken by 225 million people). And I’ve yet to find a free online tool that lets me translate Arabic stories on Al-Hayat, or other independent Arabic newspapers. (Al-Hayat is good enough to provide some of their stories in translation.) This means that it’s hard for me – and for all English-speakers – to get a sense for what the “Arab street” is talking about.

Because I can’t understand what’s going on in Arabic-language sources, I’d guessed that a vast regional dialogue was taking place that North Americans and Europeans were excluded from. I’d assumed that the nations of the Middle East and North Africa were tightly linked by heritage, religion and language into a single cultural bloc. As I met with activists and intellectuals in Egypt and Jordan this past week, most did their best to disabuse me of that notion… none more so than the Open Source Software developers.

Something I hadn’t understood about Arabic – while written Arabic is a lingua franca (for the obvious reason that the Holy Quran is written in Arabic and is unchanged from the Prophet’s authorship), spoken Arabic varies somewhat from country to country. Egyptians have a hard time understanding Jordanians, who in turn, have a hard time understanding Syrians, and so on.

The matter gets even more complicated once you consider the challenges of creating technical terminology in the context of a classical language. Classical Arabic doesn’t include a term for “hard drive”, for instance – how does the language adapt to allow conversation on these technical topics? (A side note: Icelanders take pride in the fact that their language has evolved so little in the last thousand years that schoolchildren can read millennium-old sagas with little difficulty, as the vocabulary is familiar – consider how unfamiliar Chaucer’s language in the Canterbury Tales looks in comparison. The downside? An apocryphal story suggests that the Icelandic word for Microsoft’s operating system, and for sub-screens on a computer monitor, is the word for an inflated sheep’s bladder, used in ancient Icelandic turf houses as windows.)

Alaa Abd El Fatah, a brilliant geek and a member of EGLUG, the Cairo-based Egyptian Linux Users’ Group, told me about a series of introductions to open source software being developed by EGLUG partners in the colloquial Arabic spoken in Egypt, rather than the classical Arabic understood throughout the region. It’s more readable and accessible to the Egyptians he’s trying to convert to the Open Source cause, but it’s hard for people outside Egypt to understand. A human rights activist in Jordan mentioned that the training materials she imported from Morocco led to laughter at the silliness of the word choice, not serious debate on women’s rights. And Raed Neshiewat, a software developer in Amman, mentioned reading an article in a computing journal from Syria and finding it very confusing:

“They were using a term to mean ‘the case of the computer’. The term probably translates into English as ‘chassis’. But in Jordan, we use that term to mean ‘the body of the car’. So I was trying to figure out why this guy was trying to put a hard drive in the body of his car.”

It gets more complicated. For ideological reasons, Libya and Syria have been resistant to any loan words from English, French or any other European language for any sort of scientific discourse. So they’ve created their own Arabic-derived terminologies for chemistry, physics and computer science…

One approach to solving the language problem is to agree on a common source of terms – Raed suggests that PC World, published in Dubai, is becoming the “stylebook” for Arabic technical discussions. Alaa, an open source geek, is more interested in a grassroots approach – he’s a participant in a project called Arab Eyes, which is trying to Arabize large sets of open source programs, and is maintaining a wiki glossary of Arabic computing terms, trying to get the F/OSS communities throughout the region to converge on a single set of technical terms.

The process of Arabizing technical terms happens very quickly. Amina Khairy, who just wrote an article on Egypt’s emerging blogging scene, tells me that there’s already a verb in Egyptian Arabic that means “to blog” – “bal’waga”. But most of Egypt’s bloggers are writing in English, perhaps because many of them are expats living in Cairo, but also possibly because they’re looking to reach a global audience. (She mentions that a number of Ethiopian immigrants, working as nannies for wealthy families, are also blogging, in a combination of Arabic and another language, probably Amharic.)

A common vocabulary is not the only linguistic problem Arab developers face while localizing software. Alaa points out that many of the current open source geeks are near-completely bilingual (as he is). They often write technical documents in a combination of Arabic and English. While many open source developers are smart enough to realize that Arabic is written right to left instead of left to right, very few are smart enough to support bidirectional text – text fields that can be left to right or right to left, depending on what text is being entered.
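To make the bidirectional problem concrete, here is a minimal Python sketch of one common heuristic: pick a field's direction from the first "strongly" directional character the user types. The function name `base_direction` and its `default` parameter are my own inventions for illustration, not anything Alaa or EGLUG described, and a real implementation of the Unicode Bidirectional Algorithm handles far more (embedded runs, numbers, punctuation) than this does.

```python
import unicodedata


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess a text field's direction from its first strong character.

    Hypothetical helper: a simplified "first strong character" heuristic,
    not a full implementation of the Unicode Bidirectional Algorithm.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (Latin, etc.)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    # Only digits, spaces or punctuation so far: fall back to the default.
    return default
```

A field that starts with an Arabic word and then mentions "Linux" would come out right to left, while one that starts with "Linux" would come out left to right, which is roughly the behavior a bilingual writer expects.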

But the real problem is search. Most content management systems that have been localized into Arabic have search functionality that is either deeply compromised or fails entirely. The reason is that Arabic has several diacritic marks that modify alphabetic characters. Each character plus diacritic is represented with a unique Unicode character. But effective searches need to strip diacritics and search for any of the variants of a character, not the specific character/diacritic pair. MySQL and Postgres are smart enough to do this for European languages… but not for Arabic. So any CMS built on an open database tends to have no, or poor, search support.
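Here is a rough Python sketch of the "strip diacritics before searching" idea, done in application code rather than inside the database. The names `strip_marks` and `matches` are hypothetical, and this is only one way to approach it. It leans on Unicode normalization: in current Unicode the short vowel marks are separate combining characters, while some letter variants (alef with hamza, for example) are single characters that decompose into a base letter plus a mark, so decomposing first and then dropping the marks handles both cases.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text with combining marks (such as Arabic vowel marks) removed.

    Hypothetical helper for illustration; a production search index would
    likely need further Arabic-specific folding beyond this.
    """
    # NFD splits precomposed characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks; keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, document: str) -> bool:
    """Diacritic-insensitive substring search (illustrative only)."""
    return strip_marks(query) in strip_marks(document)
```

The same normalization has to be applied both to what gets indexed and to what the user types, otherwise a query without vowel marks will still miss a fully vowelled document.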

The good news: people like Alaa are on the case, reporting bugs, patching software and trying to ensure that everyone in the Arabic speaking world will be able to use critical pieces of software. But a really comprehensive solution may require some serious investments, like an Arabization usability lab which lets developers figure out whether the software they’re producing makes as much sense in Yemen as it does in Mauritania.