What Does Wikipedia Want to Be?

Robert McHenry, former Editor in Chief of Encyclopedia Britannica, recently posted an insightful critique of Wikipedia. In “The Faith Based Encyclopedia”, McHenry offers a general critique of the model behind Wikipedia and a case study of a flawed Wikipedia entry, the entry on Alexander Hamilton. McHenry accurately points out that the Wikipedia entry on Hamilton is written in murky, unclear prose, gives Hamilton’s birthdate unambiguously (although there is historical uncertainty about his actual birthdate), lists two contradictory dates for Hamilton’s resignation as Secretary of the Treasury, and is riddled with typos and grammatical errors. (Subsequent to McHenry’s article, which received widespread attention, the Hamilton article has been extensively edited…)

McHenry uses this specific case to support his general thesis that Wikipedia suffers from being a work by committee, rather than the work of individual scholars, and is the product of a questionable model that invites anyone, regardless of experience, to participate in the creation, modification and editing of an encyclopedia. He is deeply skeptical that this process yields useful, accurate results:

Then comes the crucial and entirely faith-based step:
3. Some unspecified quasi-Darwinian process will assure that those writings and editings by contributors of greatest expertise will survive; articles will eventually reach a steady state that corresponds to the highest degree of accuracy.

One can argue that this “faith-based step” is actually a well-tested and established model. Open Source software comes into being by allowing open creation and modification, and most software appears to improve over time, as bugs are corrected and features added. Why shouldn’t Wikipedia be able to evolve the same way?

McHenry’s case study seems to suggest that evolution has devalued the article, rather than improving it. The 150 edits to the piece have introduced contradictions and muddied the writing, asserts McHenry: “In fact, the earlier versions of the article are better written overall, with fewer murky passages and sophomoric summaries. Contrary to the faith, the article has, in fact, been edited into mediocrity.”

Before concluding that Wikipedia as a whole gets worse as more people work on it, I think it’s worth positing the existence of two Wikipedias. One Wikipedia is a mediocre, incomplete, often inaccurate group-written reference work with an encyclopedia inferiority complex. The other is the most useful and interesting specialist reference work available, allowing people to explore selected technical, entertainment and political topics in depths not available in any other reference work, on- or off-line.

I’ve started thinking of this as the “GSM versus Ghana” problem. When I use Wikipedia to research technical topics, I generally have a positive experience, frequently finding information I would be unlikely to find in any other context, generally resolving my technical questions – “How does the GSM cellphone standard work?” – with a single search. When I use Wikipedia to obtain information that I could find in a conventional encyclopedia, I often have a terrible experience, encountering articles that are unsatisfying at best and useless at worst. Generally, these experiences result from a search where I already know a little about a topic and am looking for additional, specific information, usually when I’m researching a city or a nation to provide context for a blog entry. My current operating hypothesis? Wikipedia is a fantastic reference work for stuff that doesn’t exist in other reference works, and a lousy knock-off of existing works when they do exist.

I would love to move from the realm of case study and anecdote into the world of quantitative analysis and try to test this hypothesis. A possible experiment: take the table of contents of an established reference work, like Britannica. Compare it to the TOC for Wikipedia. My guess is that you’ll find a Venn diagram where there is a small number of topics covered by EB but not by Wikipedia, a reasonably large section where EB and Wikipedia both have coverage… and an enormous area where Wikipedia has coverage and EB does not. Focus in on that area where both Wikipedia and EB have coverage and I suspect you will find that EB articles have a larger median byte count than Wikipedia articles. (I realize that “longer” doesn’t necessarily mean “better”, and I’m open to other single-factor quality metrics, if anyone has one to propose. Or we could do a source-blind sampling experiment, where people are asked to read two versions of an article, with no information revealing the article source, and tell us which they think is of higher quality.) Anyone know where I could get XML versions of the TOCs of EB (or even Encarta) and Wikipedia so I could try this out?
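Here is a minimal sketch of what that comparison might look like in Python, assuming each reference work's TOC had already been boiled down to a mapping from article title to article length in bytes. The function name `compare_coverage` and the sample titles and byte counts are placeholders of my own invention, not real measurements, and in practice matching titles across the two works would need some normalization that this sketch skips.

```python
from statistics import median


def compare_coverage(eb, wp):
    """Compare two title -> byte-count mappings (hypothetical inputs).

    Returns the three regions of the Venn diagram plus the median
    article length of each work within the shared region.
    """
    eb_titles, wp_titles = set(eb), set(wp)
    shared = eb_titles & wp_titles
    return {
        "eb_only": eb_titles - wp_titles,
        "wp_only": wp_titles - eb_titles,
        "shared": shared,
        # Medians are only meaningful if the two works overlap at all.
        "eb_median_bytes": median(eb[t] for t in shared) if shared else None,
        "wp_median_bytes": median(wp[t] for t in shared) if shared else None,
    }


# Invented placeholder data, purely to show the shape of the inputs.
eb_sample = {"Ghana": 9000, "Alexander Hamilton": 7000, "Accra": 3000}
wp_sample = {"Ghana": 4000, "Alexander Hamilton": 5000, "GSM": 12000, "Perl": 8000}

result = compare_coverage(eb_sample, wp_sample)
print(sorted(result["wp_only"]))
print(result["eb_median_bytes"], result["wp_median_bytes"])
```

The median rather than the mean is used because that is the measure proposed above, and because a handful of very long articles would otherwise dominate the comparison.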

My (partial) explanation for why Wikipedia is better when you’re searching for GSM than when you’re searching for Ghana has been to suggest that systemic biases are a necessary result of peer production. When the contributors to a system have a great deal of interest in and knowledge about technical topics, you’ll get great articles on technical topics and few articles on non-technical topics. I’ve suggested that the only way Wikipedia will be able to cover certain topics – political events in Africa, for instance – will be to radically expand its base of contributors.

McHenry’s article offers another partial explanation: Wikipedia’s bad on non-technical topics because it’s easy for any individual – regardless of knowledge of or passion for the subject at hand – to pitch in. Open a paper encyclopedia, paraphrase the entry on Alexander Hamilton and you’ve done a “service” to the Wikipedia community. And it is a service, of a sort – the existence of an open, copyright-free encyclopedia is a useful thing for people who don’t have access to existing electronic or paper encyclopedias.

But Wikipedia, at its best, can be much more than an open-licensed rip-off of Encarta. Many of the technical articles I’ve encountered on Wikipedia appear to be written by practitioners in their fields, and the changes made to the articles don’t appear to have dumbed down the general high quality of the text. Not only are these articles useful, they’re generally more useful than any other existing references. Perhaps the difficulty of finding information on these topics in conventional reference materials prevents too many contributors from spoiling the soup.

I was talking with my friend Samuel Klein, a passionate Wikipedia supporter and contributor, about how Wikipedia can get over its encyclopedia complex. (Sam and I had a useful near-argument about Wikipedia and Africa a couple of weeks back – my response to Sam’s post and his to mine are in his comments…) Sam believes that the key is to allow anyone knowledgeable about a topic to add a small piece of information to an existing piece with a minimum of effort, allowing Wikipedia participation to be something someone does on a whim, with a free ten minutes, rather than a major investment in learning a new system. Thinking about the ten-minute contributions African friends could make to Wikipedia’s consistently terrible articles (most clear rip-offs of CIA World Factbook articles) on African nations, cities and politics, I found myself wondering whether some of the research I’ve been doing on Overture could be useful to the Wikipedians.

The OverCluster tool I wrote recently lets you throw a set of search terms at Overture and see what words people search for in conjunction with those terms. Starting with a set of nations as search terms (Afghanistan, Albania, Algeria, Angola, etc…), the resulting clusters look like the rough outline for what a Wikipedia article on a nation could look like. The top twenty associated search terms for my set of 187 nations are: “news”, “hotel”, “flag”, “travel”, “map (of)”, “culture”, “capital of”, “picture (of)”, “weather”, “food”, “history (of)”, “tourism”, “government”, “newspaper”, “photo”, “embassy”, “visa”, “music”, “holidays”, “tour”.

In other words, Internet users are interested in some of what a conventional encyclopedia tells you about a nation – its history, its flag, the structure of its government. But they’re also interested in time-sensitive information that’s hard for a traditional encyclopedia to offer (news, weather), travel information rarely found in encyclopedias (hotel, travel, visa, tourism, tour) and the details of daily life (music, food, photo, newspaper, holidays) that encyclopedias generally don’t cover.
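For the curious, here is a rough Python sketch of the kind of tallying described above: given a pile of search queries and a set of seed terms, count which other words turn up alongside the seeds. This is not OverCluster’s actual code, and it does not talk to Overture at all. The function name `associated_terms` and the sample queries are made up for illustration, and the sketch only handles single-word seeds (so “South Africa” would need extra work).

```python
from collections import Counter


def associated_terms(queries, seeds, top_n=20):
    """Count words that co-occur with any seed term in a list of queries.

    Hypothetical illustration only: queries is a list of search strings,
    seeds is a collection of single-word terms such as nation names.
    """
    seeds_lower = {s.lower() for s in seeds}
    counts = Counter()
    for query in queries:
        words = query.lower().split()
        # Skip queries that mention none of the seed terms.
        if not any(w in seeds_lower for w in words):
            continue
        # Tally every non-seed word in the query.
        counts.update(w for w in words if w not in seeds_lower)
    return counts.most_common(top_n)


# Invented sample queries, just to show the shape of the input.
sample_queries = [
    "ghana news",
    "ghana flag",
    "albania news",
    "hotel angola",
    "cheap flights",
]
print(associated_terms(sample_queries, ["Ghana", "Albania", "Angola"]))
```

A real version would also want to fold together variants like “map” and “map of”, which the counts above treat as separate words.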

If Wikipedia were willing to back away from the encyclopedia paradigm and explore the idea of what an Internet reference material could look like, a Wikipedia article on Ghana might feature headlines from Ghanaian newspapers, the current weather in Accra, photos, music and video samples, descriptions of cuisine and travel tips. These cultural and daily life pieces are the sorts of information people familiar with a nation are able to quickly and easily add… while the precise details of a nation’s governmental functioning generally aren’t. The hypertextual nature of the medium means that a “rich” Wikipedia article like the one I’m proposing could be turned into a “conventional” article with a single click for those who prefer the old encyclopedia paradigm. It would be more inviting for contributors and more useful for Internet searchers.

(While I’m obsessed with Wikipedia’s coverage of countries and cities, OverCluster could be a useful tool for Wikipedians working on other issues as well. What should articles on computer languages look like? I don’t know – throw a list of twenty or thirty names of computer languages at OverCluster and see what clusters of searches emerge. Those clusters will often function as useful subheadings within an article.)

I’m going to have the chance to meet with Jimbo Wales, Wikipedia’s founder, in a couple of weeks. My main question for him: “What does Wikipedia want to be?” Is Wikipedia about unlocking knowledge and recreating EB or Expedia without copyright? If so, I’m not that interested. But if it’s about figuring out what it means to be a reference material in the Internet age, it’s not just an interesting project – it’s one of two or three of the most interesting Internet projects.
