Why global contributions to Wikipedia are so unequal

Critical mass of editors could help solve the puzzle. bastique, CC BY-SA

The geography of knowledge has always been uneven. Some people and places have always been more visible and had more voices than others. But the internet seemed to promise something different: a greater diversity of voices, opinions and narratives from more places. Unfortunately, this has not come to pass in quite the manner some expected it to. Many parts of the world remain invisible or under-represented on important websites and services.

All of this matters because as geographic information becomes increasingly integral to our lives, places that are not represented on platforms like Wikipedia will be absent from many of our understandings of, and interactions with, the world.

Mapping the differences

Until now, there has been no large-scale analysis of the factors that explain the wide geographical spread of online information. This is something we have aimed to address in our research project on the geography of Wikipedia. Our focus areas were the Middle East and North Africa.

Using statistical models of geotagged Wikipedia data, we identified the conditions necessary to make countries “visible”. This allowed us to map the countries that fare considerably better or worse than expected. We found that a large part of the variation between countries could be explained by just three factors: population, availability of broadband internet, and the number of edits originating in each country.
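To make "better or worse than expected" concrete, here is a minimal sketch of the general idea: fit a regression on the three factors, then look at how far each country sits from its predicted value. This is not the model used in the research. The log-linear form, the function names, the country labels and every number are assumptions made up for illustration.

```python
import math


def fit_ols(rows, targets):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, row):
    """Predicted value for one row of predictors."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


# Made-up countries: (population, broadband subscriptions, edits). Not real data.
countries = {
    "A": (5e6, 1.0e6, 2.0e4),
    "B": (2e7, 8.0e5, 5.0e3),
    "C": (8e7, 3.0e7, 9.0e5),
    "D": (2e6, 9.0e5, 1.5e3),
    "E": (4e7, 2.0e6, 6.0e4),
    "F": (1e8, 5.0e6, 3.0e4),
    "G": (9e6, 4.0e6, 2.5e5),
    "H": (3e7, 1.0e7, 8.0e4),
}

# Log-transform the predictors (an assumption; counts like these are heavily skewed).
features = {n: tuple(math.log(v) for v in vals) for n, vals in countries.items()}

# Synthetic outcome: log article counts generated from an assumed relationship,
# with country "D" deliberately placed well below it.
ASSUMED_COEF = [0.5, 0.3, 0.2, 0.4]
log_articles = {n: predict(ASSUMED_COEF, f) for n, f in features.items()}
log_articles["D"] -= 1.5

names = sorted(countries)
coef = fit_ols([features[n] for n in names], [log_articles[n] for n in names])

# Residual = observed minus expected; negative means fewer articles than predicted.
residuals = {n: log_articles[n] - predict(coef, features[n]) for n in names}
for n in sorted(names, key=residuals.get):
    print(f"{n}: {residuals[n]:+.3f}")
```

Countries with a negative residual are the ones a map of this kind would flag as under-represented given their population, connectivity and editing activity. Those with a positive residual fare better than the three factors alone would suggest.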

Areas of Wikipedia hegemony and uneven geographic coverage. Oxford Internet Institute

These three variables help to explain the sparse content written about much of sub-Saharan Africa. Most of the Middle East and North Africa, however, has much less geographic information than the same variables would predict. For example, despite high levels of wealth and connectivity, Qatar and the United Arab Emirates have far fewer articles than we might expect.

Constraints to creating content

These three factors matter independently, but they will also be subject to other constraints. A country’s population will probably affect the number of activities, places, and practices of interest (that is, the number of things one might want to write about). The size of the potential audience might also be influential, encouraging editors in more densely populated regions and those writing in major languages. And social attitudes towards information sharing will probably also change how some people contribute content.

We might also be seeing a principle of increasing informational poverty. Not only is a broad base of source material, such as books, maps, and images, needed to generate any Wikipedia article, but it is also likely that having content online will lead to the production of more content.

There are strict guidelines on how knowledge can be created and represented in Wikipedia, including the need to source key assertions. Editing incentives and constraints probably also encourage work around existing content – which is relatively straightforward to edit – rather than creating entirely new material. So it may be that the very policies and norms that govern the encyclopedia’s structure make it difficult to populate the white space with new content.

We need to recognise that none of the three conditions can ever be sufficient on its own for generating geographic knowledge. As well as highlighting the presences and absences on Wikipedia, we also need to ask what factors encourage or limit production of that content.

Because of the constraints of the Wikipedia model, increasing representation on pages can’t occur in a linear manner. Instead it accelerates in a virtuous cycle, benefiting those with strong cultures of collecting and curating information in local languages. That is why, even after adjusting for their levels of connectivity, population and editors, Britain, Sweden, Japan and Germany are extensively referenced on Wikipedia, but the Middle East and North Africa haven’t kept pace.

If this continues, then those on the periphery might fail to reach the critical mass of editors needed to make content. Worse still, they may even dismiss Wikipedia as a legitimate site for user-generated geographic content. This is a problem that will need to be addressed if Wikipedia is indeed to take steps towards its goal of being the “sum of all human knowledge”.