Geotagging reveals Wikipedia is not quite so equal after all

Some slices are bigger than others. jzawdubya, CC BY-SA

Wikipedia is often seen as a great equaliser. Every day, hundreds of thousands of people collaborate on a seemingly endless range of topics by writing, editing and discussing articles, and uploading images and video content. But it’s starting to look like global coverage on Wikipedia is far from equal. This now ubiquitous source of information offers everything you could want to know about the US and Europe but far less about other parts of the world.

The structural openness of Wikipedia is one of its biggest strengths. Academic and activist Lawrence Lessig even describes the online encyclopedia as “a technology to equalise the opportunity that people have to access and participate in the construction of knowledge and culture, regardless of their geographic placing”.

But despite Wikipedia’s openness, there are fears that the platform is simply reproducing the most established worldviews. Knowledge created in the developed world appears to be growing at the expense of viewpoints coming from developing countries. Indeed, there are indications that global coverage in the encyclopedia is far from “equal”, with some parts of the world heavily represented on the platform, and others largely left out.

For a start, if you look at articles published about specific places such as monuments, buildings, festivals, battlefields, countries, or mountains, the imbalance is striking. Europe and North America account for a staggering 84% of these “geotagged” articles. Almost all of Africa is poorly represented in the encyclopedia, too. In fact, there are more Wikipedia articles written about Antarctica (14,959) than about any country in Africa. And while there are just over 94,000 geotagged articles related to Japan, there are only 88,342 on the entire Middle East and North Africa region.

Total number of geotagged Wikipedia articles across 44 surveyed languages. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

Set against the distribution of the world’s population, the picture is equally startling. Even though 60% of the world’s population is concentrated in Asia, fewer than 10% of Wikipedia articles relate to the region. The reverse is true for Europe, which is home to around 10% of the world’s population but accounts for nearly 60% of geotagged Wikipedia articles.

Number of regional geotagged articles and population. Graham, M., S. Hale & M. Stephens. 2011. Geographies of the World's Knowledge. Convoco! Edition.

There is an imbalance in the languages used on Wikipedia too. Most articles written about European and East Asian countries are written in their dominant languages. Articles about the Czech Republic, for example, are mostly written in Czech. But for much of the Global South we see a dominance of articles written in English. English dominates across much of Africa and the Middle East and even parts of South and Central America.

Dominant language of Wikipedia articles (by country). Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

There are more Wikipedia articles in English than in Arabic about almost every Arabic-speaking country in the Middle East. And there are more English articles about North Korea than there are Arabic articles about any of Saudi Arabia, Libya, or the United Arab Emirates. In total, there are more than 928,000 geotagged articles written in English, but only 3.23% of them are about Africa and 1.67% are about the Middle East and North Africa.

Number of geotagged articles in the English Wikipedia by country. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

All this matters because fundamentally different narratives can be, and are, created about places and topics in different languages.

Beyond English

Even on the Arabic Wikipedia, there are geographical imbalances. There is a relatively high number of articles about Algeria and Syria, as well as about the US, Italy, Spain, Russia and Greece, but substantially fewer about a number of Arabic-speaking countries, including Egypt, Morocco, and Saudi Arabia. Indeed, there are only 433 geotagged articles about Egypt on the Arabic Wikipedia, but 2,428 about Italy and 1,988 about Spain.

Total number of geotagged articles in the Arabic Wikipedia by country. Graham, M., Hogan, B., Straumann, R. K., and Medhat, A. 2014. Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty. Annals of the Association of American Geographers (forthcoming).

By mapping the geography of Wikipedia articles in both global and regional languages, we can begin to examine the layers of representation that “augment” the world we live in. Some parts of the world, including the Middle East, are massively underrepresented – not just in major world languages, but in their own. We like to think of Wikipedia as an opportunity for anyone, anywhere to contribute information about our world, but that doesn’t seem to be happening in practice. Wikipedia might not just be reflecting the world, but also reproducing new, uneven geographies of information.