Thinking the unthinkable: tracing language back 15,000 years

Linguistic controversy: could ultraconserved words point to deep language ancestry across Eurasia? Sharon Mollerus

Just about everyone has a personal stake in language, and many people, expert and amateur, feel entitled to an opinion. But linguists care more than most people, and when linguistics hits the media, linguists can get very agitated indeed.

Published earlier this month in the Proceedings of the National Academy of Sciences (PNAS), the latest paper to upset linguists around the world uses methods from computational evolutionary science to look at questions about language prehistory.

So why exactly are its conclusions so very challenging to traditional historical linguistics?

Language families

Standard historical linguistic methods let us reconstruct languages from the past based on so-called “cognates”.

Words from a pair of languages are cognate if they are similar in form and meaning, and can be shown to have descended from a common word present in the ancestor of those languages. Finding words that are similar in form and meaning is easy, but showing that these cognate candidates are true cognates is trickier.


The key here is that language change includes mutations to the sound systems of different languages which can affect all words in the lexicon.

True cognates will be part of a larger group of cognate sets which all show the effects of the same mutation in the sound system.

Historical linguists can build a chain of inferences about such sound changes extending back into prehistory.

Through this method, linguists can show with the highest degree of certainty that English five, French cinq, Russian pyat' and Armenian hing are all cognate, descendants of a single word in their common ancestor (a reconstructed language known as Proto-Indo-European).

Likewise, linguists can show that the word dog in English and the word dog (meaning the same thing) in the Queensland language Mbabaram are chance resemblances produced by completely unrelated historical pathways.

Going deeper

While standard historical linguistic techniques are very powerful, they have natural limits. Beyond a certain time depth, so many sound changes have accumulated that it is no longer possible to identify cognates or prove that two languages belong to the same family.

But just because these methods no longer apply, must we assume it’s impossible to make any scientifically valid statements about language in the deeper past? The new PNAS paper, led by English evolutionary biologist Mark Pagel, sets out to challenge this assumption.

In previous work, members of Pagel’s team have shown statistically what many linguists intuitively believed all along: that more frequently used parts of the lexicon are more historically stable.


In quantifying the correlation between frequency of use and stability they were able to measure “lexical half-lives”, a measure of the stability of individual cognate sets within language families.

While most cognate sets stay around for a few hundred or a few thousand years, there is a hard core of terms that are stable over much longer periods.
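To make the idea of a lexical half-life concrete, here is a minimal sketch. It assumes simple exponential decay (a word is replaced at a constant rate over time), which is my simplification rather than the model used in the paper. The function name and the two half-life values are hypothetical and chosen only for illustration.

```python
def survival_probability(years: float, half_life: float) -> float:
    """Chance that a word has not been replaced after `years`,
    assuming exponential decay with the given half-life (in years)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (years / half_life)


# Hypothetical half-lives, not figures from the study.
examples = [("ordinary word", 2_000), ("very stable word", 10_000)]

for label, half_life in examples:
    p = survival_probability(15_000, half_life)
    print(f"{label} (half-life {half_life} years): {p:.3f}")
```

Under these assumptions, a word with a short half-life has almost no chance of surviving 15,000 years, while a word with a very long half-life still has a realistic chance. That is the intuition behind looking for a small, stable core of vocabulary.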

The new paper starts from the point that if a cognate set is so stable that it is preserved for the 6,000 or 8,000 years of reconstructible history of a language family, chances are it was present in the ancestors of the language family for a very long time before that.

This, in turn, means that the languages of Eurasia could quite plausibly share terms which, deep down, are in fact cognate in the unreconstructible past.

Der Turmbau zu Babel (the Tower of Babel) by Meister der Weltenchronik. Wikimedia Commons

In a sophisticated (and, I admit, difficult to understand) analysis, the research team tested proposed cognates shared between the different language families of Eurasia.

The proposals themselves are necessarily controversial, since they are produced by a group committed to searching for long distance connections between languages. But the study shows that the larger proposed cognate sets and the ones showing more links between families are precisely the ones which they would predict are more stable.

They then use these cognate proposals to infer language history, weighting the proposals against the inferred half-lives of the words, and show that these languages can be grouped consistently into a super-family originating about 15,000 years before present.

A positive contribution

This is a new approach to deep time historical linguistic inference.

A lot more work is required to test the validity of both the methods and their specific results, and it is quite possible that the historical signal they detect is an artifact. But the methodology appears sound, and has the potential to teach us a great deal.

It should not be dismissed out of hand just because it does not respect the limitations of traditional historical linguistics.

Join the conversation

  1. Massimo Bini

    Tertiary Education Consultant at Vision Australia

    If I understand this research correctly, and please realise that I am speaking as a philosopher and not a linguist, it has taken a methodology used to understand the spread of disease to understand the spread of languages, treating cognate words the way DNA is treated. It has been common for linguists to attribute the origin of Indo-European languages to the Russian Steppes based on the assumption that the horse warriors spread the original language around 6,000 years ago. This new study traces the origin of Indo-European languages to Anatolia (present day Turkey) around 15,000 years ago.

    The methodology and results are very interesting but would need to be followed closely to check for errors or bias.

    Scholars in linguistics should be happy for the controversy - expect a flood of papers.

    1. Michael Dunn

      Research Group Leader "Evolutionary Processes in Language and Culture" at Max Planck Institute for Psycholinguistics

      In reply to Massimo Bini

      You're referring to Bouckaert et al. (2012)[*], which used phylogeographic methods to model the chronology and geography of the Indo-European expansion. These methods can model evolutionary processes within a family that we already know exists. It's rather a different idea to the paper by Pagel's group, which is attempting to link together families which have not been proven to be related. This latter issue is not something that comes up in biology very often, since we know at least in broad terms where any pair of species sit with respect to each other on the tree of life. It comes as a surprise to some people, but there is no comparable great family tree of languages.


    1. Peter Hindrup


      In reply to Alice Kelly

      I don't know Alice, but off topic.

      Early 70's I used to go every lunchtime and spend an hour or so with a two and a bit year old girl. Walking was the idea, but to where and how far was entirely up to her.

      One day, having walked continuously, including up a hill and some steps, we were getting close to home when 40 or 50 metres ahead there was a young woman, with a companion, pushing a baby in a pushchair/stroller.

      With an excited shout of 'Bubba, Bubba', my charge took off up the street…

    2. Megan Robins


      In reply to Peter Hindrup

      Hi Peter,
      I know this is also off topic, but my husband and I decided to teach our children sign language from birth to assist in the baby-parent communication process. We taught them the sign for Milk first (every time I breastfed them, I would ask them "do you want some milk (sign)?"). My son began to make the sign at 12 weeks of age and my daughter began at 8 weeks of age. What amazed us was that at the same time they also used verbal language and said "Ilka". This continued every time they produced the sign. I realised later that the movement of the tongue when saying "Ilka" is the same movement they would make when drawing the nipple into their mouth when they were beginning to breastfeed. Our beginnings of language are certainly very interesting.

    3. Peter Hindrup


      In reply to Megan Robins

      Thank you Megan, at least one person beyond the three of us who witnessed it doesn't think I am stark, raving mad!

      I have done some reading on cards with pics and words, and how quickly the very young pick it up. Fascinating!

      You know we will probably be thrown off the site for this!

    4. Andrew Smith

      Education Consultant at Australian & International Education Centre

      In reply to Peter Hindrup

      Left to their own devices, most children would create their own form of communication anyway, and actual language is only about 30-40% of communication.
      As a (not very good) speaker of a couple of other languages who has taught English as a Foreign Language and Communication, I find there are many factors that make communication work (outside of actual language). These include the need or desire to learn, dealing with multiple contexts, whether social, cultural, religious, business, nationalist etc., and, importantly, the native speaker being sympathetic to communication.

      Many post empire nation states have officially adopted language and religion leading their citizens to believe these features are unique or specific to their race or culture (and cannot really be learnt or adopted fully by outsiders and foreigners), i.e. their shared code.