Roughly 7,000 languages are used around the world, and many thousands more have cycled in and out of existence throughout human history. Where did these languages come from, and how did our ancestors create the very first ones? One basic unanswered question is whether the first languages began as gestures, like modern-day signed languages of the deaf, or as vocalizations, like most extant human languages, which are spoken.
Unfortunately for scientists interested in these questions, languages don’t leave fossils. So instead, experimental psychologists like me try to understand how language evolved by conducting communication studies with modern human beings.
Recently, my colleagues and I ran a series of experiments to examine how effectively people are able to communicate vocally without the use of speech. Can they use vocalizations to express their thoughts, without using words – and what can their efforts tell us about how the very first languages may have arisen?
‘Iconic’ clues from signed languages’ recent roots
Estimates of when the first spoken languages arose are highly uncertain, ranging from tens of thousands to hundreds of thousands of years ago, or more. These languages are far too ancient for us to detect any evidence of an original “proto” language in what people speak today.
However, signed languages may offer a clue. These gestural languages created by the deaf typically have much more recent roots, being on the order of just tens or hundreds of years old.
In a handful of cases – for instance, when deaf children without a native signed language have come together in schools for the deaf, or in isolated rural communities with a high incidence of genetic deafness – scientists have actually had the opportunity to observe how signed languages are created anew.
What they find is that people in these circumstances first invent “iconic” gestures – that is, gestures that somehow depict or enact their meaning. For instance, think of scribbling your signature in the air to ask the server for the bill at a restaurant, or pointing and tracing a route to give someone directions. These gestures show what you are trying to express.
Iconic gestures, which can be understood even when communicators lack a common language, can then be molded into a system of signs and grammatical rules shared among the members of a community. Over time and generations, they can develop into a complex and fully expressive language.
Can voices make the same leap?
But can this same process work with the vocalizations of speech? Can people similarly use their voice to depict their meaning and bootstrap the creation of a spoken language without gestures?
On the face of it, many scholars have argued “no.” They reason that it is much easier to show a concept with a visible gesture than to represent it with some kind of noise. This intuition is illustrated by an example from psychologist Michael Tomasello – trying to request Parmesan in an Italian restaurant by twiddling your fingers over your pasta as if sprinkling grated cheese. But what kind of vocalization would you produce to express this?
About this challenge, the renowned linguist Charles Hockett once wrote:
“When a representation of some four-dimensional hunk of life has to be compressed into the single dimension of speech, most iconicity is necessarily squeezed out. In one-dimensional projection, an elephant is indistinguishable from a woodshed.”
Was Hockett right about the limited potential for people to create iconic vocalizations? To what extent can people create vocalizations with acoustic properties that somehow resemble their meaning in the same way they are able to create iconic gestures that do?
Creating new ‘words’ in the lab
Of course, our research participants come to the lab already knowing a spoken language – this is unavoidable. Yet, we have found that just by asking people to vocalize without speaking, we are able to learn a lot about their ability to communicate with iconic vocalizations, and also about their ability to use these vocalizations to create simple systems of vocal “words.”
For example, in our most recent study, published in the journal Royal Society Open Science, we asked university students to communicate with each other in a 10-round game of vocal charades. Their task was to communicate a set of various meanings – such as smooth, slow, big, up or down – to their partner with vocalizations, without using words.
We found that participants shared similar ideas of how certain properties of their voice – such as pitch, loudness, timbre and duration – translated to particular meanings. With few exceptions, each meaning was expressed with characteristic properties that distinguished it from every other meaning.
For example, vocalizations meant to convey “rough” were aperiodic and noisy.
“Fast” was conveyed with high-pitched and loud sounds.
And “small” was conveyed with high-pitched and soft sounds.
The fact that people consistently made vocalizations with particular acoustic properties for each particular meaning suggests that the vocalizations were iconic, somehow depicting or resembling their meaning. (We were also able to show that the vocalizations did not resemble the acoustic properties of the actual spoken words for those meanings; participants truly were generating vocalizations that were independent of their knowledge of English words.)
So participants were able to create iconic vocalizations that in some way embodied their meanings for a range of concepts.
Putting it all together
Were participants able to take the next step and mold these vocalizations into more language-like symbols? To answer this question, we examined what happened to vocalizations and partners’ ability to understand them over the course of the game.
Over the 10 rounds, the vocalizations participants produced became more and more word-like. What began as highly variable, improvised vocalizations became shorter and more stable in form as participants repeated the interaction across rounds. At the same time, their vocalizations became more readily understandable, with partners guessing their meaning faster and with greater accuracy. Thus, it appeared that participants were using iconic vocalizations to establish an initial understanding with each other, and then, with repetition, they were turning these vocalizations into more efficient symbols – not unlike words.
We then asked whether third-party listeners who had not participated in the charades game would be able to guess the meanings of the vocalizations. If so, it would bolster the argument that they were iconic and understandable without prior convention.
To test this, we played the vocalizations produced by our charades participants to listeners recruited through Amazon Mechanical Turk – a web service where workers can perform online tasks for payment. We paid these listeners to hear the vocalizations and guess their meanings in a multiple-choice format. These naïve listeners were able to understand the vocalizations with a level of accuracy that was much higher than chance – on average, about 36% correct compared to the 10% expected by chance – further indicating that the vocalizations were iconic in some way.
A glimpse of how language could have evolved
But what do these findings say about the bigger question of how the first languages originated? Certainly great caution is warranted in generalizing to the evolution of language from experiments conducted in the laboratory with English-speaking undergraduates or online with Mechanical Turk workers.
But our experiments do show that the human potential to create iconic vocalizations is quite impressive, far exceeding many previous estimates that have influenced scientific theories of language evolution. We also demonstrate an important proof of principle that people can use iconic vocalizations as source material to develop conventional symbols – comparable to how people might create conventional signs.
Importantly, our claim is not that spoken languages must then have evolved exclusively from vocalizations. Rather, our argument is that there is considerable potential for vocalizations to support the evolution of a spoken symbol system. Of course, when people are free to communicate “in the wild,” they draw spontaneously on both vocalizations and gestures of all kinds. Therefore, when facing a naturally occurring challenge to devise a communication system, people are likely to take advantage of the strengths of iconic representation in each modality.
Yet even if language has multimodal origins, our study hints at the intriguing possibility that many of the spoken words of modern languages may have long ago been uttered by our ancestors as iconic vocalizations.