Researchers recently recreated the voice of Ötzi the Iceman, who lived about 5,300 years ago, by reconstructing his vocal tract. The technique is promising and could be used to digitally reproduce the voices of other mummified remains. But how does it work, and what else could it be used for?
When you make a vowel sound (aah, ee, oh, ooh and so on), three parts of your anatomy play important roles: your lungs, your larynx and the tube made from your throat and mouth. Your lungs provide the airflow that powers the sound. If the flow becomes too weak, the sound turns into a whisper instead.
Your larynx, or voice box, sits about midway between your lungs and your lips, just behind your Adam’s apple. The part you can feel from the outside is the cartilage protecting and supporting the vocal folds (or vocal cords) inside. These are a pair of soft, lip-like structures that run from your Adam’s apple to the back of your windpipe.
You can bring these folds firmly together across your windpipe to close it off completely – you do this when you cough or choke. You can also bring them across so they just touch, and if you do that and then breathe out they vibrate in much the same way your lips do if you blow a raspberry. These vibrating vocal folds are the source of sound for a vowel. If you say aah while you press your fingers gently either side of your Adam’s apple you can feel the vibrations in your larynx.
Everyone’s voice has a natural pitch based on the size of their larynx and in particular the length and thickness of their vocal folds. Your natural pitch is what comes out when your throat muscles are fairly relaxed and you don’t try to speak too loudly. Women have shorter, thinner vocal folds than men and so generally have a higher natural pitch.
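A rough way to see why longer, thicker folds give a lower pitch is to treat a vocal fold as an ideal stretched string. This is a deliberate simplification for illustration, not the model used in the Ötzi study, and the function name, tissue density, tension and dimensions below are all assumed values rather than measurements.

```python
import math

def string_pitch_hz(length_m, tension_n, cross_section_m2, density_kg_m3=1040.0):
    """Fundamental frequency of an ideal stretched string: f = sqrt(T / mu) / (2 L)."""
    # Mass per unit length: a thicker fold has a larger cross-section, so more mass.
    mass_per_length = density_kg_m3 * cross_section_m2
    return math.sqrt(tension_n / mass_per_length) / (2.0 * length_m)

# Illustrative numbers only (assumed): same tension, different fold sizes.
longer_thicker = string_pitch_hz(length_m=0.016, tension_n=0.15, cross_section_m2=1.0e-5)
shorter_thinner = string_pitch_hz(length_m=0.010, tension_n=0.15, cross_section_m2=0.8e-5)
```

With these assumed inputs the longer, thicker string comes out near 120 Hz and the shorter, thinner one near 210 Hz. Real vocal folds are layered, actively tensed tissue, so the string picture only captures the direction of the effect.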
If your windpipe ended just above the larynx then you would only be able to produce buzzing sounds. The lowest frequency in the buzzing sound corresponds to your natural pitch, but there is also energy at many higher frequencies included in that sound. It’s the airway that shapes the buzz sound into a particular vowel.
We can think of this airway as a tube. You can change the length of that tube by protruding your lips, as you do when you say ooh, and you can change its cross-section by moving your tongue. When you say aah, for example, your tongue rolls back out of your mouth and into your throat, so the lower half of the tube is narrow and the upper half is wide.
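The buzz can be pictured as a stack of harmonics: the natural pitch plus energy at whole-number multiples of it. The sketch below lists those harmonics, assuming a level that falls by 12 dB per octave. That roll-off is a commonly quoted textbook approximation, and both it and the function name are assumptions made here for illustration.

```python
import math

def buzz_harmonics(f0_hz, n_harmonics, rolloff_db_per_octave=-12.0):
    """Return (frequency in Hz, level in dB relative to the first harmonic)."""
    harmonics = []
    for k in range(1, n_harmonics + 1):
        freq_hz = k * f0_hz
        # log2(k) is how many octaves harmonic k sits above the natural pitch.
        level_db = rolloff_db_per_octave * math.log2(k)
        harmonics.append((freq_hz, level_db))
    return harmonics

# A buzz with an assumed natural pitch of 100 Hz: energy at 100, 200, 300, 400 Hz...
harmonics = buzz_harmonics(100.0, 4)
```

Each harmonic is quieter than the one below it, which is why the buzz on its own sounds dull until the airway boosts some of those frequencies.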
Every tube has a series of resonance frequencies that depend on its length and its cross-sectional area. These are the frequencies of sound that pass along the tube most easily and with least energy loss. So if a buzz is generated at the larynx end of the tube, the sound at the lip end will be the original buzz, but with the frequencies close to the tube’s resonances sounding much louder than the rest.
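For the simplest case, a tube of uniform cross-section that is closed at the larynx end and open at the lips, the resonances follow the quarter-wavelength rule F_n = (2n - 1)c / 4L. The sketch below applies it. The uniform tube, the 17.5 cm length and the 350 m/s speed of sound are assumptions for illustration; a real airway varies in cross-section along its length, which shifts the resonances away from these values.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound_m_s=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [
        (2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]

# An assumed 17.5 cm airway gives resonances near 500, 1500 and 2500 Hz.
adult = tube_resonances(0.175)

# A shorter tube pushes every resonance upwards.
shorter = tube_resonances(0.14)
```

Lengthening the tube, as lip protrusion does, lowers all of the resonances together, while changes in cross-section move them individually.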
When you listen to a vowel sound it’s these resonance frequencies you are using to decide which vowel you are hearing. Changing the position of your tongue and lips changes the length and cross-section of the tube, which changes the resonances and ultimately the vowel you hear.
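As a toy illustration of that decision, the sketch below picks the vowel whose first two resonance frequencies lie closest to a measured pair. The reference values are rounded figures of the kind listed in textbook vowel tables for adult male speakers and are assumptions here, as are the function name and the choice to compare on a logarithmic frequency scale. It is not a model of how hearing actually works.

```python
import math

# Illustrative first and second resonance frequencies in Hz (assumed values).
VOWEL_RESONANCES = {
    "ee": (270.0, 2290.0),
    "aah": (730.0, 1090.0),
    "ooh": (300.0, 870.0),
}

def closest_vowel(f1_hz, f2_hz):
    """Return the reference vowel nearest to the given pair of resonances."""
    def distance(item):
        ref_f1, ref_f2 = item[1]
        # Distance between the two points on a logarithmic frequency scale.
        return math.hypot(math.log(f1_hz / ref_f1), math.log(f2_hz / ref_f2))
    return min(VOWEL_RESONANCES.items(), key=distance)[0]

heard = closest_vowel(700.0, 1100.0)
```

Here a sound with resonances at 700 Hz and 1100 Hz is matched to aah, because that reference pair is the nearest of the three.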
Ötzi and his peers
To know how Ötzi the Iceman sounded we need to know how long and how thick his vocal folds were – that tells us about the natural pitch of his voice. We also need to know how long his airway was and what its cross-sectional area was in order to work out the resonance frequencies. His tongue and lips will have been preserved in one particular position, which will only give us information about a single vowel sound. So if we are to work out how he sounded for other vowels we also need to know a bit about the size of his tongue and where it joined to his windpipe. Knowing this allows us to work out the other possible tube shapes he could make and calculate their related resonances.
But how can you actually work all this out? In principle it’s fairly simple: all you really need is a CT scan, which uses X-rays to create detailed images of the inside of the body. This allows us to measure all these anatomical dimensions. We can then use that information to make a computer model to synthesise what his voice might have sounded like.
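One simple form such a computer model can take is a source-filter synthesiser: a train of pulses at the natural pitch stands in for the vocal fold buzz, and a chain of resonators stands in for the airway. The sketch below uses the standard two-pole digital resonator found in formant synthesisers. It is not the researchers' actual method, and the pitch, resonance frequencies, bandwidths and function names are all assumed for illustration.

```python
import math

def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator that boosts frequencies near freq_hz."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # scales the filter to unit gain at 0 Hz
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesise_vowel(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Filter a pulse train at f0_hz through one resonator per (freq, bandwidth) pair."""
    n_samples = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    # Source: one pulse per pitch period, a crude stand-in for the vocal fold buzz.
    signal = [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    # Filter: each resonator plays the part of one airway resonance.
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal

# Assumed inputs: a 110 Hz pitch and three resonances with their bandwidths in Hz.
samples = synthesise_vowel(110.0, [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)])
```

In a reconstruction like the one described, the pitch and resonances would come from the measured anatomy instead of being typed in, and the resulting samples would be scaled and written to an audio file for listening.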
The first use of X-rays to explore mummified remains is thought to have been by Walter König in 1896, very soon after X-rays were first discovered. CT scans have been conducted on mummies for more than 40 years, with the popularity of the technique increasing rapidly over the last decade or so. However, the study of Ötzi the Iceman seems to be the first time CT data have been used to synthesise a voice.
In a study of 137 mummies published in The Lancet in 2013, CT scans were used to show that, contrary to much current thinking, disease of the arteries was common in many pre-industrial populations. For speech, the CT scanning technique could similarly provide us with valuable information about the dimensions of the vocal system for any mummified body. And with enough different sets of scans we might be able to track trends in voice over time, such as changes in the typical natural pitch due to nutrition and body size.
One of the big open questions about speech is exactly when the ability to communicate in this way evolved, and there is quite a controversy about whether Neanderthals, for example, could speak. Sadly, CT scanning can’t help us with this, as it relies on the preservation of soft tissue. The earliest hominid remains are fossilised, which means only the bone structure has survived. The absence of lung, larynx, airway or tongue information in these fossils makes any prediction of their capacity for speech far less certain. At about 5,300 years old, Ötzi is the oldest European mummy in existence, but deliberately mummified bodies as old as 7,000 years have been found in South America. Spirit Cave Man, found in North America in 1940, has been dated at 9,000 years old, so if CT scans were made, even older voices than Ötzi’s could perhaps be heard one day.