Does Skype help or hinder communication?

Is the digital world making communication easier or harder? Envato

Speech is one of the most important forms of communication between humans.

The internet has opened doors for us to communicate with people across the globe – but the technology often leads to misunderstanding.

As Naomi Harte of Trinity College pointed out in a recent presentation at the Science Gallery Dublin, communicating effectively through technology is far more complex than you might imagine.

The quality of the sound we hear, the images we see and the emotions that speech conveys all matter greatly for conveying meaning accurately. This complexity is often lost when we communicate through the internet or with machines.

Lost in translation

Whether it is a Skype call or a voice command to your smartphone, communicating through technology can be tricky.

Anything that disrupts our ordinary speech rhythms, or the way we process tone of voice, facial expression and other physiological cues, can change how a speech act is interpreted and transform its meaning.

Understanding the complex ways in which we communicate can help us develop technologies that improve online exchanges and reduce misunderstandings. Engineers and researchers such as Harte have therefore focused on two ways to improve digital communication:

  1. improving digital speech quality
  2. transmitting emotion in human-computer interactions and internet telephony.

We’ve come a little further than that photophone transmitter, but machine communication can still be rough. Wikimedia Commons

Improving speech quality

In many situations where humans and gadgets try to talk to each other – through dialogue systems, e-books, tablets, mobile phones and computer games – it is the machine that struggles to understand spoken cues and then to formulate an intelligible, natural-sounding response.

Researchers are trying to improve human-to-human communication by making the internet's multimedia channels, such as audio and video, work better together. At Trinity College, Harte is currently working on a project to improve what is known as “Audio-Visual Speech Recognition”.

This means using visual data, such as tracked lip movements, alongside the audio signal to improve speech recognition. Using similar mechanisms, the research group Sigmedia hopes to improve human-to-machine communication by having machines sense lip and eye movements, gestures and voice.

Transmitting emotions

The other major challenge to improving speech technologies is related to the emotional content of speech.

Designers and researchers know this, but the field of affective computing (computing that recognises, interprets or influences emotion) is relatively young. For now, speech recognition systems are mostly not up to the task of recognising and conveying emotion.

Emotion can be tricky to convey online. edenemotions/Flickr

This makes these applications both less user-friendly and less effective.

For these systems to improve, researchers need a framework for classifying emotional signals, particularly since these signals vary greatly across cultures. Fusing audio and visual cues, and accounting for cultural and situational variation, is key to this process.

Researchers must ask how well their systems function when speech is informal, when people are speaking in a second language, or when speakers are in an emotional state.

The importance of non-verbal cues

Understanding a spoken message also depends on what we see at the time.

To illustrate this point, Harte pairs an identical sound with two videos of lips mouthing different phrases.

Although the sound remains the same, the audience believes they have heard two different words, even after she explains the trick.

Essentially, the same sound will be heard differently depending on the visual signal. This phenomenon is called the “McGurk effect”, and shows that speech is seen as well as heard.

This example points to something that anthropologists, psychologists, mothers and salespeople know well: non-verbal cues such as tone of voice and gesture shape our understanding of any speech act.

This has to be taken into account when communicating digitally.

Even speech itself is hard to understand without context, and how we interpret that context depends on our cultural background. The tempo and rhythm of our speech, how long we pause, and how long we wait after someone has spoken before responding all differ across cultures.

In some languages, speakers customarily wait much longer before responding than in others, where it is normal to overlap a reply with the end of the other person’s statement. These habits affect communication so strongly that a native English speaker speaking Spanish will mostly keep to the customary patterns of English, and vice versa.

A time lag during a Skype voice call can thus intensify misunderstanding and dissonance in intercultural communication, and poor audio or video quality may exaggerate this further.

Does digital communication have a place in business?


There is a general belief that investment in communications technology can cut the cost of international business and collaboration.

Thus Google Hangouts, Skype and Facebook video are increasingly used for professional purposes such as conferences, international meetings, student lessons and supervision.

There are many documented examples of successes, failures and misunderstandings.

Successful international business will probably continue to rely on handshakes, given the importance of physical presence in conveying emotion, creating trust and building empathy.

However, businesses also run on efficiency.

If technology improves enough to process non-verbal cues such as lip and eye movements, the savings on travel will keep it a lucrative area of academic research, technological investment and business practice.

One thing is certain: improvements will depend increasingly on the synthesis of multimedia capabilities and on recognising our cultural differences in communicating, interpreting and understanding one another.

10 Comments

  1. Stephen Ralph

    carer at n/a

    We talk, people either listen or they don't.

    Humans have managed to communicate for better or worse over the ages.
    I hate talking on the phone, others can chat for hours.

    One person may say to another "gee you look awful" and get a passive response. Say that to another person and it may be a swift and angry response.

    Perhaps we should all think before we speak on many occasions, but that's a big ask.

    1. Philippa Barr

      In reply to Stephen Ralph

      Well, I am not sure technology can do much about that. Indeed, there is a lot to be said about the way technology has allowed us to respond to each other in haste, without having to deal with each other's presence or body language.

    2. John Kerr

      IT Education

      In reply to Stephen Ralph

      I just remembered a situation a few years ago - before Skype - when we needed an extensive phone system installed between campuses. We had a two-way video conference with two experts in the city. There were three of us and two of them. It was a bit more expansive than Skype as we could see all of them and they could see all of us. The discussion lasted about 40 minutes. When I finally met the experts in real life it was as though I already knew them because of the visual contact. Compare this to a lady I spoke to quite often on the phone in the education bureaucracy. When I finally met her she was nothing like I imagined, because there had been no visual contact. I am sure visual contact and body language are essential factors for 100% understanding in human contact.

  2. John Kerr

    IT Education

    The problems of technology and communication have been with us for some time. I would have thought that Skype was an improvement over other modern forms like texting, as at least you can see the face and part of the body. As people have turned away from direct face-to-face communication, initially because of TV watching instead of playing games and then because of computers, I have noticed that body-language reading skills have declined to a large degree. As a teacher I started to notice that kids were not as adept at reading body language, and this has become worse. Often there would be a situation where I could see that trouble was looming because someone could not read that the other person was becoming angry. I think that some alcohol-fuelled violence also seems to be caused by a lack of ability to read body language. Things like emoticons in texting are a step in the right direction but not really a sufficient replacement to indicate emotion or meaning.

    1. Philippa Barr

      In reply to John Kerr

      Absolutely agree that Skype is an improvement but it is best to be aware of its pitfalls before depending on it for international group meetings, conferences - all business communications. Other forms are definitely better for building and initiating rapport at this stage.

    2. Philippa Barr

      In reply to John Kerr

      This is such an interesting observation, thanks for sharing. Some philosophers and theorists have claimed that the art of body language was perfected in the French court of the 18th century, where people had a lot of time to watch each other and many hierarchical differences to reinforce and guard. I think it's very interesting what you say about a lack of skill in reading body language potentially exacerbating conflict. It could be very interesting to look at in a cross-cultural setting where body language means different things.

  3. Michael Mihajlovic


    I have been talking on Skype to my son, who lives and works in London, for almost a year now and have never noticed any oddities.

    1. Philippa Barr

      In reply to Michael Mihajlovic

      Great! Presumably you have a good internet connection. The other thing to note is that you must already know and understand your son's communication style, so you would be able to glean a great deal more meaning from his tone of voice alone than you could from, for example, a potential business partner you have never met in person.

  4. Allan Gardiner


    Philippa, to exactly what 'effect' does: "This effects communication to such a great extent..."? In "eff_ect'opic", 'tis more likely to but largely 'affect' but those prone to miscommunicate their ide_as'siduously. ;^)

    1. Philippa Barr

      In reply to Allan Gardiner

      Dear Allan, yes I agree that affect rather than effect is the most appropriate verb - could be a problem of the editing.