Google Nest – The Conversation

‘Hey Siri’: Virtual assistants are listening to children and then using the data

2022-07-14T13:43:45Z

Virtual assistants are becoming a more common household fixture, and many children are growing up and interacting with them. (Shutterstock)

In many busy households around the world, it’s not uncommon for children to shout out directives to Apple’s Siri or Amazon’s Alexa. They may make a game out of asking the voice-activated personal assistant (VAPA) what time it is, or requesting a popular song. While this may seem like a mundane part of domestic life, there is much more going on.

The VAPAs are continuously listening, recording and processing acoustic happenings in a process that has been dubbed “eavesmining,” a portmanteau of eavesdropping and datamining. This raises significant concerns pertaining to issues of privacy and surveillance, as well as discrimination, as the sonic traces of peoples’ lives become datafied and scrutinized by algorithms.

These concerns intensify as we apply them to children. Their data is accumulated over lifetimes in ways that go well beyond what was ever collected on their parents with far-reaching consequences that we haven’t even begun to understand.

Always listening

The adoption of VAPAs is proceeding at a staggering pace as it extends to include mobile phones, smart speakers and the ever-increasing number products that are connected to the internet. These include children’s digital toys, home security systems that listen for break-ins and smart doorbells that can pickup sidewalk conversations.

There are pressing issues that derive from the collection, storage and analysis of sonic data as they pertain to parents, youth and children. Alarms have been raised in the past — in 2014, privacy advocates raised concerns on how much the Amazon Echo was listening to, what data was being collected and how the data would be used by Amazon’s recommendation engines.

And yet, despite these concerns, VAPAs and other eavesmining systems have spread exponentially. Recent market research predicts that by 2024, the number of voice-activated devices will explode to over 8.4 billion.

Even when virtual assistants aren’t actively on, they’re listening to and recording their environments. (Shutterstock)

Recording more than just speech

There is more being gathered than just uttered statements, as VAPAs and other eavesmining systems overhear personal features of voices that involuntarily reveal biometric and behavioural attributes such as age, gender, health, intoxication and personality.

Information about acoustic environments (like a noisy apartment) or particular sonic events (like breaking glass) can also be gleaned through “auditory scene analysis” to make judgments about what is happening in that environment.

Eavesmining systems already have a recent track record for collaborating with law enforcement agencies and being subpoenaed for data in criminal investigations. This raises concerns of other forms of surveillance creep and profiling of children and families.

For example, smart speaker data may be used to create profiles such as “noisy households,” “disciplinary parenting styles” or “troubled youth.” This could, in the future, be used by governments to profile those reliant on social assistance or families in crisis with potentially dire consequences.

There are also new eavesmining systems presented as a solution to keep children safe called “aggression detectors.” These technologies consist of microphone systems loaded with machine learning software, dubiously claiming that they can help anticipate incidents of violence by listening for signs of raising volume and emotions in voices, and for other sounds such as glass breaking.

Monitoring schools

Aggression detectors are advertised in school safety magazines and at law enforcement conventions. They have been deployed in public spaces, hospitals and high schools under the guise of being able to pre-empt and detect mass shootings and other cases of lethal violence.

But there are serious issues around the efficacy and reliability of these systems. One brand of detector repeatedly misinterpreted vocal cues of kids including coughing, screaming and cheering as indicators of aggression. This begs the question of who is being protected and who will be made less safe by its design.

Some children and youth will be disproportionately harmed by this form of securitized listening, and the interests of all families will not be uniformly protected or served. A recurrent critique of voice-activated technology is that it reproduces cultural and racial biases by enforcing vocal norms and misrecognizing culturally diverse forms of speech in relation to language, accent, dialect and slang.

Halcyon Lawrence, a technical communication scholar, expresses grave concerns about the potential deadly consequences that the use of voice technology alongside emotion recognition software will have on Black and brown people.

We can anticipate that the speech and voices of racialized children and youth will be disproportionately misinterpreted as aggressive sounding. This troubling prediction should come as no surprise as it follows the deeply entrenched colonial and white supremacist histories that consistently police a “sonic color line.”

Sound policy

Eavesmining is a rich site of information and surveillance as children and families’ sonic activities have become valuable sources of data to be collected, monitored, stored, analysed and sold without the subject’s knowledge to thousands of third parties. These companies are profit-driven, with few ethical obligations to children and their data.

With no legal requirement to erase this data, the data accumulates over children’s lifetimes, potentially lasting forever. It is unknown how long and how far-reaching these digital traces will follow children as they age, how widespread this data will be shared or how much this data will be cross-referenced with other data. These questions have serious implications on children’s lives both presently and as they age.

There are a myriad threats posed by eavesmining in terms of privacy, surveillance and discrimination. Individualized recommendations, such as informational privacy education and digital literacy training, will be ineffective in addressing these problems and place too great a responsibility on families to develop the necessary literacies to counter eavesmining in public and private spaces.

We need to consider the advancement of a collective framework that combats the unique risks and realities of eavesmining. Perhaps the development of a Fair Listening Practice Principles — an auditory spin on the “Fair Information Practice Principles” — would help evaluate the platforms and processes that impact the sonic lives of children and families.

Stephen J. Neville receives funding from SSHRC.

Natalie Coulter receives funding from SSHRC.

Shhhh, they’re listening – inside the coming voice-profiling revolution

2021-04-28T12:18:37Z

Companies could soon tailor what they try to sell you based on the mood conveyed by the sound of your voice. CSA-Printstock via Getty Images

You decide to call a store that sells some hiking boots you’re thinking of buying. As you dial in, the computer of an artificial intelligence company hired by the store is activated. It retrieves its analysis of the speaking style you used when you phoned other companies the software firm services. The computer has concluded you are “friendly and talkative.” Using predictive routing, it connects you to a customer service agent who company research has identified as being especially good at getting friendly and talkative customers to buy more expensive versions of the goods they’re considering.

This hypothetical situation may sound as if it’s from some distant future. But automated voice-guided marketing activities like this are happening all the time.

If you hear “This call is being recorded for training and quality control,” it isn’t just the customer service representative they’re monitoring.

It can be you, too.

When conducting research for my forthcoming book, “The Voice Catchers: How Marketers Listen In to Exploit Your Feelings, Your Privacy, and Your Wallet,” I went through over 1,000 trade magazine and news articles on the companies connected to various forms of voice profiling. I examined hundreds of pages of U.S. and EU laws applying to biometric surveillance. I analyzed dozens of patents. And because so much about this industry is evolving, I spoke to 43 people who are working to shape it.

It soon became clear to me that we’re in the early stages of a voice-profiling revolution that companies see as integral to the future of marketing.

Thanks to the public’s embrace of smart speakers, intelligent car displays and voice-responsive phones – along with the rise of voice intelligence in call centers – marketers say they are on the verge of being able to use AI-assisted vocal analysis technology to achieve unprecedented insights into shoppers’ identities and inclinations. In doing so, they believe they’ll be able to circumvent the errors and fraud associated with traditional targeted advertising.

Not only can people be profiled by their speech patterns, but they can also be assessed by the sound of their voices – which, according to some researchers, is unique and can reveal their feelings, personalities and even their physical characteristics.

Flaws in targeted advertising

Top marketing executives I interviewed said that they expect their customer interactions to include voice profiling within a decade or so.

Part of what attracts them to this new technology is a belief that the current digital system of creating unique customer profiles – and then targeting them with personalized messages, offers and ads – has major drawbacks.

A simmering worry among internet advertisers, one that burst into the open during the 2010s, is that customer data often isn’t up to date, profiles may be based on multiple users of a device, names can be confused and people lie.

Advertisers are also uneasy about ad blocking and click fraud, which happens when a site or app uses bots or low-paid workers to click on ads placed there so that the advertisers have to pay up.

These are all barriers to understanding individual shoppers.

Voice analysis, on the other hand, is seen as a solution that makes it nearly impossible for people to hide their feelings or evade their identities.

Building out the infrastructure

Most of the activity in voice profiling is happening in customer support centers, which are largely out of the public eye.

But there are also hundreds of millions of Amazon Echoes, Google Nests and other smart speakers out there. Smartphones also contain such technology.

All are listening and capturing people’s individual voices. They respond to your requests. But the assistants are also tied to advanced machine learning and deep neural network programs that analyze what you say and how you say it

Call centers can use AI-assisted voice technology to determine whether to upsell certain customers. Ralf Hiemisch via Getty Images

Amazon and Google – the leading purveyors of smart speakers outside China – appear to be doing little voice analysis on those devices beyond recognizing and responding to individual owners. Perhaps they fear that pushing the technology too far will, at this point, lead to bad publicity.

Nevertheless, the user agreements of Amazon and Google – as well as Pandora, Bank of America and other companies that people access routinely via phone apps – give them the right to use their digital assistants to understand you by the way you sound. Amazon’s most public application of voice profiling so far is its Halo wristband, which claims to know the emotions you’re conveying when you talk to relatives, friends and employers.

The company assures customers it doesn’t use Halo data for its own purposes. But it’s clearly a proof of concept – and a nod toward the future.

Patents point to the future

The patents from these tech companies offer a vision of what’s coming.

In one Amazon patent, a device with the Alexa assistant picks up a woman’s speech irregularities that imply a cold through using “an analysis of pitch, pulse, voicing, jittering, and/or harmonicity of a user’s voice, as determined from processing the voice data.” From that conclusion, Alexa asks if the woman wants a recipe for chicken soup. When she says no, it offers to sell her cough drops with one-hour delivery.

An Amazon patent depicts a device picking up a woman’s cough – and then asking if she wants a recipe for chicken soup. Google Patents

Another Amazon patent suggests an app to help a store salesperson decipher a shopper’s voice to plumb unconscious reactions to products. The contention is that how people sound allegedly does a better job indicating what people like than their words.

And one of Google’s proprietary inventions involves tracking family members in real time using special microphones placed throughout a home. Based on the pitch of voice signatures, Google circuitry infers gender and age information – for example, one adult male and one female child – and tags them as separate individuals.

The company’s patent asserts that over time the system’s “household policy manager” will be able to compare life patterns, such as when and how long family members eat meals, how long the children watch television, and when electronic game devices are working – and then have the system suggest better eating schedules for the kids, or offer to control their TV viewing and game playing.

Seductive surveillance

In the West, the road to this advertising future starts with firms encouraging users to give them permission to gather voice data. Firms gain customers’ permission by enticing them to buy inexpensive voice technologies.

When tech companies have further developed voice analysis software – and people have become increasingly reliant on voice devices – I expect the companies to begin widespread profiling and marketing based on voice data. Hewing to the letter if not the spirit of whatever privacy laws exist, the companies will, I expect, forge ahead into their new incarnations, even if most of their users joined before this new business model existed.

This classic bait and switch marked the rise of both Google and Facebook. Only when the numbers of people flocking to these sites became large enough to attract high-paying advertisers did their business models solidify around selling ads personalized to what Google and Facebook knew about their users.

By then, the sites had become such important parts of their users’ daily activities that people felt they couldn’t leave, despite their concerns about data collection and analysis that they didn’t understand and couldn’t control.

This strategy is already starting to play out as tens of millions of consumers buy Amazon Echoes at giveaway prices.

[Insight, in your inbox each day. You can get it with The Conversation’s email newsletter.]

The dark side of voice profiling

Here’s the catch: It’s not clear how accurate voice profiling is, especially when it comes to emotions.

It is true, according to Carnegie Mellon voice recognition scholar Rita Singh, that the activity of your vocal nerves is connected to your emotional state. However, Singh told me that she worries that with the easy availability of machine-learning packages, people with limited skills will be tempted to run shoddy analyses of people’s voices, leading to conclusions that are as dubious as the methods.

She also argues that inferences that link physiology to emotions and forms of stress may be culturally biased and prone to error. That concern hasn’t deterred marketers, who typically use voice profiling to draw conclusions about individuals’ emotions, attitudes and personalities.

While some of these advances promise to make life easier, it’s not difficult to see how voice technology can be abused and exploited. What if voice profiling tells a prospective employer that you’re a bad risk for a job that you covet or desperately need? What if it tells a bank that you’re a bad risk for a loan? What if a restaurant decides it won’t take your reservation because you sound low class, or too demanding?

Consider, too, the discrimination that can take place if voice profilers follow some scientists’ claims that it is possible to use an individual’s vocalizations to tell the person’s height, weight, race, gender and health.

People are already subjected to different offers and opportunities based on the personal information companies have collected. Voice profiling adds an especially insidious means of labeling. Today, some states such as Illinois and Texas require companies to ask for permission before conducting analysis of vocal, facial or other biometric features.

But other states expect people to be aware of the information that’s collected about them from the privacy policies or terms of service – which means they rarely will. And the federal government hasn’t enacted a sweeping marketing surveillance law.

With the looming widespread adoption of voice analysis technology, it’s important for government leaders to adopt policies and regulations that protect the personal information revealed by the sound of a person’s voice.

One proposal: While the use of voice authentication – or using a person’s voice to prove their identity – could be allowed under certain carefully regulated circumstances, all voice profiling should be prohibited in marketers’ interactions with individuals. This prohibition should also apply to political campaigns and to government activities without a warrant.

That seems like the best way to ensure that the coming era of voice profiling is constrained before it becomes too integrated into daily life and too pervasive to control.

Joseph Turow does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

Smart speakers: why sales are rocketing despite all our privacy fears

2020-09-08T15:08:29Z

Opportunity knocks. Eshop Citylink

With everyone spending so much time at home during the pandemic, smart speakers such as the Amazon Echo and Google Nest ranges have had a golden opportunity. In their latest attempt to make the devices as relevant as possible to this captive audience, various device makers recently announced that they are incorporating Zoom videoconferencing capabilities into speakers fitted with screens.

One seemingly big obstacle to all this is the fact that consumers have been increasingly worried about tech privacy in recent years. There have been many media stories about what smart speakers can capture and share. According to one major survey in 2019, half of all respondents said they don’t trust voice assistants with the safety and security of their personal data.

So what has that meant for sales of smart speakers? In response to all these cheap devices, how many of us are saying, “Alexa, leave us alone”?

Privacy issues …

A product category that didn’t exist until late in 2014 with the launch of the first Amazon Echo, these devices are essentially the interface for voice-driven virtual assistants like Amazon’s Alexa, Google’s Assistant, Apple’s Siri and Baidu’s DuerOS. Often you can get the same functionality from a built-in assistant in your smartphone or tablet, but smart speakers and screens give this capability on a standing device at home.

Privacy fears have been around since the early days. “Always listening” microphones concerned people, even though the device makers promised they would only listen if someone spoke directly to the device via its “wake word” or phrase.

Yet we know that smart speakers can always be listening, and can wake up and record more than people realise. The supposed protection from the “wake word” can easily be changed by an errant software update – for instance, a recent mistake in pushing out a software update to too many users led to Google smart speakers waking based on other sounds in the home (the update has since been recalled).

Recordings from devices have been streamed out of the home to the manufacturer, where staff have reportedly listened to them to try to make the voice recognition better. In one reported case, 1,000 of these recordings were leaked.

Pierre Lecourt, CC BY-SA

In a 2018 American survey, 38% of people said they didn’t want something “listening in” on their life all the time, and 28% were concerned about privacy issues. In the 2019 survey I mentioned earlier, which included consumers in the US, UK, France and Germany, 52% were worried about “passive” always-listening smart speakers and voice assistants, and the risk of them listening to private conversations. A separate study also reached the same conclusion: 52% of UK customers have concerns about privacy.

It’s not just customers who are concerned about privacy – some businesses have spoken up. The international law firm Mishcon de Reya asked all its staff working from home to mute or turn off smart speakers, and any other visual or voice-enabled devices, when talking about client matters at home.

… and record sales

In spite of all these concerns, smart speakers were one of the most popular Christmas gifts of 2019, capping a record year for sales. Worldwide, 147 million devices were shifted across the year, a 70% increase on 2018. A fair proportion was in China as Baidu, Alibaba and Xiaomi all made headway with their own smart devices: they are now the third, fourth and fifth biggest vendors after Amazon (first with 22%) and Google (second with 17%).

During the pandemic, sales have held up very well. China had a weaker first quarter and the US and Europe had a weaker second quarter, but world smart-speaker sales for the year are expected to be about 161 million – 10% up year on year, despite the trough in the global economy.

Baidu: the Chinese #1. abolukbas

With the US and UK at the forefront in terms of penetration, around one in three people now has access to a device in those markets. In China, it is about one in ten of those with internet access.

So how to reconcile these stats with the privacy fears? One clue is that the most common reason why people in the UK reported using a smart speaker was because they received it as a gift – more than half did so. The next most common (and growing) reason was music streaming.

We also know from research that people find it less convenient to press a button to wake a speaker than to activate it by voice. So it may be that people are caught between wanting privacy on the one hand but wanting convenience at the same time.

Smart speakers make us more loyal to the tech giants that control them. For example, Spotify has a market share of around 49% in the US, but only accounted for 20% of audio streaming time on smart speakers; Amazon Music, on the other hand, makes up 33%. Presumably it benefits from being the default option on Echo devices.

Some other smart speakers give even less control of this – Apple’s HomePod speaker has only recently made available an option to choose the default music service for audio, in a pre-release version of the software. Seemingly beaten by these gatekeepers, Spotify has done a deal with Google to give away Google Nest Mini devices to its premium subscribers to try and bounce back.

Whether or not this succeeds, it demonstrates that the key opportunity afforded by smart speakers – and the growing range of other smart home technologies – is to provide device makers with an avenue to make it easier for customers to use their other products and services. Having apparently weathered people’s privacy concerns, the companies that rule this frontier look poised to satisfy an ever greater share of consumer needs in future.

Greig Paul does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.