The linguistic clues that reveal your true Twitter identity

Twitter is awash with trolls, spammers and misanthropes, all keen to ruin your day with a mean-spirited message or even a threat that can cause you genuine fear. It seems all too easy to set up an account and cause trouble anonymously, but an emerging field of research is making it easier to track perpetrators by looking at the way they use language when they chat.

The first Twitter criminal?

The #TwitterJokeTrial was an early, if unfortunate, example of an apparent Twitter crime. Paul Chambers, frustrated at being prevented from visiting his girlfriend when snow disrupted transport, tweeted:

“Crap! Robin Hood airport is closed. You’ve got a week and a bit to get your shit together otherwise I’m blowing the airport sky high!!”

Chambers was initially convicted of sending a message of “menacing character”, but he later successfully appealed against his conviction. The message was nevertheless clear: be careful what you write. Either be nice or be anonymous.

Anonymous virtue?

The lesson many have drawn from incidents like this is that if you want to say something controversial or aggressive on Twitter, you had better do it from an account not tied to your real name.

The perceived anonymity of Twitter trolls seemed to facilitate the trolling attacks experienced by Caroline Criado-Perez and Stella Creasy MP this summer. Criado-Perez and Creasy had been running a campaign to have a woman represented on UK bank notes, and so became subject to a vitriolic misogynist attack, all via the medium of Twitter. In this case, policing has led to arrests, despite the fact that trolls opened multiple accounts to hide their identities when conducting their attacks.

There are, however, many examples of others who have managed to remain anonymous, escaping prosecution for abusive and threatening tweets. Technological anonymity is all too easy to achieve. On 22 October, Laurie Penny reported on Twitter that she had “just been informed UK police cannot track down those who sent bomb threats to female journalists this summer, because of Tor.”

Tor is a free online network, maintained by the Tor Project, which facilitates anonymity by relaying each message through a chain of servers drawn from a pool of thousands. It thus makes any attempt to identify the origin of a message near impossible.

Linguistic clues

Luckily, such attempts at anonymity are not always successful. All the technology in the world can’t stop you from leaving a trail behind you when you broadcast your thoughts online or via text message. We all have individual writing styles and habits that build to create a linguistic identity.

Forensic linguistic experts can penetrate technological anonymity by interrogating the linguistic clues that you leave as you write. Everything from the way someone uses capitalisation or personal pronouns, to the words someone typically omits or includes, to a breakdown of average word or sentence length, can help identify the writer of even a short text like a Tweet or text message.
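To make the idea concrete, here is a minimal sketch of how a few of the markers mentioned above (capitalisation, personal pronouns, average word and sentence length) might be counted for a short message. It is an illustration only, not the method forensic linguists actually use: the function name `style_features` and the short pronoun list are assumptions made for this example, and real casework draws on far richer evidence.

```python
import re

# Hypothetical, deliberately short pronoun list used only for this illustration.
PERSONAL_PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
                     "he", "him", "she", "her", "they", "them"}


def style_features(text):
    """Return a few simple style markers for a short message."""
    # Words are runs of letters and straight apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    # Sentences are split on runs of ., ! or ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = [c for c in text if c.isalpha()]

    # Guard against division by zero on empty input.
    n_words = len(words) or 1
    n_sentences = len(sentences) or 1
    n_letters = len(letters) or 1

    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / n_sentences,
        "uppercase_ratio": sum(c.isupper() for c in letters) / n_letters,
        "pronoun_rate": sum(w.lower() in PERSONAL_PRONOUNS for w in words) / n_words,
    }


if __name__ == "__main__":
    print(style_features("I am here. You are there!"))
```

Comparing such profiles across many messages from known and unknown writers is one plausible way to start narrowing down authorship, although any single number on its own says very little.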

Forensic linguistics is a growing field, partially because the increasing importance of online communications in our daily lives demands it.

The technique was used, for example, in a 2009 murder trial in Stoke-on-Trent, to build a case against a man who had murdered his wife and attempted to cover his tracks by sending text messages from her phone. He sent messages to himself and others to make it look like his wife was still alive on the day he had killed her, but the way they were written gave him away.

Forensic linguists have also contributed expertise in cases of rape, blackmail, mistaken identity, extortion, and the multiple identities of online paedophiles.

Moral grey area

In criminal investigations forensic linguists are seen to be on the side of justice, but the field clearly contains a moral peril. Just as we can develop techniques that target online paedophiles or use methods to discover those who attacked Criado-Perez and Creasy, so these same techniques can be used against those whom we might ethically want to protect.

For instance, forensic linguistic techniques could identify an anonymous blogger campaigning against an oppressive regime, or an environmental activist whose campaigning is inconvenient to government plans. Individual forensic linguists might take a principled stand in one case or another, but others might not agree with their moral choices.

This is of particular concern as these techniques become automated. Authorship analysis technologies can be sold to anyone who can afford them, and buyers can use them for whatever purpose they like.

In the future, we may even see a technological arms race between those attacking and those defending anonymity.

New ways to hide

If online anonymity can be compromised by textual analysis, it may seem that your only option is to play nice online. But there is now an alternative, and it is increasingly popular.

Facebook numbers amongst teenagers are falling and Twitter numbers may follow. The kids today are turning to SnapChat and other similar services, which allow content to be shared but not stored. Apparently aware of the appeal, Facebook reportedly tried to buy SnapChat for US$3 billion recently.

However, these ephemeral messaging services won’t only be attractive to teenagers. Networks of online criminals are already using them. It’s a return to the old days: as with a whispered conversation in a dark corner of a pub, eavesdropping is more difficult and the message expires without a trace. Or so SnapChat claims. Either way, as with the creation of Twitter, this new development will no doubt create new criminal, legal and ethical challenges for forensic linguists and the wider public to grapple with.
