A recent study of Twitter communication patterns has revealed that the activity of individual humans on Twitter is easily distinguishable from that of other types of users. By analysing the timing of tweets, we were able to discover whether the user behind an account was an individual human, a team or a machine, all without having to look into the content of messages.
Twitter is one of the world’s most popular social networks, with more than 200 million active users. Communication on Twitter is an integral part of many people’s lives. It’s also a major tool for marketing, political campaigns and the planning of social movements like the Arab Spring and recent protests in Brazil.
But one of the downsides of this platform is that it can be hard to know who we’re talking to. On Twitter in particular, many celebrities and politicians don’t write their own tweets, instead using public relations teams to manage their accounts. This has led to a huge number of sites dedicated to tracking down “real” celebrities on Twitter.
There are also many robot accounts on the network, which means we can’t even be sure that we’re exchanging messages with a real person. Robot accounts can be used to spam other users with links to malicious websites. Markets of fake robot followers are also used to increase the apparent popularity of accounts. The study developed techniques to help us identify the kind of user - or users - behind an account, which could aid spam filtering and prevent interaction with unwanted users.
Timing is everything
Using a data set of more than 160,000 tweets, we found that there are common tweeting patterns among three types of user accounts. These are personal accounts, robot-controlled accounts and the managed accounts of corporations, politicians or celebrities, which are typically administered by multiple people. We manually grouped our sample users into these three categories, in order to compare their behaviour.
Research in neuroscience and social sciences consistently shows that our behaviour is a lot more predictable than we think. And the way we make decisions on social media is no exception. Behavioural patterns emerged when we analysed two timing factors: the time at which a tweet was posted and the time interval between consecutive tweets from the same user.
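To make the two timing factors concrete, here is a minimal Python sketch of how they might be extracted from one account's tweet timestamps. It is an illustration rather than the study's own code: the function name `timing_features` and the sample timestamps are hypothetical, and it assumes the interval is measured between consecutive tweets.

```python
from datetime import datetime

def timing_features(timestamps):
    """Return (hours, gaps) for one account's tweets.

    hours: hour of day (0-23) at which each tweet was posted.
    gaps:  seconds between each pair of consecutive tweets.
    """
    ordered = sorted(timestamps)
    hours = [t.hour for t in ordered]
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return hours, gaps

# Hypothetical timestamps for a single account (not real data).
tweets = [
    datetime(2013, 7, 1, 9, 0),
    datetime(2013, 7, 1, 9, 30),
    datetime(2013, 7, 1, 21, 15),
]
hours, gaps = timing_features(tweets)
```

Neither feature requires reading the text of a tweet or the account's profile, only the time at which each message was sent.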
By using timing, rather than profile or message content, as the key variable of our research, we eliminated errors caused by users providing false information on their profile. It also meant we could avoid the difficulties associated with analysing messages, which can be highly variable.
Results showed that personal accounts tend to tweet similarly throughout the week during waking hours, from 7am to midnight, with activity peaking in the evenings. In contrast, managed accounts usually tweet from Monday to Friday and during working hours, between 8am and 8pm.
We also discovered that accounts controlled by robots show clear artificial behaviour, consistently tweeting at specific times or within well-defined intervals. The three types of accounts each presented a different interval pattern between their tweets.
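One simple way to put a number on "well-defined intervals" is the coefficient of variation of the gaps between tweets: the standard deviation divided by the mean. This measure is our own choice for illustration, not necessarily the one used in the study, and the function name `interval_regularity` is hypothetical.

```python
from statistics import mean, pstdev

def interval_regularity(gaps):
    """Coefficient of variation of inter-tweet gaps (in seconds).

    Values near 0 mean clockwork-like posting; larger values mean
    irregular posting. Expects at least one gap.
    """
    avg = mean(gaps)
    return pstdev(gaps) / avg if avg > 0 else 0.0

# Hypothetical gaps: an hourly scheduler versus an irregular poster.
clockwork = interval_regularity([3600, 3600, 3600])
irregular = interval_regularity([60, 7200, 300])
```

An account that posts exactly once an hour scores zero, while the irregular gaps score well above it. Any cut-off between "robot-like" and "human-like" would have to be chosen from real data.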
Humans after all
To reveal common behaviour among users of the same category, we used “probability distributions”. A probability distribution is simply a function that describes how likely a user is to tweet at a certain time, based on all the data available for users of that type.
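As a rough sketch, such a distribution can be estimated by counting how often users of one type tweet in each hour of the day and dividing by the total. The function name `hourly_distribution`, the one-hour bins and the smoothing constant are assumptions made for this example, not details taken from the study.

```python
from collections import Counter

def hourly_distribution(hours, smoothing=1.0):
    """Estimate P(tweet posted in hour h) for h = 0..23.

    hours: hour of day of every tweet from users of one account type.
    smoothing: small count added to each hour so that no hour ends up
    with a probability of exactly zero (an assumption of this sketch).
    """
    counts = Counter(hours)
    total = len(hours) + smoothing * 24
    return [(counts[h] + smoothing) / total for h in range(24)]

# Hypothetical tweet hours pooled from one account type.
dist = hourly_distribution([9, 9, 21])
```

The 24 values sum to one, and hours in which that type of user tweeted more often receive a larger share.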
We developed a machine learning algorithm which uses the differences in tweet timing between the three account types to classify unknown users. This algorithm, known as a naïve Bayes classifier, learns how each type of user is supposed to behave through the probability distributions of tweeting activity.
The classifier is trained with data from a set of users whose account types we know in advance. It then decides whether a new user is a human, a group of humans or a robot by comparing that user’s tweet timing with the pattern learned for each account type and choosing the closest match. Our algorithm assigned 75% of users to their correct categories.
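The idea can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the study's implementation: it uses only the hour of each tweet, treats the three account types as equally likely beforehand, and is trained on made-up data. The names `fit` and `classify` are hypothetical.

```python
import math
from collections import Counter

def fit(training, smoothing=1.0):
    """Learn log P(hour | account type) for each account type.

    training: dict mapping a label to the tweet hours (0-23) pooled
    from accounts already known to be of that type.
    """
    model = {}
    for label, hours in training.items():
        counts = Counter(hours)
        total = len(hours) + smoothing * 24
        model[label] = [
            math.log((counts[h] + smoothing) / total) for h in range(24)
        ]
    return model

def classify(model, hours):
    """Return the label whose learned pattern best explains the hours.

    Naive Bayes treats each tweet as independent given the account
    type, so the per-tweet log-probabilities are simply added up.
    A uniform prior over labels is assumed.
    """
    scores = {
        label: sum(logp[h] for h in hours) for label, logp in model.items()
    }
    return max(scores, key=scores.get)

# Made-up training data, for illustration only.
training = {
    "personal": [19, 20, 21, 22, 23, 20, 21, 8, 13],
    "managed": [9, 10, 11, 12, 14, 15, 16, 17, 10],
    "robot": [3, 3, 3, 3, 3, 3, 3, 3, 3],
}
model = fit(training)
evening_user = classify(model, [20, 21, 22])
```

With this toy data, an account that tweets mostly in the evening is matched to the personal pattern, and one that tweets during office hours to the managed pattern. A fuller version would also include the intervals between tweets as a second feature.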
We were also able to predict the future behaviour of users, to a certain extent. A prediction algorithm could tell us the expected waiting time between tweets for human users. This was possible because so many human users display common tweeting patterns, so the activity of a group of human users can give us a good idea of how a single human will use Twitter. Robots were much harder to predict. Since they are all programmed differently, they don’t display a common pattern as a group of tweeters.
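The simplest version of this idea is to pool the intervals observed across a group of human users and take their average as the expected wait for any one of them. The sketch below does exactly that. It stands in for the study's prediction algorithm, whose details are not given here, and the function name `expected_wait` and the sample data are hypothetical.

```python
from statistics import mean

def expected_wait(gaps_by_user):
    """Average inter-tweet gap (seconds) pooled over a group of users.

    gaps_by_user: dict mapping a user name to that user's list of gaps
    between consecutive tweets.
    """
    pooled = [g for gaps in gaps_by_user.values() for g in gaps]
    return mean(pooled)

# Made-up gaps for two human users.
wait = expected_wait({"alice": [600, 1800], "bob": [1200]})
```

The pooled average is only a useful guide when the users in the group behave alike, which is why this approach suits human accounts better than robots.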
As we spend more and more of our lives on social media, it’s handy to know who we are interacting with. Learning to spot a robot may be the first step.