Clinton-Sanders data breach spat goes to the heart of modern campaigning

At their most recent debate, the two main US Democratic Party candidates, Bernie Sanders and Hillary Clinton, tried to defuse a heated dispute between their campaigns over a breach of the Clinton campaign’s voter data, which Sanders staffers allegedly accessed inappropriately. As punishment, the Sanders campaign was briefly denied access to the Democratic Party’s master voter file, and it’s now suing the party for $600,000 per day of lost access.

While the two candidates exchanged conciliatory statements in public, their staff remain at loggerheads – one Clinton operative likened it to “the opposing general getting your battle plans.”

The campaigns' outrage at the incident shows just how dependent on data US election campaigns have become. Barack Obama’s 2008 campaign famously took the use of voter data to new heights, and his 2012 re-election effort was similarly supported by a whole team of data experts – whereas Mitt Romney’s campaign to defeat him was hobbled when its data operation utterly collapsed on election day.

The past two elections have given the Democrats a serious edge in the data game, and this cycle’s Republican contenders are making every effort to keep up. Ted Cruz has already got into hot water for using tens of millions of Facebook users' data to construct psychological profiles. Even Donald Trump, as maverick as any candidate in living memory and equipped with only a bare-bones campaign machine, knows better than to do without voter data.

The rationale for all this is plain enough: the more information you collect and process about voters, the better you can target your message and calibrate your speeches to win them over. Data now drives almost every decision about how to run advertising and where to look for potential donors.

Having the right data allows campaigns to build mathematical models of the electorate at a very fine level of detail, and to react quickly to shifts in voter opinion, which can also be “mined” from Twitter, Facebook and the like.

Campaigns can now establish to a remarkable degree how likely a particular voter is to choose a particular candidate – and that allows them to target their attention and messaging incredibly finely. It is now possible to automatically place voters in a pre-defined set of classes (“college student not owning a gun who is pro-immigration and pro-gay marriage”, for example) and to target each class with a different strategy.
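
To make the idea of pre-defined classes concrete, here is a minimal sketch in Python. It is an illustration only, not how any campaign actually works: the `Voter` fields, the segment names and the outreach strategies are all hypothetical, and real campaigns typically rely on statistical models rather than hand-written rules like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voter:
    # Hypothetical attributes, chosen to mirror the example class in the text.
    is_college_student: bool
    owns_gun: bool
    pro_immigration: bool
    pro_gay_marriage: bool


def assign_segment(voter: Voter) -> str:
    """Place a voter in one of a few pre-defined (made-up) classes."""
    if (voter.is_college_student and not voter.owns_gun
            and voter.pro_immigration and voter.pro_gay_marriage):
        return "progressive_student"
    if voter.owns_gun and not voter.pro_immigration:
        return "gun_owner_immigration_sceptic"
    return "unclassified"


# Hypothetical mapping from class to outreach strategy.
STRATEGY = {
    "progressive_student": "campus events and social media ads",
    "gun_owner_immigration_sceptic": "direct mail on security issues",
    "unclassified": "general-purpose TV advertising",
}


def pick_strategy(voter: Voter) -> str:
    """Look up the strategy assigned to the voter's class."""
    return STRATEGY[assign_segment(voter)]


student = Voter(is_college_student=True, owns_gun=False,
                pro_immigration=True, pro_gay_marriage=True)
print(assign_segment(student), "->", pick_strategy(student))
```

The point of the sketch is simply that once a voter has been assigned to a class, choosing a message becomes a lookup.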

Far from being purely cynical and mechanical, this sort of data-driven approach does actually offer benefits to voters. It’s one way to push back on the echo-chamber effect, in which people keep seeing the same type of message over and over again. Targeting undecided voters exposes them to a wider diversity of content, helping to counter the “filter bubble” that keeps them from seeing anything unexpected that might change their thinking.

Lie of the land

Using data in this way is not at all simple. Big data necessarily contains a lot of “noise”; the real expense isn’t in collecting large amounts of data, but rather in extracting valuable information from it.

However much data is available, it’ll always be incomplete. It will be a sample and will therefore carry a bias, meaning it can never represent the entire voter population with perfect accuracy. People who are on Twitter, or who can be reached via services such as Amazon MTurk, a popular crowdsourcing platform used to collect data and run surveys, are not representative of either the general population or the electorate.
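
A small worked example shows how an unrepresentative sample skews an estimate. Every number below is invented purely for illustration: the two age groups, their shares of the electorate and of an online sample, and their levels of support for a candidate are all assumptions, not measurements.

```python
def overall_support(group_shares, group_support):
    """Average support across groups, weighted by each group's share."""
    if abs(sum(group_shares.values()) - 1.0) > 1e-9:
        raise ValueError("group shares must sum to 1")
    return sum(share * group_support[group]
               for group, share in group_shares.items())


# Hypothetical support for a candidate within each age group.
SUPPORT = {"under_30": 0.70, "30_and_over": 0.40}

# Hypothetical make-up of the real electorate versus an online sample
# in which younger people are heavily over-represented.
ELECTORATE = {"under_30": 0.30, "30_and_over": 0.70}
ONLINE_SAMPLE = {"under_30": 0.80, "30_and_over": 0.20}

true_support = overall_support(ELECTORATE, SUPPORT)
sample_estimate = overall_support(ONLINE_SAMPLE, SUPPORT)

print(f"True support in the electorate: {true_support:.2f}")
print(f"Estimate from the online sample: {sample_estimate:.2f}")
```

With these made-up figures the candidate's true support is 49%, yet the online sample suggests 64%, simply because the sample contains too many young people.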

Any effort to collect data from Facebook and run surveys online also has to contend with serious data quality problems. Our recent research has shown that about half of the people on such online platforms provide low-quality or fake data.

Another drawback of data gathering is its cost. Asking people to provide information on Amazon MTurk costs money. If the number of people is large, the cost may become prohibitive.
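
The arithmetic is easy to sketch. The fee per response and the target number of usable answers below are hypothetical, and platform commissions are ignored; the only figure taken from the text is the rough estimate that about half of responses are unusable.

```python
import math


def survey_cost(usable_needed, fee_per_response, usable_fraction):
    """Total cost of paying for enough responses to reach a usable target.

    usable_fraction is the share of paid responses that can actually be used.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    responses_needed = math.ceil(usable_needed / usable_fraction)
    return responses_needed * fee_per_response


# Hypothetical target and fee; 0.5 reflects "about half" being low quality.
ideal = survey_cost(100_000, fee_per_response=0.50, usable_fraction=1.0)
realistic = survey_cost(100_000, fee_per_response=0.50, usable_fraction=0.5)

print(f"Cost if every response were usable: ${ideal:,.0f}")
print(f"Cost if only half are usable: ${realistic:,.0f}")
```

Under these assumptions, poor data quality doubles the bill: twice as many people have to be paid to end up with the same number of usable answers.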

But however challenging it is to handle, data is only going to become more important for campaigners. And if it’s used correctly, it might help improve the state of democracy, too – since the more candidates know about their voters, the better they can understand their needs.