We need to talk about the data we give freely of ourselves online and why it’s useful

How should your social media data be accessed and used by researchers? Gil C/Shutterstock

The Cambridge Analytica and Facebook data harvesting scandal has provided yet another reminder of what has long been known: as social media users, we are the product.

The personal information and behaviour we divulge through social media are valuable, and they are used for commercial gain.

Yet those who do research using big data – and I count myself among these – are probably feeling both concerned and conflicted at this latest scandal.


Those of us academics who also have a private business interest in data analytics may even be having a “there but for the grace of God go I” moment.

Gathering Facebook data

The data used by Cambridge Analytica came from University of Cambridge researcher Aleksandr Kogan, and were not collected as part of his university work.

Kogan created a Facebook app which used Facebook’s “application programming interface” (API) to gather data on about 50 million people in the United States.

At the time there were many academics using the Facebook API in a similar way and, as has been pointed out by Facebook in the Cambridge Analytica case, there was no data breach (Facebook’s servers were not hacked).

Around the same time Kogan was collecting his data, I was using the Facebook API for research and teaching at the ANU. It was understood that the use of a Facebook app for research required ethics clearance and informed consent. Further, the collected data should be de-identified, stored securely and only used for the stated research project.

It is clear that these requirements were not met in the case of the Cambridge Analytica Facebook data.
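To make the de-identification requirement concrete, here is a minimal sketch of one common approach: replacing account IDs with keyed-hash pseudonyms and dropping names before analysis. The record layout, field names and function names are hypothetical and chosen for illustration; this is not the procedure used in any particular study. Pseudonymisation of this kind is also weaker than full anonymisation, since friendship patterns alone can sometimes be used to re-identify people.

```python
import hashlib
import hmac
import secrets


def pseudonymise(user_id: str, key: bytes) -> str:
    """Return a stable pseudonym for user_id using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def deidentify(records, key):
    """Keep only pseudonymous IDs and friendship ties; drop names and raw IDs."""
    cleaned = []
    for rec in records:
        cleaned.append({
            "pid": pseudonymise(rec["user_id"], key),
            "friend_pids": sorted(pseudonymise(f, key) for f in rec["friend_ids"]),
        })
    return cleaned


# Hypothetical input records (not a real API response format).
key = secrets.token_bytes(32)  # secret key, stored separately from the data
records = [
    {"user_id": "1001", "name": "Alice Example", "friend_ids": ["1002", "1003"]},
    {"user_id": "1002", "name": "Bob Example", "friend_ids": ["1001"]},
]
deidentified = deidentify(records, key)
```

Because the same key always maps the same account to the same pseudonym, the network structure survives while the raw identifiers do not. The key itself then has to be protected, or destroyed once the dataset is final.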

It has been argued that it was not appropriate for Facebook apps to access the information of friends of the participants (the people who installed the app). But it was precisely the social network data (who are the participant’s friends, and how do they connect with one another?) that made Facebook data so useful for social research.
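As an illustration of why the friend-of-participant data mattered, the sketch below computes the density of a participant's personal network: the share of possible friendships among their friends that actually exist. The function name and the toy data are assumptions made for this example, not part of any Facebook API.

```python
def ego_network_density(friends, friendships):
    """Share of possible ties among a participant's friends that are present.

    friends: iterable of friend identifiers for one participant.
    friendships: iterable of (a, b) pairs meaning a and b are friends.
    """
    friends = set(friends)
    possible = len(friends) * (len(friends) - 1) // 2
    if possible == 0:
        return 0.0
    # Count each undirected tie once, ignoring ties that involve outsiders.
    ties = {
        frozenset(pair)
        for pair in friendships
        if len(set(pair)) == 2 and set(pair) <= friends
    }
    return len(ties) / possible


# Toy data: four friends, two ties among them, one tie to an outsider "E".
friends = ["A", "B", "C", "D"]
friendships = [("A", "B"), ("B", "C"), ("B", "A"), ("C", "E")]
density = ego_network_density(friends, friendships)
```

A measure like this cannot be calculated from the participant's own profile alone; it needs to know which of their friends are connected to one another.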

It is important to recognise that researchers using the Facebook API had to respect Facebook’s privacy settings – it was not possible to access profiles that were private or could only be viewed by friends.

Facebook restricted the API in 2014 to prevent this kind of collection. So a budding Aleksandr Kogan of 2018 would not be able to collect Facebook data that would be of interest to Cambridge Analytica.

But there are several reasons why this latest story is not simply “old news”.

A shifting, competitive environment

The Cambridge Analytica scandal highlights that social media companies such as Facebook are faced with often conflicting privacy-related demands from users and advertisers, as well as from civil society, academia and government.

MIT management professor Sinan Aral calls this the “transparency paradox”.

It is a quickly changing environment and what was considered ethical and appropriate five or ten years ago (such as the savvy use of social media by the Obama presidential campaign) may be regarded as unacceptable in the future.

This is just the natural process of technology evolving over time in response to public scrutiny.

But some of Facebook’s privacy missteps have appeared to be wilful, with the platform testing the water (and then apologising) in terms of what it could get away with to make itself more valuable to advertisers.


Taking risks in the WWW Wild West

Facebook operates in a highly competitive environment, as do the academics and entrepreneurs who want to make use of social media data. Some will always be more willing than others to take calculated risks in an attempt to leapfrog the competition.

The world of big data analysis is like the Wild West. If we don’t collect and analyse these data, then our competitors will (and they will get the grants, or the big contracts).

Anyway, the API may be turned off next year or the social media platform might go bust, so we had better get in quick.

I recently attended an academic presentation involving potentially sensitive social media data (not Facebook data) collected via an API. I was not the only person in the room shifting uneasily in my seat when we were told “everyone is doing it”.

Impact on future of access to big data

Researchers, including myself, who use big data will be concerned that the Cambridge Analytica scandal will contribute to making it even harder to access social media data for legitimate research.

But while the public APIs may be further restricted, social media companies will continue to use the data themselves and to give preferential access to affiliated university researchers.

Yet public APIs help to level the research playing field. They allow researchers from around the world, who are less likely to have any preferential access to social media companies, to conduct open science using publicly available data.

If I conduct research using data from a public API and you don’t agree with the results, you can use the same API to collect a similar dataset to try to prove me wrong.

Restricting API access will also make it harder for outside researchers to understand the privacy implications of the data being collected by Facebook and similar companies.

I am only able to write about the nature of the Cambridge Analytica scandal because the data were originally collected via the public Facebook API, which I was using at around the same time.

If public APIs are restricted further, only the social media companies themselves will be able to conduct research about users’ behaviour on their platforms. What are the implications for accountability and transparency, let alone for research into important topics such as political filter bubbles and fake news?

How to govern our online social data?

These privacy concerns are particularly pertinent to those platforms that want “the real you”. That is, those where it is either against the Terms of Service or doesn’t make sense to create multiple or fake profiles. Facebook, Academia.edu and LinkedIn are prime examples.

But all social media platforms share a common feature: they are only valuable if they have significant market share. Because of the network effect, these platforms can grow very quickly and it is a winner-takes-all proposition. They will “move fast and break things” to get to number one.

Concerns have been raised about the enormous power that platforms such as Google and Facebook have as a result of controlling data on consumer preferences and search behaviour, and how this can reduce competition and innovation.


The Cambridge Analytica scandal will inevitably focus attention on the question of how we should govern our online social data. The European Union’s DECODE (Decentralised Citizen Owned Data Ecosystems) project is developing tools to give people control over how their data is used, and the ability to share it on their terms.

Social media platforms are walled gardens, but data portability is one of the planks of the European Union’s General Data Protection Regulation, which will come into effect in May 2018. Data portability is the right of an individual to require an organisation to give them a copy of their personal data, or to send those data to another organisation (potentially a competitor).

Maybe this is a step towards a world where users will be able to easily leave a social media platform if they don’t agree with how their data are being used, without suffering the social or career hit that would be associated with the only option available now: delete your account.