A January 2023 investigation by Time magazine revealed that Kenyan workers paid less than US$2 an hour were given the job of trying to ensure that the data used to train the AI platform ChatGPT was free from discriminatory content.
AI models need to be trained on enormous volumes of data in order to learn to recognise and interact with the human environment. These inputs need to be collected, sorted, verified, and formatted. Such time-consuming and undervalued tasks are generally outsourced by technology companies to an army of precarious workers, usually based in the Global South.
This data work takes several different forms, depending on the purpose of the final algorithm. For example, it might involve outlining people in images captured on video camera to teach the algorithm how to recognise a human. Or it might involve checking the outputs of an automatic invoice-processing tool and correcting errors manually to help the computer with its task.
To explore the identity of these data workers, their roles and working conditions, and enrich the debate around regulating the AI sector, we set up an investigation conducted between Paris and Antananarivo, capital of Madagascar.
Our study also shows the reality of AI, French style: on the one hand, France’s tech companies depend on the Big Five (Google, Apple, Facebook, Amazon and Microsoft) for hosting services and processing power; on the other, data tasks are performed by workers in former French colonies, notably Madagascar, confirming well-established trends in outsourcing. There has already been research, incidentally, comparing the tech sector with mining and textiles.
A study in AI globalisation
Our research project kicked off in Paris in March 2021. We first set out to understand what involvement French AI houses had in data work, and what processes were in place to ensure that sufficiently high-quality data sets were produced for training computer models. We carried out interviews with 30 founders and employees working in 22 Parisian firms in the AI ecosystem. One finding rapidly emerged from this initial exploration: most of the data work was outsourced to Malagasy contractors.
For a second part of the study, conducted at first remotely, then in situ in Antananarivo, we interviewed 147 workers, managers and directors at ten Malagasy companies. At the same time we sent out a questionnaire to 296 data workers based in Madagascar.
Precarious work for well-educated city youth
Our initial enquiries showed that AI data workers were part of a much wider IT service sector, ranging from call-centre staff to web content moderators, to search-engine optimisation (SEO) copywriters.
Questionnaire responses showed that the majority of workers employed in the sector are male (68%), young (87% were less than 34 years old), urban-dwelling and educated (75% had at least some higher education). When the work was within the formal, rather than the black or grey economy, respondents were generally permanent staffers. The minimal protections offered by Madagascan employment law compared with French law, workers’ ignorance of their rights, and the weakness of trade unions and worker representation in Malagasy companies heightened the precariousness of their position. They mostly earned between 96 and 126 euros a month, with a huge gap between their pay and that of team supervisors, who also tend to be Madagascan and working in-country, but take home 8 to 10 times as much.
The shop-floor workers find themselves at the end of an extremely long outsourcing chain, which partly explains the minuscule pay even by Malagasy standards. The AI production line involves three different players: data hosting services/processing power offered by the Big Five tech companies, French companies that sell the AI models, and companies offering the data annotation services delivered by Madagascan workers. Each level takes its cut.
The companies wrangling the data are generally very dependent on their French clients, who manage the outsourced workforce in a quasi-direct manner, imposing middle-managers working with the interests of Parisian start-ups in mind. The domination of these roles by foreigners – either employed by the client companies in France, or expats working in Antananarivo – represents a serious block on career progression for the workers, who remain ignominiously stuck at the bottom of the value chain.
Profiting from post-colonial France-Madagascar links
The AI sector benefits from a specific policy – “tax-free zones” created in 1989 for the textile industry. Since the start of the 1990s, French businesses have been setting up satellites in Madagascar, notably for the digital-publishing industry. The special zones, the equivalent of which can be found in many other developing countries, pull in investment by offering highly attractive tax exemptions.
Today, out of 48 businesses offering digital services in the tax-free zones, only nine are owned by Madagascans, compared to 26 owned by French people. Aside from the situation with formally constituted companies, the sector has developed a practice of cascade subcontracting, with grey-economy businesses and entrepreneurs at the bottom of the pecking order, poorly treated and chivvied into action when there are workforce shortages elsewhere in the sector.
As well as cheap labour, this outsourced industry profits from a well-educated workforce: most have been to university and speak fluent French, which they learned at school, online or at Institut Francais classes. The latter, an institution for teaching French language and culture set up in 1883, was originally intended to extend imperial power through language to the colonised population.
This scenario aligns with what researcher Jan Padios labels “colonial recall”. Former colonies with linguistic and cultural ties to countries that used to hold sway now supply them with business services.
Making AI workers visible to better understand how they function
Behind the recent explosion in commercialised AI projects in the Global North lie growing numbers of data workers. The recent controversy around “intelligent security cameras” at the Paris Olympics focused mostly on the ethics of blanket surveillance. There is a need to better account for the vital component of human labour that goes into training AI models, especially because it raises new questions about working conditions and the right to a private life.
To make the roles of these workers visible is to ask probing questions about globalised production chains. These are more familiar in the manufacturing industry, but are also a feature of the digital sector. These workers are essential to the functioning of our digital infrastructure – they are the invisible cogs of our digital lives.
It also makes visible the impact of their work on the AI models. One part of algorithmic bias lies in the nature of how data work is conducted, although the reality of this is largely kept under wraps by AI companies. A truly ethical AI must therefore set ethical standards for AI sector working conditions.