The 2019 Nobel Prize in Economics was awarded to Esther Duflo, Abhijit Banerjee and Michael Kremer for their work adapting the method of randomized control trials (RCTs) to the field of development. The prize committee believes this new type of experimentation has “considerably improved our ability to fight global poverty” and “transformed development economics”. There are reasons to applaud this decision: not only is one of the three winners a woman, but the prize also recognizes the importance of economic development and of an empirical approach to fieldwork.
However, the validity and impact of the growing use of randomized control trials require scrutiny. Working from a July 2019 article, we would like to reaffirm our reservations. While RCTs have many advantages, claiming to be able to use them for the entire gamut of development interventions is deeply problematic.
In an RCT, two groups are randomly selected from a homogeneous population: the first receives an “intervention” (medicine, grant, loan, training, etc.), while the second gets a “placebo” – either a different intervention or no intervention at all. After a certain time, outcomes in the two groups are compared, either to measure the efficacy of the intervention or to weigh two distinct approaches against each other. While controversial, this method has been widely used in the medical field since the mid-20th century, and has since been applied to fields such as education, crime and tax reform, particularly in the United States in the ‘60s, ‘70s and ‘80s.
Over the last 15 years, randomized control trials have been applied to a new field: development aid policy. A vast range of interventions have been put to the test of “randomization”, especially in education (incentives aimed at reducing absenteeism among teachers, de-worming medicine designed to improve student attendance), health (water filters, mosquito nets, training or bonus systems for healthcare workers, free consultations, medical advice via text messages, etc.), financing (microcredit, micro-insurance, savings, financial education), and governance.
A supposed monopoly on scientific diligence
RCTs are described by their proponents as a revolutionary paradigm shift, as seen in Esther Duflo and Abhijit Banerjee’s book, Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, and in their public statements. Moreover, those in the political and economic spheres tend to attribute the labels “rigorous” and even “scientific” exclusively to these kinds of trials.
As randomized control trials have become increasingly dominant, they have had a crowding-out effect on other approaches. At the World Bank, from 2000 to 2010 just 20% of all evaluations were RCTs. In the five years that followed, the ratio was practically reversed, and this trend has been mirrored at 3IE, the international network specializing in evaluation.
Is this crowding-out effect scientifically legitimate and politically desirable?
From theory to practice
Whether the object is a project, a policy or a program, all impact evaluations face the same challenge: how can the impact of the intervention be isolated from changes arising from outside sources? Several methods are available but, in theory, RCTs have an indisputable advantage: random selection from large samples ensures, in principle and on average, that all the differences measured between the two groups are due to the intervention, and nothing else.
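This logic can be illustrated with a toy simulation (all numbers below are invented for illustration and are not drawn from any real trial): each unit has an unobserved baseline outcome, random assignment balances those baselines across the two groups, and the simple difference in mean outcomes then recovers the true effect on average.

```python
import random

def simulate_rct(n=10_000, true_effect=2.0, seed=0):
    """Difference-in-means estimate from one simulated trial (toy numbers)."""
    rng = random.Random(seed)
    # Unobserved baseline outcomes differ across units; randomization
    # balances them between the two groups on average.
    baseline = [rng.gauss(10, 3) for _ in range(n)]
    treated = set(rng.sample(range(n), n // 2))
    treat_mean = sum(baseline[i] + true_effect for i in treated) / len(treated)
    control = [baseline[i] for i in range(n) if i not in treated]
    control_mean = sum(control) / len(control)
    return treat_mean - control_mean

print(round(simulate_rct(), 2))  # close to the true effect of 2.0
```

With a large sample, the estimate lands near the true effect without the researcher ever observing the baselines – which is precisely the advantage claimed for randomization.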
When it comes to answering basic questions about development, however, RCTs are hardly effective for at least three reasons:
- Their external validity is weak, meaning they are extremely localized and rely on samples that do not represent the population as a whole. Their results are therefore difficult to generalize. With this method, it is impossible to know whether results obtained in a rural area of Morocco would apply in another area of the country, or in Tunisia or Bolivia, for example. This limitation is widely acknowledged and accepted but, in practice, few take it into account.
- Contrary to a common claim, the internal validity of RCTs is also problematic: their capacity to measure the impact of an intervention is imperfect. As the 2015 economics Nobel laureate Angus Deaton and his epistemologist colleague Nancy Cartwright have demonstrated, RCTs are ill-equipped to strike the right balance between bias (which must be minimized) and precision (which must be maximized), and therefore tend to focus on average results for an entire given population. Yet the impacts of the policies under study are often heterogeneous, and heterogeneity is decisive in public policy. Furthermore, the implementation of study protocols is hampered by numerous practical and ethical difficulties, to the extent that comparisons between the population receiving the intervention and the control population are often skewed.
- RCTs often involve a range of stakeholders with interests that are sometimes in conflict. Their interplay influences every stage of the trial: the technical protocols, their implementation, the analysis of the results, and their publication and distribution. Here again, arrangements are made to the detriment of scientific rigor. RCTs become political arenas, with interests at play involving government re-election (e.g. the evaluation of an anti-poverty program in Mexico), the dominant discourse on certain development tools and their reputation, in some cases staked by RCT advocates themselves (e.g. the controversy surrounding de-worming) and, sometimes, the publishing imperatives weighing on research.
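The point about averages masking heterogeneity can be sketched numerically (the subgroup sizes and effect sizes below are hypothetical): a program that helps half the population and harms the other half still reports a respectable average effect.

```python
import random

rng = random.Random(2)

# Hypothetical individual treatment effects: half the population gains
# about +4, the other half loses about -2 (values invented for illustration).
helped = [rng.gauss(4.0, 1.0) for _ in range(1000)]
harmed = [rng.gauss(-2.0, 1.0) for _ in range(1000)]
effects = helped + harmed

ate = sum(effects) / len(effects)                 # average effect: about +1
share_worse_off = sum(e < 0 for e in effects) / len(effects)  # about half

print(round(ate, 2), round(share_worse_off, 2))
```

A trial reporting only the average effect (+1) would call this program a success, even though roughly half the participants are left worse off – the kind of heterogeneity an average-focused design cannot surface.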
We recently replicated a randomized control trial conducted by Esther Duflo and her colleagues on microcredit in Morocco. This kind of exercise is vital for ensuring research reliability, and consists of using the raw data from the survey to try to reproduce the study’s results. We were able to do so, which is good news, but we also uncovered numerous problems and errors, some of which seriously undermined the study’s internal and external validity.
- The sampling was so different from the original protocol that it was impossible to characterize the population studied and understand the representativeness of the results.
- The gender and ages of the members of the households surveyed varied so widely before and after the intervention that, in 20% of cases, they could not possibly be the same households.
- Estimations of the assets owned by the households were incoherent, even though these estimations are a key variable for evaluating the economic impact of the program.
- Although the survey area was supposed to have been free of credit prior to the intervention and the control area was supposed to remain credit-free during the study, this was not the case.
- The researchers arbitrarily decided to remove 27 households, with higher values on certain variables, from the dataset prior to analysis (0.5% of the total). If just 12 more or 12 fewer households had been removed (0.7% or 0.3% of the total), the results would have been completely different.
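How sensitive an estimate can be to such trimming decisions is easy to illustrate with a contrived simulation (the sample size, outcome values and trimming rule below are invented and mimic nothing in the actual study): a handful of extreme outcomes in the treated group makes the measured "impact" swing with the number of observations dropped.

```python
import random

rng = random.Random(1)
n = 5400  # sample size chosen for illustration only

control = [rng.gauss(100, 20) for _ in range(n)]
treated = [rng.gauss(100, 20) for _ in range(n)]
# A handful of extreme outcomes in the treated group, of the kind
# that trimming decisions hinge on (magnitudes are invented):
for i in range(30):
    treated[i] += rng.uniform(2000, 6000)

def trimmed_effect(k):
    """Difference in group means after dropping the k largest outcomes
    pooled across both groups (a simplified stand-in for a trimming rule)."""
    cutoff = sorted(treated + control)[-k - 1]
    t = [x for x in treated if x <= cutoff]
    c = [x for x in control if x <= cutoff]
    return sum(t) / len(t) - sum(c) / len(c)

for k in (15, 27, 39):  # trim slightly less, the chosen amount, slightly more
    print(k, round(trimmed_effect(k), 1))  # the estimated "impact" swings with k
```

When a result depends this heavily on a discretionary cutoff affecting a fraction of a percent of the sample, the trimming rule itself becomes a substantive analytical choice that deserves scrutiny.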
Our replication gave rise to discussions with Duflo and her colleagues in the form of working documents. These discussions reveal profound divergences on what constitutes scientific validity for a field study. We believe our peers must examine this question more carefully.
Behind the success of RCTs
Ultimately, the kinds of interventions that can be evaluated using RCTs are limited, amounting to just 5% of interventions, according to the British development agency. Restricting the scope of impact studies to those interventions likely to conform to the standards of randomization not only excludes many projects, but also numerous fundamental aspects of development, both economic and political, such as the regulation of large companies, taxation and international trade, to name but a few.
So what lies behind the success of RCTs? It is not always the scientific superiority of a method or theory that explains its success but, rather, the ability of its advocates to convince a sufficient number of stakeholders at a specific time. In other words, success arises from both supply and demand. On the demand side, the success of RCTs relates to changes in economics as a discipline, including the recent emphasis on quantification, the micro origins of macro processes and, within these micro origins, the psychological and cognitive motivations behind individual behaviours.
The success of RCTs also illustrates changes in the area of development aid, where we are seeing increasing numbers of small projects aimed at correcting individual behaviours rather than setting up or maintaining development infrastructure and national development policies.
The supply side has largely been shaped by a new breed of scientific entrepreneurs who use numerous strategies in an attempt to “corner the market”. These researchers are young and come from a small group of top universities, mainly American. They have managed to combine the magic formula of academic excellence (scientific legitimacy), the ability to win over public opinion (media visibility, compassionate mobilization and moral commitment) and donors (solvent demand), massive investment in training (qualified supply), and an effective business model (financial profitability). All these qualities reinforce each other.
Applying RCTs to development could lead to scientific breakthroughs, as long as their (many) limits and (narrow) scope are acknowledged. Claiming that this kind of method can solve poverty, as some of its advocates do, including the three Nobel laureates, is a step backward on two fronts: epistemological, since the claim reflects an outdated positivist view of science; and political, since questions central to the fight against poverty and inequality are not addressed by this approach.
Will this award lead the randomizers of development to be more measured about the benefits of different methods or, on the contrary, will they use this opportunity to consolidate their largely dominant position? There are good reasons for concern.
Florent Bédécarrats contributed to this article.
Translated from the French by Alice Heathwood for Fast ForWord.