
The replication crisis has engulfed economics

No two alike? Image sourced from Shutterstock.com

A sense of crisis is developing in economics after two Federal Reserve economists came to the alarming conclusion that economics research is usually not replicable.

The economists took 67 empirical papers from 13 reputable academic journals. Without assistance from the original researchers, they were able to get the same result in only a third of cases.

With the original researchers’ assistance, that percentage increased to about half, suggesting reporting practices and requirements are seriously deficient.

The replication crisis in psychology is well documented. Science recently published a stunning report by the Open Science Collaboration. Almost 300 researchers were involved in trying to directly replicate the results of 100 papers published in 2008. This followed earlier exercises involving many labs (such as here, here and here).

The researchers did not succeed in the clear majority of cases. On average they found the mean effect size to be only half of what was reported in the original studies. While the report has been questioned (here and here), there is growing concern that a cornerstone of the scientific edifice is in serious need of renovation.

What’s the problem?

Researchers are too often granted inappropriate degrees of freedom, and some are just fraudulent. That said, some of these distressing replication results arise because good science is messy. It involves hard work, and reasonable people can reasonably disagree on the various calls that have to be made.

A good illustration is this just-published study by Raphael Silberzahn and Eric Uhlmann. The researchers engaged in methodological debates with well-known data sleuth Uri Simonsohn.

Simonsohn questioned the results of an earlier study from the pair that suggested noble-sounding German names could boost careers. Re-running the analysis with a better analytical approach, Simonsohn did not confirm the effect. Silberzahn and Uhlmann eventually conceded the point in a joint paper with Simonsohn.

In their new study, the researchers provided a data set and asked more than two dozen teams of researchers to contribute. They sought to determine, based on the data set, whether skin colour of soccer players from four major leagues (England, France, Germany, and Spain) influenced how often they were given a red card.

Somewhat shockingly, the answers were rather diverse. Of the 29 teams, 20 found a statistically significant correlation, with the median result suggesting dark-skinned players were 1.3 times more likely than light-skinned players to be sent off.

But the researchers reported:

“Findings varied enormously, from a slight (and non-significant) tendency for referees to give more red cards to light-skinned players to a strong trend of giving more red cards to dark-skinned players.”

Interestingly, this diversity of results survived even after the researchers debated the methodological approach.

The upshot is that even under the best of circumstances – one data set, what seems like a straightforward question to answer, and an exchange of ideas on the best method – arriving at consensus can be extraordinarily difficult. And it surely becomes even more difficult with multiple data sets and many teams.

Further scrutiny

That, of course, is hardly news to most social scientists, who largely accept that any single study is worth only so much. This is why replication efforts and meta-analyses are as important as the recent focus on publication bias and underpowered studies. There is tantalising evidence that many experimental economics studies are severely underpowered (although the evidence so far has been established only for a very simple class of games).
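To make "underpowered" concrete, here is a minimal sketch of how statistical power depends on sample size. It assumes a two-sided, two-sample z-test with a known standard deviation of 1, which is a simplification chosen for illustration and not the method of any study mentioned here. The function name `approx_power` and the example effect size of 0.5 are hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: assumed true difference in means, in standard-deviation units.
    n_per_group: number of observations in each of the two groups.
    """
    z = NormalDist()  # standard normal distribution
    critical = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Expected value of the test statistic under the assumed true effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


if __name__ == "__main__":
    # Illustrative sample sizes only; not taken from any real study.
    for n in (10, 20, 64, 200):
        print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, a study with 20 participants per group would detect a medium-sized effect less than half the time, so a failed replication of such a study says little on its own.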

It will be interesting to see the result of a current collaborative effort by economists to replicate eighteen laboratory economics studies from 2011 to 2014.

It is not just the social sciences that are in the grip of replication crises. The extent and consequences of p-hacking and publication bias (studies that report no effect not being published) across science are well documented and have been known for a while.
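A small simulation can illustrate why publication bias alone could make replications look weaker than the originals. The sketch below assumes many small studies of the same modest true effect, with only statistically significant results in the expected direction being "published". All names and parameter values (`simulate_publication_bias`, a true effect of 0.2, 20 participants per group) are hypothetical choices for illustration, not figures from the studies discussed above.

```python
import random
from statistics import NormalDist, mean


def simulate_publication_bias(true_effect=0.2, n_per_group=20,
                              n_studies=2000, alpha=0.05, seed=1):
    """Simulate many small two-group studies and 'publish' only significant ones.

    Returns (mean of all estimates, mean of published estimates,
    share of studies published).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference in means, assuming unit variance is known.
    se = (2 / n_per_group) ** 0.5
    all_estimates, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        estimate = mean(treated) - mean(control)
        all_estimates.append(estimate)
        # Assumed filter: only significant results in the expected direction
        # make it into print.
        if estimate / se > critical:
            published.append(estimate)
    return mean(all_estimates), mean(published), len(published) / n_studies


if __name__ == "__main__":
    overall, in_print, share = simulate_publication_bias()
    print("mean of all estimates:", round(overall, 2))
    print("mean of published estimates:", round(in_print, 2))
    print("share published:", round(share, 2))
```

In this setup the published estimates must exceed the significance threshold, so their average is necessarily well above the assumed true effect. An honest replication would then tend to find a smaller effect than the published one, even with no misconduct by anyone.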

So, where to from here? With a number of journals (including the Journal of the Economic Science Association, Experimental Economics, Journal of Experimental Social Psychology, Journal of Personality and Social Psychology, Psychological Science, Perspectives on Psychological Science) opening their doors to replication in various guises, we can expect more results that seemingly discredit the social sciences.

Hopefully, in the long run, this scrutiny will raise the bar for what it takes for a study to be considered reliable. But replication studies can inflict considerable damage on individuals’ productivity and reputation. Minimal reporting standards and an acceptable replication etiquette need to be clarified, including whether original authors have to be invited or consulted. Journals should also become more serious about collecting data sets, where confidentiality does not prevent it.