It bears repeating: how scientists are addressing the ‘reproducibility problem’

Recently a friend of mine on Facebook posted a link whose headline quoted a scientist saying “Most cancer research is largely a fraud.” The quote is both out of context and many decades old. But its appearance still makes a strong point: the general public has a growing distrust of science and research.

Seeking reproducibility: a difference between scientists and normal people. Randall Munroe/XKCD, CC BY-NC

Recent reports in the Washington Post and the Economist, among others, raise the concern that relatively few scientists’ experimental findings can be replicated. This is worrying: replicating an experiment is a main foundation of the scientific method.

As scientists, we build on knowledge gained and published by others. We develop new experiments and questions based on the knowledge we gain from those published reports. If those papers are valid, our work is supported and knowledge advances.

On the other hand, if published research is not actually valid, if it can’t be replicated, it delivers only an incidental finding, not scientific knowledge. Any subsequent questions will either be wrong or flawed in important ways. Identifying which reports are invalid is critical to prevent wasting money and time pursuing an incorrect idea based on bad data. How can we know which findings to trust?

Why would a repeat fail?

Repeating a result is not always a simple task. Say you flip a coin three times and get heads each time. You may conclude that coins always land on heads. As an independent test, your friend flips a coin five more times and gets four tails and one heads. The friend concludes that your results were incorrect and not reproducible, and that coins usually land on tails. Repeating the research can both correct inaccuracies and deepen our understanding of the truth: over many flips, a fair coin lands on heads and tails about equally often.
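The coin example can be checked with a few lines of Python. This is only an illustrative sketch that assumes a perfectly fair coin; the function names `prob_at_least` and `heads_fraction` are made up for this example. It computes how likely each of the two small "studies" above is by chance alone, then simulates progressively longer runs of flips.

```python
import random
from math import comb


def prob_at_least(k, n, p=0.5):
    """Exact probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def heads_fraction(n_flips, rng):
    """Simulate n_flips flips of a fair coin and return the fraction of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# Chance that a fair coin shows heads on all three of three flips.
p_three_heads = prob_at_least(3, 3)  # 0.125

# Chance that a fair coin shows at least four tails in five flips.
p_four_tails = prob_at_least(4, 5)  # 0.1875

print("P(3 heads in 3 flips):", p_three_heads)
print("P(at least 4 tails in 5 flips):", p_four_tails)

# Assumption for illustration: a fixed seed, so the simulation itself is repeatable.
rng = random.Random(0)
for n in (3, 5, 100, 10_000):
    print(n, "flips -> fraction of heads:", round(heads_fraction(n, rng), 3))
```

Under the fair-coin assumption, neither small result is unusual: three heads in a row happens about one time in eight, and four or more tails in five flips happens almost one time in five. Only the longer runs are expected to settle near one half, which is why a single small study and a single small replication can disagree without either being done badly.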

This is much harder in studies that are more complex than coin-flipping. In a recent commentary in Science, lead author and Harvard psychologist Daniel Gilbert notes that the 2015 study that reported low reproducibility of psychology research did not correctly replicate the methods or approaches of the original studies. For example, a study of race and affirmative action originally performed at Stanford University was “replicated” at the University of Amsterdam in the Netherlands, a country with different racial diversity. When the study was later repeated at Stanford, the original published results were indeed replicated.

Gilbert’s analysis suggests that the reproducibility “problem” may be more complex. Perhaps some studies cannot be repeated due to problems with the initial study, while others aren’t replicable because the follow-up research did not follow the methods or use the same tools as the original study. Likely both contribute to the reproducibility problem.

Focusing on the details

The scientific community is addressing this challenge in several ways. For example, scholarly journals are requiring much more detailed explanations of how we did our experiments. More detail allows scholars to better evaluate and understand what parts of the experiment could influence the result.

Also, when reviewing requests for government research grant money, the National Institutes of Health now requires scientists to detail both the tools they will use and the tests they used to confirm the tools are exactly what they should be.

One way scientists can get results that can’t be reproduced is if one or more of the tools used doesn’t work as the researchers assume or intend. Researchers have found that tools such as cell lines can become contaminated, mislabeled or mixed up. Antibodies used to identify one protein may actually identify the wrong protein or more than one protein. Even variations in the type of food given to lab mice have been shown to significantly change experiment results.

To combat this type of problem, researchers have begun sequencing DNA to ensure they are working with the cell lines they intend to use. Some lab supply companies are testing their antibodies in-house to confirm they work as expected. Other companies are using the online lab-services marketplace Science Exchange to find expert labs like mine to independently test their antibodies. (I am on Science Exchange’s Lab Advisory Board, but have no financial interest in the company.) The results of those tests can “validate” an antibody as good or bad for a particular experiment, letting future scientists know which antibodies are the best tools for their research.

Finding time to reproduce important studies

Those steps address future and ongoing research. But how do we know which already published experiments are reproducible and which are not? Most journals focus on publishing new and groundbreaking findings, rather than publishing a replication of a previous study. Further, research that finds a study’s results can’t be replicated – getting what are called “negative results” – can also be difficult for scientists and journals to publish. Collaboration and support from colleagues are key to academic success; publishing data that contradict a fellow researcher’s results risks alienating peers.

In 2012, the biopharmaceutical company Amgen reported that it had been unable to reproduce 47 of 53 “landmark” cancer papers. For confidentiality reasons, however, the company did not release which papers it could not replicate and thus did not provide details about how it repeated the experiments. As with the psychology studies, this leaves the possibility that Amgen got different results because the experiments were not performed the same way as the original study. It opens the door to doubt about which result – the first or the repeat test – was correct.

Several initiatives are addressing this problem in multiple disciplines. Science Exchange; the Center for Open Science, a group dedicated to “openness, integrity and reproducibility of scientific research”; and F1000Research, a team focused on immediate and transparent publishing, have all introduced initiatives along this line.

Science Exchange and the Center for Open Science have launched a specific effort in this direction regarding cancer research. Their effort, the Reproducibility Project: Cancer Biology, has received US$1.3 million from the Arnold Foundation to repeat selected experiments from a number of high-profile cancer biology papers. The project will publish comprehensive details of how scientists attempted to reproduce each study, and will report results whether they confirm, contradict or change the findings of the study being repeated.

In addition, Science Exchange, the open-access journal PLoS, the data management site figshare and the reference management site Mendeley joined forces in 2012 to identify and document high-quality reproducible research. This effort, called the Reproducibility Initiative, allows scientists to apply to have key parts of their projects repeated in independent expert labs identified by Science Exchange.

The results of the repeat tests can be published in the special PLoS reproducibility collection. The data are made openly available through figshare, and the impact the work has on future studies and publications can be tracked in the Mendeley reproducibility collection. Many journals have agreed to add an “Independently Validated” badge to original articles whose results are successfully repeated, indicating their high quality.

Doing it right again and again

To prevent problems in the repetition of the experiments, the Reproducibility Initiative spends months reviewing the details of an experiment with the original author to ensure the project is repeated accurately. Once the review is complete, Science Exchange splits the project into types of experiments and outsources each type to a lab with that expertise. Because the project is divided and outsourced, the testing labs do not know the original paper, results or authors, eliminating chances for bias in testing.

Testing labs like mine create a detailed report of the experiments to be done. Every step, every reagent down to the catalog number and company, is carefully documented and published in an independent report in “PLoS One.” That way, whether the result of the repetition is positive or negative, the full details of the experiment are available for review. Upon completion of the repeat testing, the results are published in “PLoS One,” whether they validate or contradict the original findings. The results of the first full replication of a study are expected to be published later this year.

As scientists, we are working to dispel concerns about scientific research like those raised by my Facebook friend. With improved reporting and tools for future research, the science community can counter and reduce existing problems of reproducibility, which will help us build a strong and valid foundation for future scientific studies.
