<h1>Retraction of a journal article doesn’t make its findings false</h1>
<figure><img src="https://images.theconversation.com/files/238595/original/file-20181001-18991-jgdgpp.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Wansink’s research showed plate size matters when it comes to how much we eat.</span> <span class="attribution"><a class="source" href="https://unsplash.com/photos/0OZz-OB65DA">rawpixel/Unsplash</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span></figcaption></figure>
<p>The American Medical Association recently <a href="https://jamanetwork.com/journals/jama/fullarticle/2703449">retracted six papers</a> co-authored by food consumption and psychology researcher Brian Wansink from three of its journals. These include two <a href="https://jamanetwork.com/journals/jama/article-abstract/200673">studies</a> <a href="https://jamanetwork.com/journals/jamapediatrics/article-abstract/717915">showing</a> that large bowl sizes encourage us to eat more, and one showing that <a href="https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/1685889">shopping when hungry</a> leads us to buy more calorie-dense foods.</p>
<p>A prolific academic researcher, Wansink has provided many thought-provoking ideas about the psychology of food consumption through more than 500 publications which have been collectively cited more than <a href="https://scholar.google.com/citations?user=5yV3t8oAAAAJ&hl=en">25,000 times</a>. </p>
<p>His research has shown that people will eat a lot more from a <a href="https://www.ncbi.nlm.nih.gov/pubmed/15761167">bottomless soup bowl</a>; they will eat more from larger portions, even if it is <a href="https://www.ncbi.nlm.nih.gov/pubmed/?term=wansink+stale+popcorn">stale popcorn</a> or food served in a <a href="https://www.ncbi.nlm.nih.gov/pubmed/20709127">dark restaurant</a>; and they will eat less if a portion is made to appear larger using <a href="https://theconversation.com/use-your-illusion-how-to-trick-yourself-and-others-into-eating-less-31304">visual illusions</a>.</p>
<p>Retractions are a <a href="https://bmjopen.bmj.com/content/6/11/e012047">permanent means</a> by which journals endeavour to preserve the integrity of the scientific literature. They are typically issued for some form of misconduct, but a retraction does not necessarily mean the results are false. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/use-your-illusion-how-to-trick-yourself-and-others-into-eating-less-31304">Use your illusion: how to trick yourself and others into eating less</a>
</strong>
</em>
</p>
<hr>
<h2>Are retracted studies false?</h2>
<p>A number of challenges have been made against more than 50 of Wansink’s publications. At present, <a href="http://www.timvanderzee.com/the-wansink-dossier-an-overview/">15 corrections</a> have been published and <a href="https://www.vox.com/science-and-health/2018/9/19/17879102/brian-wansink-cornell-food-brand-lab-retractions-jama">13 retractions</a> have been made.</p>
<p>The retractions follow a <a href="http://www.timvanderzee.com/the-wansink-dossier-an-overview/">range of allegations</a> of misconduct including autoplagiarism (copying your own work), data mismanagement and data manipulation. But none of this means Wansink’s results are entirely discredited.</p>
<p>The American Medical Association made its retractions because Cornell University (Wansink’s employer) was unable to provide an independent evaluation in response to an <a href="https://jamanetwork.com/journals/jama/fullarticle/2678649">Expression of Concern</a> regarding Wansink’s studies, issued in May.</p>
<p>The absence of evidence does not prove his results are false. </p>
<p>Science relies far more on whether results are repeatable than on retractions. And many of Wansink’s results – including some which have been retracted – have been replicated. </p>
<p>Two of the most recently retracted studies showing that <a href="https://jamanetwork.com/journals/jama/fullarticle/200673">adults</a> and <a href="https://jamanetwork.com/journals/jamapediatrics/article-abstract/717915">children</a> eat more from larger bowls form a part of a larger literature and have been cited <a href="https://scholar.google.com.au/scholar?cites=9143302904972518841&as_sdt=2005&sciodt=0,5&hl=en">nearly 300 times</a> and <a href="https://scholar.google.com.au/scholar?cites=2685318782292980698&as_sdt=2005&sciodt=0,5&hl=en">40 times</a> respectively.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/238599/original/file-20181001-19003-1h7vcy5.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">The bigger the plate size, the more people will eat (if they serve themselves).</span>
<span class="attribution"><a class="source" href="https://unsplash.com/photos/kbcqR60zWeo">NeONBRAND/Unsplash</a></span>
</figcaption>
</figure>
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/25049139">Multiple</a> <a href="https://www.tandfonline.com/doi/abs/10.1080/10408398.2014.922044">reviews</a> of the scientific literature reveal that others have replicated the findings of Wansink and colleagues on how the plate or bowl size affects consumption. </p>
<p>In a <a href="https://www.journals.uchicago.edu/doi/10.1086/684441">meta-analysis</a> I authored with others, the combined studies in this area show that doubling the plate size increases consumption by 40% on average. Though this is only the case if people are serving food onto the plate themselves. (Disclosure: this meta-analysis was published in a journal issue for which Wansink was one of the editors).</p>
<h2>Replication is more important than retraction</h2>
<p>The problem of reproducing findings in science is a much bigger issue than retractions. Retractions attract attention, but are relatively minor; replication does not attract attention, and is critically important.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Replication_crisis#Scope_of_the_crisis">replication crisis</a> facing social sciences, health and medicine suggests that <a href="https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124">50% or more of published findings</a> may not be repeatable.</p>
<p>In social science, a team replicated 100 studies published in three high-ranking journals. <a href="http://science.sciencemag.org/content/349/6251/aac4716">The results</a> showed only 36% of the replications found statistically significant results, and the average size of the observed effects was half of that seen in the original studies.</p>
<p>Wansink has published more than 500 articles. If 250 of them prove to be false in the sense that the results cannot be replicated, then he is on par with social and medical science in general. </p>
<p>The retraction of 13 of Wansink’s articles – some of which have been replicated by others – is a blip receiving much more attention than it deserves.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198">The science 'reproducibility crisis' – and what can be done about it</a>
</strong>
</em>
</p>
<hr>
<p>The high rate of replication failure arises, in part, from the arcane <a href="https://msu.edu/%7Elevinet/NHST1.pdf">statistical approach</a> used for analysing research data. <a href="https://theconversation.com/one-reason-so-many-scientific-studies-may-be-wrong-66384">In essence</a>, researchers seek statistically significant findings. Statistical significance is typically declared when the probability (p-value) of observing data at least as extreme as the data actually seen, <em>assuming there was no effect</em>, is less than 5%.</p>
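<p>To make that definition concrete, here is a minimal sketch in Python using hypothetical numbers: suppose 30 of 50 patients improve on a treatment, when chance alone would let each patient improve with 50% probability.</p>
<pre><code># A p-value estimated by simulating the null hypothesis (no effect).
# Hypothetical numbers: 30 of 50 patients improved; under the null,
# each patient improves with probability 0.5.
import numpy as np

rng = np.random.default_rng(0)
n_patients, observed = 50, 30

# 100,000 simulated experiments in which the null hypothesis is true.
null_results = rng.binomial(n_patients, 0.5, size=100_000)

# p-value: how often chance alone does at least as well as we observed.
p_value = (null_results >= observed).mean()
print(p_value)  # roughly 0.1, so not below the conventional 5% threshold
</code></pre>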
<p>Journals and academics wish to <a href="https://en.wikipedia.org/wiki/Publication_bias">publish</a> novel, statistically significant results. They tend to ignore studies with null results, putting them in <a href="http://psycnet.apa.org/record/1979-27602-001">a file-drawer</a>. </p>
<p>Replications that are successful add nothing new, and replications that fail (that is, are not statistically significant) are uninteresting to publishers, albeit critically important to science.</p>
<p>A related problem is that academics may dredge through data and cherry pick statistically significant results, a practice called <a href="https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106">p-hacking</a>.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/one-reason-so-many-scientific-studies-may-be-wrong-66384">One reason so many scientific studies may be wrong</a>
</strong>
</em>
</p>
<hr>
<p>This kind of misconduct – journals and academics obsessively focusing on statistically significant findings – <a href="https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106">is widespread</a>. If Wansink differs from others, it is in his disarming honesty in admitting to data dredging in a 2016 <a href="https://web.archive.org/web/20170312041524/http:/www.brianwansink.com/phd-advice/the-grad-student-who-never-said-no">blog post</a>, which attracted intensive scrutiny from his peers.</p>
<p>Science makes mistakes and missteps. The advances are achieved through new ideas and repeated testing. </p>
<p>Retractions may be important signals of reduced confidence in a finding, but they do not prove a finding false. This requires replication.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/oh-the-uncertainty-how-do-we-cope-32155">Oh, the uncertainty: how do we cope?</a>
</strong>
</em>
</p>
<hr>
<p>Science doesn’t provide certainty. Claims of absolute certainty made by authoritative figures are probably false.</p>
<p>As Tim van der Zee, one of Wansink’s lead detractors, states on <a href="http://www.timvanderzee.com/the-wansink-dossier-an-overview/">his website</a>: “I am wrong most of the time.” The challenge for scientists is to believe this.</p>
<p class="fine-print"><em><span>Stephen S Holden does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>The journal of the American Medical Association (JAMA) recently retracted several papers by a leading researcher on food and consumption. What does this mean for the researcher’s findings?Stephen S Holden, Adjunct Professor, Macquarie Graduate School of ManagementLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/944212018-04-09T20:05:37Z2018-04-09T20:05:37ZOur survey found ‘questionable research practices’ by ecologists and biologists – here’s what that means<figure><img src="https://images.theconversation.com/files/213332/original/file-20180405-189821-oqdb0h.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Negative results are still useful, and should not be hidden. </span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/closeup-old-dirty-school-blackboard-stains-1060705337?src=H_2XPJ_5Q3o7mSADUVReww-1-49">from www.shutterstock.com </a></span></figcaption></figure><p>Cherry picking or hiding results, excluding data to meet statistical thresholds and presenting unexpected findings as though they were predicted all along – these are just some of the “questionable research practices” implicated in the <a href="https://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998">replication crisis</a> psychology and medicine have faced over the last half a decade or so.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998">Science is in a reproducibility crisis – how do we resolve it?</a>
</strong>
</em>
</p>
<hr>
<p>We recently surveyed more than 800 ecologists and evolutionary biologists and found high rates of many of these practices. We believe this to be the first documentation of these behaviours in these fields of science.</p>
<p>Our pre-print <a href="https://osf.io/7qbfv/">results</a> have a certain shock value, and their release attracted a lot of attention on social media.</p>
<ul>
<li><p>64% of surveyed researchers reported they had <em>at least once</em> failed to report results because they were not statistically significant (cherry picking)</p></li>
<li><p>42% had collected more data after inspecting whether results were statistically significant (a form of “<a href="https://theconversation.com/how-we-edit-science-part-2-significance-testing-p-hacking-and-peer-review-74547">p-hacking</a>” – simulated in the sketch after this list)</p></li>
<li><p>51% reported an unexpected finding as though it had been hypothesised from the start (known as “HARKing”, or Hypothesising After Results are Known).</p></li>
</ul>
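<p>The second of these practices is easy to simulate. The sketch below (a hypothetical illustration, not part of our survey) repeatedly tests two groups drawn from the same distribution, adding more data whenever the result is not yet significant.</p>
<pre><code># Optional stopping: collecting more data after peeking at the p-value.
# Both groups are drawn from the SAME distribution, so any "significant"
# result is a false positive.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

def peeking_experiment():
    a = list(rng.normal(size=20))
    b = list(rng.normal(size=20))
    for _ in range(10):                     # peek up to 10 times
        if ttest_ind(a, b).pvalue < 0.05:   # stop as soon as "significant"
            return True
        a += list(rng.normal(size=5))       # otherwise, collect more data
        b += list(rng.normal(size=5))
    return False

rate = sum(peeking_experiment() for _ in range(2000)) / 2000
print(rate)  # well above the nominal 0.05 false positive rate
</code></pre>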
<p>Although these results are very similar to those that have been found in <a href="https://www.psychologicalscience.org/news/releases/questionable-research-practices-surprisingly-common.html">psychology</a>, reactions suggest that they are surprising – at least to some ecology and evolution researchers. </p>
<p><div data-react-class="Tweet" data-react-props="{"tweetId":"976399658107928581"}"></div></p>
<p>There are many possible interpretations of our results. We expect there will also be many misconceptions about them and unjustified extrapolations. We talk through some of these below. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/how-we-edit-science-part-2-significance-testing-p-hacking-and-peer-review-74547">How we edit science part 2: significance testing, p-hacking and peer review</a>
</strong>
</em>
</p>
<hr>
<h2>It’s fraud!</h2>
<p>It’s not fraud. Scientific fraud involves fabricating data and carries <a href="https://theconversation.com/research-fraud-the-temptation-to-lie-and-the-challenges-of-regulation-58161">heavy criminal penalties</a>. The questionable research practices we focus on are by definition questionable: they sit in a grey area between acceptable practices and scientific misconduct.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Not crazy. Not kooky. Scientists are just humans.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/crazy-chemistry-professor-injecting-lab-mouse-1017085096?src=52qpgUr9QmdeiZiAUN7-eA-3-51">from www.shutterstock.com</a></span>
</figcaption>
</figure>
<p>We did ask one question about fabricating data and the answer to that offered further evidence that it is very rare, <a href="https://theconversation.com/clearing-the-air-why-more-retractions-are-good-for-science-6008">consistent with findings from other fields</a>.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/research-fraud-the-temptation-to-lie-and-the-challenges-of-regulation-58161">Research fraud: the temptation to lie – and the challenges of regulation</a>
</strong>
</em>
</p>
<hr>
<h2>Scientists lack integrity and we shouldn’t trust them</h2>
<p>There are a few reasons why this should not be the take home message of our paper. </p>
<p>First, reactions to our results so far suggest an engaged, mature scientific community, ready to acknowledge and address these problems. </p>
<p><div data-react-class="Tweet" data-react-props="{"tweetId":"976402383965179904"}"></div></p>
<p>If anything, this sort of engagement should increase our trust in these scientists and their commitment to research integrity.</p>
<p>Second, the results tell us much more about <a href="https://theconversation.com/publish-or-perish-culture-encourages-scientists-to-cut-corners-47692">structured incentives and institutions</a> than they tell us about individuals and their personal integrity. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/publish-or-perish-culture-encourages-scientists-to-cut-corners-47692">Publish or perish culture encourages scientists to cut corners</a>
</strong>
</em>
</p>
<hr>
<p>For example, these results tell us about the institution of scientific publishing, where negative (not statistically significant) results are all but banished from most journals in most fields of science, and where replication studies are virtually never published because of a relentless focus on novel, “ground breaking” results. </p>
<p>The survey results tell us about scientific funding, where, again, “<a href="https://theconversation.com/novelty-in-science-real-necessity-or-distracting-obsession-84032">novel</a>” (meaning positive, significant) findings are valued more than careful, cautious procedures and replication. They also tell us about universities, and about hiring and promotion practices within academic science that focus on publication metrics and overvalue quantity at the expense of quality. </p>
<p>So what do they mean, these questionable research practices admitted by the scientists in our survey? We think they’re best understood as the inevitable outcome of publication bias, funding protocols and an ever increasing pressure to publish.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/novelty-in-science-real-necessity-or-distracting-obsession-84032">Novelty in science – real necessity or distracting obsession?</a>
</strong>
</em>
</p>
<hr>
<h2>We can’t base important decisions on current scientific evidence</h2>
<p>There’s a risk our results will feed into a view that our science is not policy ready. In many areas, such as health and the environment, this could be very damaging, even disastrous. </p>
<p>One reason it’s unwarranted is that much of this evidence does not rest on single studies: climate science, for example, is a model-based science, and there have been many independent replications of these models. Similarly with immunisation trials. </p>
<p>We know that any criticism of scientific practice runs a risk in the context of <a href="https://theconversation.com/who-are-you-calling-anti-science-how-science-serves-social-and-political-agendas-74755">anti-science sentiment</a>, but such criticism is fundamental to the success of science. </p>
<p>Remaining open to criticism is science’s most powerful self-correction mechanism, and ultimately what makes the scientific evidence base trustworthy.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Transparency can build trust in science and scientists.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/story-psychologists-office-778256944?src=WvTBg66nmYGI5c9uW4gkaA-7-94">from www.shutterstock.com</a></span>
</figcaption>
</figure>
<h2>Scientists are human and we need safeguards</h2>
<p>This is an interpretation we wholeheartedly endorse. Scientists are human and subject to the same suite of cognitive biases – like <a href="https://theconversation.com/confirmation-bias-a-psychological-phenomenon-that-helps-explain-why-pundits-got-it-wrong-68781">confirmation bias</a> – as the rest of us.</p>
<p>As we learn more about cognitive biases and how best to mitigate them in different circumstances, we need to feed this back into the norms of scientific practice. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/confirmation-bias-a-psychological-phenomenon-that-helps-explain-why-pundits-got-it-wrong-68781">Confirmation bias: A psychological phenomenon that helps explain why pundits got it wrong</a>
</strong>
</em>
</p>
<hr>
<p>The same is true of our knowledge about how people function under different incentive structures and conditions. This is the basis of many of the initiatives designed to make science more open and transparent.</p>
<p>The <a href="https://cos.io/our-products/osf/">open science movement</a> is about developing <a href="https://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198">initiatives</a> to protect against the influence of cognitive bias, and alter the incentive structures so that research using these questionable research practices stops being rewarded. </p>
<p>Some of these initiatives have been enthusiastically adopted by many scientists and journal editors. For example, many journals now publish analysis code and data along with their articles, and many have signed up to <a href="https://osf.io/9f6gx/">Transparency and Openness Promotion (TOP) guidelines</a>. </p>
<p>Other initiatives offer great promise too. For example, <a href="https://www.elsevier.com/reviewers-update/story/innovation-in-publishing/registered-reports-a-step-change-in-scientific-publishing">registered report</a> formats are now offered by some journals, mostly in psychology and medical fields. In a registered report, articles are reviewed on the strength of their underlying premise and approach, before data is collected. This removes the temptation to select only positive results or to apply different standards of rigour to negative results. In short, it thwarts publication bias.</p>
<p>We hope that by drawing attention to the prevalence of questionable research practices, our research will encourage support of these initiatives, and importantly, encourage institutions to support researchers in their own efforts to align their practice with their scientific values.</p>
<p class="fine-print"><em><span>Fiona Fidler receives funding from the ARC and IARPA. She is an ambassador for the Centre for Open Science.</span></em></p><p class="fine-print"><em><span>Hannah Fraser has received funding from the Australian Research Council and National Environmental Research Program. She is an open science ambassador associated with the Centre for Open Science. </span></em></p>Questionable research practices are not fraud, and they’re not cause for panic. But they do give us some hints about how we can make science more robust.Fiona Fidler, Associate Professor, School of Historical and Philosophical Studies, The University of MelbourneHannah Fraser, Postdoctoral Researcher , The University of MelbourneLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/848962017-10-18T23:33:23Z2017-10-18T23:33:23ZA statistical fix for the replication crisis in science<figure><img src="https://images.theconversation.com/files/190686/original/file-20171017-30394-1pcijw3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Many scientific studies aren't holding up in further tests.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/closeup-portrait-tired-young-woman-scientistcrashing-677843290?src=LSedwluRRX5MZ_HcDBqbmA-1-0">A and N photography/Shutterstock.com</a></span></figcaption></figure><p>In a trial of a new drug to cure cancer, 44 percent of 50 patients achieved remission after treatment. Without the drug, only 32 percent of previous patients did the same. The new treatment sounds promising, but is it better than the standard?</p>
<p>That question is difficult, so statisticians tend to answer a different question. They look at their results and compute something called a p-value. If the p-value is less than 0.05, the results are “statistically significant” – in other words, unlikely to be caused by just random chance.</p>
<p>The problem is, many statistically significant results <a href="http://www.amstat.org/asa/files/pdfs/P-ValueStatement.pdf">aren’t replicating</a>. A treatment that shows promise in one trial doesn’t show any benefit at all when given to the next group of patients. This problem has become so severe that <a href="http://www.nature.com/news/psychology-journal-bans-p-values-1.17001">one psychology journal actually banned p-values</a> altogether. </p>
<p>My colleagues and I have studied this problem, and we think we know what’s causing it. The bar for claiming statistical significance is simply too low. </p>
<h1>Most hypotheses are false</h1>
<p>The Open Science Collaboration, a nonprofit organization focused on scientific research, <a href="http://dx.doi.org/10.1126/science.aac4716">tried to replicate</a> 100 published psychology experiments. While 97 of the initial experiments reported statistically significant findings, only 36 of the replicated studies did. </p>
<p><a href="http://dx.doi.org/10.1080/01621459.2016.1240079">Several graduate students and I</a> used these data to estimate the probability that a randomly chosen psychology experiment tested a real effect. We found that only about 7 percent did. In a similar study, <a href="http://dx.doi.org/10.1073/pnas.1516179112">economist Anna Dreber and colleagues</a> estimated that only 9 percent of experiments would replicate. </p>
<p>Both analyses suggest that only about one in 13 new experimental treatments in psychology – and probably many other social sciences – will turn out to be a success. </p>
<p>This has important implications when interpreting p-values, particularly when they’re close to 0.05. </p>
<h1>The Bayes factor</h1>
<p>P-values close to 0.05 are more likely to be due to random chance than most people realize.</p>
<p>To understand the problem, let’s return to our imaginary drug trial. Remember, 22 out of 50 patients on the new drug went into remission, compared to an average of just 16 out of 50 patients on the old treatment. </p>
<p>The probability of seeing 22 or more successes out of 50 is 0.05 if the new drug is no better than the old. That means the p-value for this experiment is statistically significant. But we want to know whether the new treatment is really an improvement, or if it’s no better than the old way of doing things.</p>
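<p>Here is a minimal sketch of that p-value calculation, treating the old treatment’s 32 percent success rate as the null hypothesis:</p>
<pre><code># One-sided p-value for the imaginary trial: the probability of seeing
# 22 or more remissions among 50 patients if the true rate is still 32%.
from scipy.stats import binom

p_value = binom.sf(21, 50, 0.32)  # P(X >= 22) = 1 - P(X <= 21)
print(round(p_value, 3))          # approximately 0.05
</code></pre>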
<p>To find out, we need to combine the information contained in the data with the information available before the experiment was conducted, or the “prior odds.” The prior odds reflect factors that are not directly measured in the study. For instance, they might account for the fact that in 10 other trials of similar drugs, none proved to be successful.</p>
<p>If the new drug isn’t any better than the old drug, then statistics tells us that the probability of seeing exactly 22 out of 50 successes in this trial is 0.0235 – relatively low. </p>
<p>What if the new drug actually is better? We don’t actually know the success rate of the new drug, but a good guess is that it’s close to the observed success rate, 22 out of 50. If we assume that, then the probability of observing exactly 22 out of 50 successes is 0.113 – about five times more likely. (Not nearly 20 times more likely, though, as you might guess if you knew the p-value from the experiment was 0.05.)</p>
<p>This ratio of the probabilities is called the Bayes factor. We can use <a href="https://yalebooks.yale.edu/book/9780300188226/theory-would-not-die">Bayes theorem</a> to combine the Bayes factor with the prior odds to compute the probability that the new treatment is better. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=600&fit=crop&dpr=1 600w, https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=600&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=600&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=754&fit=crop&dpr=1 754w, https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=754&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/190907/original/file-20171018-32341-1i70yms.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=754&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">What’s the probability of observing success in 50 trials? The blue curve represents probabilities under the ‘null hypothesis,’ when the new treatment is no better than the old. The red curve represents probabilities when the new treatment is better. The shaded area represents the p-value. In this case, the ratio of the probabilities assigned to 22 successes is A divided by B, or 0.21.</span>
<span class="attribution"><span class="source">Valen Johnson</span>, <a class="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA</a></span>
</figcaption>
</figure>
<p>For the sake of argument, let’s suppose that only 1 in 13 experimental cancer treatments will turn out to be a success. That’s close to the value we estimated for the psychology experiments.</p>
<p>When we combine these prior odds with the Bayes factor, it turns out that the probability the new treatment is no better than the old is at least 0.71. But the statistically significant p-value of 0.05 suggests exactly the opposite!</p>
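<p>A minimal sketch of that arithmetic, using the numbers above:</p>
<pre><code># Combining the Bayes factor with the prior odds, as described above.
from scipy.stats import binom

# Probability of exactly 22 successes in 50 patients under each hypothesis.
p_null = binom.pmf(22, 50, 0.32)  # about 0.0235: no better than the old drug
p_alt  = binom.pmf(22, 50, 0.44)  # about 0.113: as good as the observed rate

bayes_factor = p_alt / p_null     # about 4.8 in favor of the new drug

prior_odds = 1 / 12               # 1 in 13 treatments is a real improvement
posterior_odds = bayes_factor * prior_odds

prob_null = 1 / (1 + posterior_odds)
print(round(prob_null, 2))  # about 0.71: the new drug is still probably
                            # no better, despite the "significant" p-value
</code></pre>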
<h1>A new approach</h1>
<p>This inconsistency is typical of many scientific studies. It’s particularly common for <a href="http://dx.doi.org/10.1073/pnas.1313476110">p-values around 0.05</a>. This explains why such a high proportion of statistically significant results do not replicate. </p>
<p>So how should we evaluate initial claims of a scientific discovery? In September, <a href="https://www.nature.com/articles/s41562-017-0189-z">my colleagues and I</a> proposed a new idea: only p-values less than 0.005 should be considered statistically significant. P-values between 0.005 and 0.05 should merely be called suggestive.</p>
<p>In our proposal, statistically significant results are more likely to replicate, even after accounting for the small prior odds that typically pertain to studies in the social, biological and medical sciences. </p>
<p>What’s more, we think that statistical significance should not serve as a bright-line threshold for publication. Statistically suggestive results – or even results that are largely inconclusive – might also be published, based on whether or not they report important preliminary evidence regarding the possibility that a new theory might be true. </p>
<p>On Oct. 11, we presented this idea to a group of statisticians at the ASA Symposium on Statistical Inference in Bethesda, Maryland. Our goal in changing the definition of statistical significance is to restore the intended meaning of this term: that data have provided substantial support for a scientific discovery or treatment effect.</p>
<h1>Criticisms of our idea</h1>
<p>Not everyone agrees with our proposal, including another <a href="https://dx.doi.org/10.17605/OSF.IO/9S3Y6">group of scientists</a> led by psychologist Daniel Lakens. </p>
<p>They argue that the definition of Bayes factors is too subjective, and that researchers can make other assumptions that might change their conclusions. In the clinical trial, for example, Lakens might argue that researchers could report the three-month rather than six-month remission rate, if it provided stronger evidence in favor of the new drug. </p>
<p>Lakens and his group also feel that the estimate that only about one in 13 experiments will replicate is too low. They point out that this estimate does not include effects like <a href="https://doi.org/10.1371/journal.pbio.1002106">p-hacking</a>, a term for when researchers repeatedly analyze their data until they find a strong p-value. </p>
<p>Instead of raising the bar for statistical significance, the Lakens group thinks that researchers should set and justify their own level of statistical significance before they conduct their experiments.</p>
<p>I disagree with many of the Lakens group’s claims – and, from a purely practical perspective, I feel that their proposal is a nonstarter. Most scientific journals don’t provide a mechanism for researchers to record and justify their choice of p-values before they conduct experiments. More importantly, allowing researchers to set their own evidence thresholds doesn’t seem like a good way to improve the reproducibility of scientific research. </p>
<p>Lakens’s proposal would only work if journal editors and funding agencies agreed in advance to publish reports of experiments that haven’t yet been conducted, based on criteria that the scientists themselves have imposed. I think this is unlikely to happen anytime in the near future.</p>
<p>Until it does, I recommend that you not trust claims from scientific studies based on p-values near 0.05. Insist on a higher standard.</p>
<p class="fine-print"><em><span>Valen E. Johnson does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Scientists have a big problem: Many psychological studies don’t hold up to scrutiny. Is it time to redefine statistical significance?Valen E. Johnson, University Distinguished Professor and Department Head of Statistics, Texas A&M UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/745472017-03-20T19:21:39Z2017-03-20T19:21:39ZHow we edit science part 2: significance testing, p-hacking and peer review<figure><img src="https://images.theconversation.com/files/161076/original/image-20170316-20811-2j81t4.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">What's the p-value for that happening?</span> <span class="attribution"><span class="source">CERN</span></span></figcaption></figure><p><em>We take science seriously at The Conversation and we work hard at reporting it accurately. This series of five posts is adapted from an internal presentation on how to understand and edit science by Australian Science & Technology Editor, Tim Dean. We thought you would also find it useful.</em></p>
<hr>
<p>One of the most common approaches to conducting science is called “significance testing” (sometimes called “hypothesis testing”, but that can lead to confusion for convoluted <a href="https://en.wikipedia.org/wiki/Statistical_hypothesis_testing">historical reasons</a>). It’s not used in all the sciences, but is particularly common in fields like biology, medicine, psychology and the physical sciences.</p>
<p>It’s popular, but it’s not without its flaws, such as allowing careless or dishonest researchers to abuse it to yield dubious yet compelling results.</p>
<p>It can also be rather confusing, not least because of the role played by the dreaded <a href="http://psc.dss.ucdavis.edu/sommerb/sommerdemo/stat_inf/null.htm">null-hypothesis</a>. It’s a bugbear of many a science undergraduate, and possibly one of the most misunderstood concepts in scientific methodology.</p>
<p>The null-hypothesis is just a baseline hypothesis that typically says there’s nothing interesting going on, and the causal relationship underpinning the scientist’s hypothesis doesn’t hold.</p>
<p>It’s like a default position of scepticism about the scientist’s hypothesis. Or like assuming a defendant is innocent until proven guilty.</p>
<p>Now, as the scientist performs their experiment, they compare their results with what they’d expect to see if the null-hypothesis were true. What they’re looking for, though, is evidence that the null-hypothesis is actually false.</p>
<p>An example might help.</p>
<p>Let’s say you want to test whether a coin is biased towards heads. The hypothesis you want to test – referred to as the alternate hypothesis (or H₁) – is that the coin is biased. The null-hypothesis (H₀) is that it’s unbiased.</p>
<p>We already know from repeated tests that if you flip a fair coin 100 times, you’d expect it to come up heads around 50 times (but it won’t always come up heads precisely 50 times). So if the scientist flips the coin 100 times and it comes up heads 55 times, it’s pretty likely to be a fair coin. But if it comes up heads 70 times, it starts to look fishy.</p>
<p>But how can they tell 70 heads is not just the result of chance? It’s certainly <em>possible</em> for a fair coin to come up heads 70 times. It’s just very unlikely. And the scientist can use <a href="http://stattrek.com/online-calculator/binomial.aspx">statistics</a> to determine how unlikely it is.</p>
<p>If they flip a fair coin 100 times, there’s a 13.6% chance that it’ll come up heads more than 55 times. That’s unlikely, but not unlikely enough to be confident the coin is biased.</p>
<p>But there’s less than a 0.01% chance that it’ll come up heads 70 or more times. Now the coin is looking decidedly dodgy.</p>
<p>The probability of seeing a result at least this extreme is referred to as the “p-value”. It is expressed in decimal rather than percentage terms, so 13.6% is 0.136 and 0.01% is 0.0001.</p>
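<p>These tail probabilities can be checked against the exact binomial distribution – a minimal sketch:</p>
<pre><code># Exact tail probabilities for 100 flips of a fair coin.
from scipy.stats import binom

print(binom.sf(55, 100, 0.5))  # P(more than 55 heads): about 0.136
print(binom.sf(69, 100, 0.5))  # P(70 or more heads): about 0.00004
</code></pre>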
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=395&fit=crop&dpr=1 600w, https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=395&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=395&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=496&fit=crop&dpr=1 754w, https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=496&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/161100/original/image-20170316-20802-228asu.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=496&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">This is a ‘normal distribution’, showing the probability of getting a particular result. The further out you go on the ‘tails’, the less likely the result.</span>
<span class="attribution"><span class="source">Wikimedia</span></span>
</figcaption>
</figure>
<p>Typically, scientists consider a p-value below 0.05 to be a good indication you can reject the null-hypothesis (eg, that the coin is unbiased) and be more confident that your alternative hypothesis (that the coin is biased) is true.</p>
<p>This value of 0.05 is called the “significance level”. So if a result has a p-value that is below the significance level, then the result is considered “significant”.</p>
<p>It’s important to note that this refers to the technical sense of “statistical significance” rather than the more qualitative vernacular sense of “significant”, as in my “significant other” (although statisticians’ partners may differ in this interpretation).</p>
<p>This approach to science is also not without fault.</p>
<p>For one, if you set your significance level at 0.05 and you run the same experiment 20 times on an effect that doesn’t actually exist, you’d expect one of those experiments to clear the significance bar by chance alone. So in a journal with 20 papers reporting such results, you can also expect roughly one to be wrong.</p>
<p>This is one of the factors contributing to the so-called “<a href="https://en.wikipedia.org/wiki/Replication_crisis">replication crisis</a>” in science, particularly in medicine and <a href="https://www.psychologytoday.com/blog/the-nature-nurture-nietzsche-blog/201509/quick-guide-the-replication-crisis-in-psychology">psychology</a>.</p>
<h2>p-hacking</h2>
<p>One prime suspect in the replication crisis is the problem of “p-hacking”.</p>
<p>A good experiment will clearly define the null and the alternate hypothesis before handing out the drugs and placebos. But many experiments collect more than just one dimension of data. A trial for a headache drug might also keep an eye on side-effects, weight gain, mood, or any other variable the scientists can observe and measure.</p>
<p>And if one of these secondary factors shows a “significant” effect – like the group who took the headache drug also lost a lot of weight – it might be tempting to shift focus onto that effect. After all, you never know when you’ll come across the next <a href="https://www.drugs.com/slideshow/viagra-little-blue-pill-1043">Viagra</a>.</p>
<p>However, if you simply track 20 variables in a study, you’d expect one of them to pass the significance threshold by chance alone. Simply picking that variable and writing up the study as if it was the focus all along is dodgy science. </p>
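<p>The arithmetic behind that expectation is simple, as this sketch shows: on average one in 20 truly null variables will clear a 0.05 threshold, and the chance that at least one of them does is nearly two in three.</p>
<pre><code># With 20 independent variables and no real effects, how often does at
# least one cross the 0.05 significance threshold by chance alone?
p_at_least_one = 1 - (1 - 0.05) ** 20
print(round(p_at_least_one, 2))  # about 0.64
</code></pre>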
<p>It’s why we sometimes hear stuff that’s too good to be true, like that <a href="http://imed.pub/ojs/index.php/iam/article/view/1087/728">chocolate can help you lose weight</a> (although that study turned out to be a <a href="https://theconversation.com/trolling-our-confirmation-bias-one-bite-and-were-easily-sucked-in-42621">cheeky attempt</a> to show how easy it is for a scientist to get away with blatant p-hacking).</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=349&fit=crop&dpr=1 600w, https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=349&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=349&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=438&fit=crop&dpr=1 754w, https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=438&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/161486/original/image-20170320-6133-adn580.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=438&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Some of the threats to reproducible science, including ‘hypothesising after the results are known’ (HARKing) and p-hacking.</span>
<span class="attribution"><a class="source" href="http://www.nature.com/articles/s41562-016-0021?utm_content=buffer7cf17&utm_medium=social&utm_source=twitter.com&utm_campaign=manny">Munafo et al, 2017</a></span>
</figcaption>
</figure>
<h2>Publishing</h2>
<p>Once scientists have conducted their experiment and found some interesting results, they move on to publishing them.</p>
<p>Science is unusual in that the norm is towards full transparency, where scientists effectively give away their discoveries to the rest of the scientific community and society at large.</p>
<p>This is not only out of a magnanimous spirit, but because it also turns out to be a highly effective way of scrutinising scientific discoveries, and helping others to <a href="https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants">build upon them</a>.</p>
<p>The way this works is typically by publishing in a peer-review journal.</p>
<p>It starts with the scientist preparing their findings according to the accepted conventions, such as providing an abstract, which is an overview of their discovery, and outlining the method they used in detail, describing their raw results and only then providing their interpretation of those results. They also cite other relevant research – a precursor to hyperlinks.</p>
<p>They then send this “paper” to a scientific journal. Some journals are more desirable than others, i.e. they have a “<a href="http://www.sciencegateway.org/rank/index.html">high impact</a>”. The top tier, such as <a href="http://www.nature.com/nature/index.html">Nature</a>, <a href="http://www.sciencemag.org/">Science</a>, <a href="http://www.thelancet.com/">The Lancet</a> and <a href="http://www.pnas.org/">PNAS</a>, are popular, so they receive many high quality papers and accept only the best (or, if you’re a bit cynical, the most flashy). Other journals are highly specialist, and may be desirable because they’re held in high esteem by a very specific audience.</p>
<p>If the journal rejects the paper, the scientists move on to the next most desirable journal, and keep at it until it’s accepted or remains unpublished.</p>
<p>These journals employ a peer review process, where the paper is typically anonymised and sent out to a number of experts in the field. These experts then review the paper, looking for potential problems with the methods, inconsistencies in reporting or interpretation, and whether they’ve explained things clearly enough such that another lab could reproduce the results if they wanted to.</p>
<p>The paper might bounce back and forth between the peer reviewers and authors until it’s at a point where it’s ready to publish. This process can take as little as a few weeks, but in some cases it can take months or even years.</p>
<p>Journals don’t always get things right, though. Sometimes a paper will slip through with shoddy methods or even downright <a href="http://www.the-scientist.com/?articles.list/tagNo/2642/tags/scientific-fraud/">fraud</a>. A useful site for keeping tabs on dodgy journals and researchers is <a href="http://retractionwatch.com/">Retraction Watch</a>.</p>
<h2>Open Access</h2>
<p>A new trend in scientific publishing is <a href="https://theconversation.com/au/topics/open-access-1060">Open Access</a>. While traditional journals don’t charge to accept papers, or pay scientists if they do publish their paper, they do charge fees (often exorbitant ones) to university libraries to subscribe to the journal.</p>
<p>What this means is a huge percentage of scientific research – often funded by taxpayers – is walled off so non-academics can’t access it.</p>
<p>The <a href="https://theconversation.com/au/topics/open-access-1060">Open Access</a> movement takes a different approach. Open Access journals release all their published research free of charge to readers, but they often recoup their costs by charging scientists to publish their work.</p>
<p>Many Open Access journals are <a href="https://www.plos.org/publications">well respected</a>, and are gaining in prestige in academia, but the business model also creates a moral hazard, and incentives for journals to publish any old claptrap in order to make a buck. This has led to an entire industry of <a href="https://en.wikipedia.org/wiki/Predatory_open_access_publishing">predatory journals</a>.</p>
<p>Librarian Jeffrey Beall used to maintain a list of “potential, possible, or probable” predatory publishers, which was the go-to for checking if a journal is legit. However, in early 2017 Beall <a href="https://theconversation.com/who-will-keep-predatory-science-journals-at-bay-now-that-jeffrey-bealls-blog-is-gone-71613">took the list offline</a> for reasons yet to be made clear. The list is <a href="https://clinicallibrarian.wordpress.com/2017/01/23/bealls-list-of-predatory-publishers/">mirrored</a>, but every month that goes by makes it less and less reliable.</p>
<p>Many scientists also publish their research on preprint servers, the most popular being <a href="https://arxiv.org/">arXiv</a> (pronounced “archive”). These are clearing houses for papers that haven’t yet been peer-reviewed or accepted by a journal. But they do offer scientists a chance to share their early results and get feedback and criticism before they finalise their paper to submit to a journal.</p>
<p>It’s often tempting to get the jump on the rest of the media by reporting on a paper published on a preprint site, especially if it has an exciting finding. However, journalists should exercise caution, as these papers haven’t been through the peer-review process, so it’s harder to judge their quality. Some wild and hyperbolic claims also make it to preprint servers. So if a journalist is tempted by one, they should run it past a trusted expert first.</p>
<p>The next post in this series will deal with more practical considerations about how to pick a good science story, and how to cite sources properly.</p>
<p class="fine-print"><em>Tim Dean, Editor</em></p>