<p><em>Replication crisis – The Conversation, 30 October 2023</em></p>
<h1>Two questions, hundreds of scientists, no easy answers: how small differences in data analysis make huge differences in results</h1>
<figure><img src="https://images.theconversation.com/files/556512/original/file-20231030-25-sz3v30.jpg?ixlib=rb-1.1.0&rect=0%2C17%2C3872%2C2567&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">How do siblings affect the size of baby blue tits? It depends whom you ask.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/blue-tit-cyanisties-caeruleus-being-gaping-1700866117">Shutterstock</a></span></figcaption></figure><p>Over the past 20 years or so, there has been growing concern that <a href="https://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998">many results published in scientific journals can’t be reproduced</a>.</p>
<p>Depending on the field of research, attempts to redo published studies have produced different results in between <a href="https://news.virginia.edu/content/after-10-years-many-labs-comes-end-its-success-replicable">23%</a> and <a href="https://www.nature.com/articles/483531a">89%</a> of cases.</p>
<p>To understand how different researchers might arrive at different results, we asked hundreds of ecologists and evolutionary biologists to answer two questions by analysing given sets of data. They arrived at a huge range of answers.</p>
<p>Our study has been accepted by BMC Biology as a stage 1 <a href="https://www.cos.io/initiatives/registered-reports">registered report</a> and is <a href="https://ecoevorxiv.org/repository/view/6000/">currently available as a preprint</a> ahead of peer review for stage 2.</p>
<h2>Why is reproducibility a problem?</h2>
<p>The <a href="https://theconversation.com/putting-psychological-research-to-the-test-with-the-reproducibility-project-7052">causes of problems with reproducibility</a> are common across science. They include an over-reliance on simplistic measures of “statistical significance” rather than nuanced evaluations, the fact that journals prefer to publish “exciting” findings, and <a href="https://theconversation.com/our-survey-found-questionable-research-practices-by-ecologists-and-biologists-heres-what-that-means-94421">questionable research practices</a> that make articles more exciting at the expense of transparency and increase the rate of false results in the literature.</p>
<p>Much of the research on reproducibility and ways it can be improved (such as <a href="https://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198">“open science” initiatives</a>) has been slow to spread between different fields of science. </p>
<p>Interest in these ideas has been <a href="https://www.science.org/content/article/psychology-s-replication-crisis-inspires-ecologists-push-more-reliable-research">growing among ecologists</a>, but so far there has been little research evaluating replicability in ecology. One reason for this is the difficulty of disentangling environmental differences from the influence of researchers’ choices.</p>
<p>One way to get at the replicability of ecological research, separate from environmental effects, is to focus on what happens after the data is collected.</p>
<h2>Birds and siblings, grass and seedlings</h2>
<p>We were inspired by <a href="https://www.nature.com/articles/526189a">work led by Raphael Silberzahn</a> which asked social scientists to analyse a dataset to determine whether soccer players’ skin tone predicted the number of red cards they received. The study found a wide range of results.</p>
<p>We emulated this approach in ecology and evolutionary biology with an open call to help us answer two research questions:</p>
<ul>
<li><p>“To what extent is the growth of nestling blue tits (<em>Cyanistes caeruleus</em>) influenced by competition with siblings?” </p></li>
<li><p>“How does grass cover influence <em>Eucalyptus</em> spp. seedling recruitment?” (“<em>Eucalyptus</em> spp. seedling recruitment” means how many seedlings of trees from the genus <em>Eucalyptus</em> there are.)</p></li>
</ul>
<figure class="align-center ">
<img alt="A photo of eucalyptus seedlings outdoors" src="https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/556514/original/file-20231030-17-mh6uxm.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Researchers disagreed over whether grass cover encourages or discourages Eucalyptus seedlings.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/close-growing-eucalyptus-seedling-650020558">Shutterstock</a></span>
</figcaption>
</figure>
<p>Two hundred and forty-six ecologists and evolutionary biologists answered our call. Some worked alone and some in teams, producing 137 written descriptions of their overall answer to the research questions (alongside numeric results). These answers varied substantially for both datasets.</p>
<p>Looking at the effect of grass cover on the number of <em>Eucalyptus</em> seedlings, we had 63 responses. Eighteen described a negative effect (more grass means fewer seedlings), 31 described no effect, six teams described a positive effect (more grass means more seedlings), and eight described a mixed effect (some analyses found positive effects and some found negative effects). </p>
<p>For the effect of sibling competition on blue tit growth, we had 74 responses. Sixty-four teams described a negative effect (more competition means slower growth, though only 37 of these teams thought this negative effect was conclusive), five described no effect, and five described a mixed effect.</p>
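<p>One way such divergence can arise is through a single modelling choice, such as whether to account for a grouping variable. The toy Python sketch below uses made-up numbers, not the study's data: a trend that is negative within each of two hypothetical sites comes out positive when the sites are pooled, so two analysts making equally reasonable choices report opposite effects.</p>

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical data from two sites (illustrative only):
# within each site the trend is negative...
site1_x, site1_y = [1, 2, 3], [10, 9, 8]
site2_x, site2_y = [6, 7, 8], [20, 19, 18]

print(slope(site1_x, site1_y))  # -1.0
print(slope(site2_x, site2_y))  # -1.0
# ...but pooling the sites flips the sign of the estimate:
print(slope(site1_x + site2_x, site1_y + site2_y))  # ≈ +1.71
```

<p>Neither answer is "wrong"; they answer subtly different questions, which is exactly how many small analytic decisions can accumulate into the spread of results we observed.</p>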
<h2>What the results mean</h2>
<p>Perhaps unsurprisingly, we and our coauthors had a range of views on how these results should be interpreted.</p>
<p>We have asked three of our coauthors to comment on what struck them most.</p>
<p>Peter Vesk, who was the source of the <em>Eucalyptus</em> data, said: </p>
<blockquote>
<p>Looking at the mean of all the analyses, it makes sense. Grass has essentially a negligible effect on [the number of] eucalypt tree seedlings, compared to the distance from the nearest mother tree. But the range of estimated effects is gobsmacking. It fits with my own experience that lots of small differences in the analysis workflow can add to large variation [in results].</p>
</blockquote>
<p>Simon Griffith collected the blue tit data more than 20 years ago. It had not previously been analysed because of the complexity of decisions about the right analytical pathway. He said: </p>
<blockquote>
<p>This study demonstrates that there isn’t one answer from any set of data. There are a wide range of different outcomes and understanding the underlying biology needs to account for that diversity.</p>
</blockquote>
<p>Meta-researcher Fiona Fidler, who studies research itself, said: </p>
<blockquote>
<p>The point of these studies isn’t to scare people or to create a crisis. It is to help build our understanding of heterogeneity and what it means for the practice of science. Through metaresearch projects like this we can develop better intuitions about uncertainty and make better calibrated conclusions from our research.</p>
</blockquote>
<h2>What should we do about it?</h2>
<p>In our view, the results suggest three courses of action for researchers, publishers, funders and the broader science community.</p>
<p>First, we should avoid treating published research as fact. A single scientific article is just one piece of evidence, existing in a broader context of limitations and biases. </p>
<p>The push for “novel” science means studying something that has already been investigated is discouraged, and consequently we inflate the value of individual studies. We need to take a step back and consider each article in context, rather than treating them as the final word on the matter.</p>
<p>Second, we should conduct more analyses per article and report all of them. If research depends on what analytic choices are made, it makes sense to present multiple analyses to build a fuller picture of the result.</p>
<p>And third, each study should include a description of how the results depend on data analysis decisions. Research publications tend to focus on discussing the ecological implications of their findings, but they should also discuss how different analysis choices influenced the results, and what that means for interpreting the findings.</p>
<p class="fine-print"><em><span>Elliot Gould receives funding from an Australian Government Research Training Program Scholarship.</span></em></p><p class="fine-print"><em><span>Hannah Fraser and Timothy H. Parker do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
<p class="fine-print"><em>Hannah Fraser, Postdoctoral Researcher, The University of Melbourne; Elliot Gould, PhD student, School of Biosciences, The University of Melbourne; Timothy H. Parker, Professor of Biology and Environmental Studies, Whitman College. Licensed as Creative Commons – attribution, no derivatives.</em></p>
<h1>Does entitlement make you more likely to cheat? New research challenges popular psychology idea</h1>
<p><em>The Conversation, 16 August 2022</em></p>
<figure><img src="https://images.theconversation.com/files/478897/original/file-20220812-3923-gpzhuu.jpg?ixlib=rb-1.1.0&rect=0%2C0%2C6016%2C4016&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Do you cheat at dice games?</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/dice-430194097">beeboys/Shutterstock</a></span></figcaption></figure><p>Why do people cheat? <a href="https://www.pnas.org/doi/full/10.1073/pnas.1515102113">An intriguing study</a> by two Israeli researchers in 2016 put forward a possible reason that has since become well established in the scientific literature and popular media.</p>
<p>The researchers reported a series of experiments apparently showing that people told they have won a skill-based competition, such as a visual task, subsequently cheat more than others in games of chance, such as dice games. The proposed explanation was that winners experienced a sense of entitlement that induced them to cheat.</p>
<p>The paper has been highly cited by other researchers. One scientific comment paper even pointed out its significance <a href="https://www.frontiersin.org/articles/10.3389/fnins.2017.00417/full">in the light of tax evasion</a> costing governments US$3.1 trillion (£2.6 trillion) annually. </p>
<p>But does the finding hold up to scientific scrutiny? We decided to replicate the study and investigate more closely the reasons why people do or don’t cheat.</p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/FOeoGpgX8AE?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
</figure>
<p>Our new study, <a href="https://royalsocietypublishing.org/doi/10.1098/rsos.202197">published in Royal Society Open Science</a>, failed twice to replicate the original finding. We found that the original experiments were “statistically underpowered”, meaning they used far too few experimental participants (43 in their main experiment) to sustain the conclusions that were drawn. </p>
<p>There were also problems of experimental design and methodology, notably a failure to randomly decide which participants were winners, losers, or part of a control group that weren’t told how they had done in the skill-based competition.</p>
<p>We began by replicating the original research as closely as possible, but in a large-scale experiment (252 participants) to achieve adequate statistical power. We also assigned participants randomly to conditions. </p>
<p>To assign winners and losers, we used the perceptual judgement test used in the original experiment. The test involves the difficult task of estimating which of several different symbols is the most numerous in briefly displayed slides similar to the one shown below.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Faces shown in the perception test." src="https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=336&fit=crop&dpr=1 600w, https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=336&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=336&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=423&fit=crop&dpr=1 754w, https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=423&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/478894/original/file-20220812-1300-whppmy.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=423&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Which face do you see most of?</span>
<span class="attribution"><span class="license">Author provided</span></span>
</figcaption>
</figure>
<p>We put the participants in pairs and told them whether they had a better or worse score than their partner in the skill task. They were then put in new pairs and played a game of chance, identical to the game in the original research. This involved rolling two dice under an inverted cup and then peeking through a spyhole in its base to see the result. </p>
<figure class="align-center ">
<img alt="Image of a cup and two dice." src="https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=477&fit=crop&dpr=1 600w, https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=477&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=477&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=600&fit=crop&dpr=1 754w, https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=600&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/479120/original/file-20220815-485-7ffri6.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=600&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Dice game.</span>
<span class="attribution"><span class="license">Author provided</span></span>
</figcaption>
</figure>
<p>The players were told to help themselves to money from an envelope provided depending on what numbers the dice showed – 25 pence for each dice spot. While it was impossible to tell who in particular cheated, collecting significantly more than the average amount expected by chance was evidence of cheating.</p>
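<p>The honest baseline here is easy to compute: two fair dice average seven spots per roll, so at 25 pence per spot an honest player takes £1.75 on average. A back-of-envelope sketch (not the paper's analysis code):</p>

```python
from fractions import Fraction

# Expected number of spots when rolling two fair dice:
# average over all 36 equally likely outcomes.
faces = range(1, 7)
expected_spots = Fraction(sum(a + b for a in faces for b in faces), 6 * 6)

# At 25 pence per spot, the honest average take per roll, in pounds.
expected_take = float(expected_spots) * 0.25
print(expected_spots, expected_take)  # 7 1.75
```

<p>A group whose mean take sits well above £1.75 must therefore contain cheats, even if no individual can be identified.</p>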
<p>We also assigned one-third of the participants to a control group. They were not told whether or not they had beaten their partner in the visual task before playing the dice game.</p>
<p>Comparing the results to what we’d expect to happen by chance, a small but statistically significant amount of cheating seemed to have occurred, as in the original Israeli experiment. But our results showed no evidence that winning (or losing) had any statistically significant effect whatsoever on cheating, as can be seen in the graph below, where the dotted line shows the value expected by chance, without cheating.</p>
<figure class="align-center ">
<img alt="Graph showing the amount of money taken by winners, losers and control participants." src="https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=480&fit=crop&dpr=1 600w, https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=480&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=480&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=604&fit=crop&dpr=1 754w, https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=604&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/478893/original/file-20220812-14-jca53y.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=604&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Winners weren’t significantly more likely to cheat.</span>
<span class="attribution"><span class="license">Author provided</span></span>
</figcaption>
</figure>
<p>We also ran an even larger online experiment (275 participants) in which we assigned participants randomly to be winners, losers or control participants using the same perceptual test as before. </p>
<p>In this experiment, each participant tossed a coin ten times and claimed rewards (Amazon gift vouchers) depending on how many heads they tossed. The results were almost identical to our first experiment: we found a similar level of cheating and no evidence of any effect of winning or losing on subsequent cheating. </p>
<p>We used standardised psychometric tests designed to measure differences between people that might influence cheating, including a sense of entitlement, self-confidence, belief in personal luck, and a few other factors. But only one turned out to be statistically significant in all treatment conditions. </p>
<p>Participants who dislike inequality cheated less than others. This is presumably because they had a stronger sense of fairness and considered cheating unfair. A sense of entitlement, on the other hand, was not significantly associated with cheating in any condition. </p>
<p>Ultimately, what makes some people cheat more than others is not fully understood. But our research suggests people’s feelings about inequality are one part of the explanation. There are also momentary circumstantial factors that encourage some people, but not others, to cheat.</p>
<h2>Psychology in crisis</h2>
<p>The original Israeli experiment does not replicate, and it should be viewed in the context of what’s known as the <a href="https://www.youtube.com/watch?v=v778svukrtU">replication or reproducibility crisis</a> in psychology. This refers to the fact that many recorded scientific findings <a href="https://journals.sagepub.com/doi/10.1177/1745691612465253">are impossible to reproduce</a> when experiments are repeated. </p>
<p>One of the principal drivers of the crisis is inadequate statistical power, meaning the use of sample sizes that are too small to yield trustworthy results. Our two experiments had extremely high (95%) statistical power, as required by the publisher of our registered report. </p>
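<p>To make "statistical power" concrete, here is a minimal sketch using the standard normal approximation for a two-group comparison. The effect size d = 0.5 is an assumed "medium" effect chosen for illustration, not a figure from either study:</p>

```python
import math

Z_ALPHA = 1.959964  # two-sided 5% critical value of the standard normal
Z_BETA = 1.644854   # one-sided value corresponding to 95% power

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sample(d, n_per_group):
    """Approximate power to detect effect size d at the two-sided 5% level
    (normal approximation to the two-sample test)."""
    return norm_cdf(d * math.sqrt(n_per_group / 2) - Z_ALPHA)

def n_per_group_for_95_power(d):
    """Sample size per group needed for roughly 95% power."""
    return math.ceil(2 * ((Z_ALPHA + Z_BETA) / d) ** 2)

# About 21 per group (43 participants in total, as in the original
# entitlement study) detects a medium effect (d = 0.5) only ~37% of the time:
print(round(power_two_sample(0.5, 21), 2))  # 0.37
# Reaching 95% power for the same effect needs about 104 per group:
print(n_per_group_for_95_power(0.5))  # 104
```

<p>In other words, a study of that size would miss a genuine medium-sized effect more often than it would find one, which is why sample sizes in the hundreds were needed for the replications.</p>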
<p>Another driver of the crisis is “publication bias”, which is when articles with a positive result are more likely to be published than those with a negative one. Factors such as “p-hacking” (performing multiple different statistical tests on data until one of them turns out to be significant) and “HARKing” (hypothesising after the results are known) are also to blame. </p>
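<p>The damage done by p-hacking can be shown with a short simulation. If an analyst tests ten unrelated outcome measures on data containing no real effect, and reports a "finding" whenever any test comes out significant, the false positive rate jumps from the nominal 5% to roughly 40%. A sketch with made-up parameters, not a reanalysis of any study discussed here:</p>

```python
import math
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_value(group_a, group_b):
    """Two-sided z-test p-value for a difference in means,
    assuming unit-variance data (true here by construction)."""
    n = len(group_a)
    z = (sum(group_a) / n - sum(group_b) / n) / math.sqrt(2 / n)
    return 2 * (1 - norm_cdf(abs(z)))

def p_hacked_study(n=30, n_outcomes=10):
    """One simulated null study: no real effect exists, but the analyst
    tests n_outcomes unrelated measures and declares success if any
    of them reaches p < 0.05."""
    for _ in range(n_outcomes):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        if p_value(a, b) < 0.05:
            return True
    return False

rate = sum(p_hacked_study() for _ in range(2000)) / 2000
print(rate)  # roughly 0.4: about 40% of null studies yield a "finding"
```

<p>Each individual test is honest; it is the undisclosed freedom to keep testing that inflates the error rate, which is why registered reports require the tests to be named in advance.</p>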
<p>Registered reports, in which investigators submit research proposals, including hypotheses and planned statistical tests, before the research is undertaken, can ultimately help eliminate most of the drivers of the replication crisis. Such an approach will no doubt one day help us uncover other reasons why people cheat.</p>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
<p class="fine-print"><em>Andrew M Colman, Professor of Psychology, University of Leicester; Marta Mangiarulo, Teaching Fellow and Research Assistant, School of Psychology, University of Leicester. Licensed as Creative Commons – attribution, no derivatives.</em></p>
<h1>The idea that power poses boost your confidence fell from favor – but a new review of the research calls for a second look</h1>
<p><em>The Conversation, 12 May 2022</em></p>
<figure><img src="https://images.theconversation.com/files/462476/original/file-20220511-25-1kzokh.jpg?ixlib=rb-1.1.0&rect=745%2C8%2C5245%2C3727&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">After great popularity, the idea of power poses came under fire.</span> <span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/woman-in-superhero-costume-royalty-free-image/1140379193">Choreograph/iStock via Getty Images Plus</a></span></figcaption></figure><p>If you stand like Wonder Woman or Superman, will you feel stronger? Will you actually be stronger?</p>
<p>Psychology researchers have investigated these questions for decades. After all, mind and body are intertwined. <a href="https://doi.org/10.1007/BF00992249">How you stand or sit can give you feedback</a> on how you feel, and your feelings are often revealed by the way you hold yourself. </p>
<p>One influential study published in 2010 suggested that power poses – body positions like a wide stance with your hands on your hips while standing, or clasping your hands behind your head and putting your feet on a desk while sitting – <a href="https://doi.org/10.1177/0956797610383437">increased levels of the male sex hormone testosterone</a> and decreased levels of cortisol, the main stress hormone. High levels of testosterone and low levels of cortisol are linked to fearlessness, <a href="https://doi.org/10.1016/j.yhbeh.2010.08.020">risk-taking</a> and insensitivity to punishment. From there, scientists assumed that <a href="https://doi.org/10.1177/0956797614566855">power posing could affect how people felt</a>, how they acted and <a href="https://doi.org/10.3389/fpsyg.2016.01463">how others perceived them</a>.</p>
<p>These findings drew enormous attention outside of the lab. <a href="https://www.littlebrownspark.com/titles/amy-cuddy/presence/9780316256551/">Power posing was advertised</a> as a way of improving one’s life, and the idea took off in popular culture. Intentionally adopting the stance of a powerful person could apparently give you the confidence and the appearance of a powerful person.</p>
<p>But in the following years, some researchers could not replicate the original findings when they tried to rerun the experiments. The lead author of the original study <a href="https://faculty.haas.berkeley.edu/dana_carney/pdf_my%20position%20on%20power%20poses.pdf">admitted to mistakes and distanced herself from it</a>. Since then, there’s been a heated debate about whether engaging in power poses really does anything at all.</p>
<p>In an effort to figure out which power pose findings hold up and which do not, <a href="https://psycnet.apa.org/record/2022-61115-003?doi=1">we conducted a meta-analytic review</a> – that is, we combined data from all available research on the topic. Based on dozens of studies, we suggest that there is something to the idea of power poses, even if the research was overhyped in the past.</p>
<h2>Pulling together findings from 88 studies</h2>
<p>We focused on two types of body positions.</p>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="girl seated on couch gets lecture from a woman" src="https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=816&fit=crop&dpr=1 600w, https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=816&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=816&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=1026&fit=crop&dpr=1 754w, https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=1026&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/462478/original/file-20220511-6370-675uv2.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=1026&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">A low-power pose may look similar to a child receiving a reprimand.</span>
<span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/mother-lecturing-daughter-in-living-room-royalty-free-image/107697790">JGI/Jamie Grill/Tetra images via Getty Images</a></span>
</figcaption>
</figure>
<p>The first type included power poses. Examples of high-power poses would be standing or sitting in a very expansive way, taking up a lot of space. A low-power pose would be crossing your legs and folding your arms while standing, or bowing your head and putting your hands on your lap while seated.</p>
<p>The second type included upright postures, like standing erect or sitting up straight in a chair versus bowing your head and slumping. <a href="https://doi.org/10.1111/spc3.12559">Theoretical</a> and <a href="https://doi.org/10.1037/pspi0000181">empirical</a> research has suggested that power poses are nonverbal expressions of dominance, whereas upright postures are displays of prestige. </p>
<p>Following open-science standards, we <a href="https://doi.org/10.17605/OSF.IO/CX2Q3">preregistered our protocol</a> with the Open Science Framework before conducting the analysis. This step is meant to increase transparency. By stating the game plan upfront, you can’t fiddle around with the data to try to find something significant to report.</p>
<p>Then we combed through 12 scientific databases with search terms including “body position” and “power pose.” This hunt turned up over 24,000 potentially relevant studies. We included just the ones that randomly assigned participants to different groups. Only this <a href="https://itfeature.com/design-of-experiment-doe/basic-principles-of-experimental-design">experimental design</a> allows researchers to make inferences about the cause of any effects they identify.</p>
<p>Often if a study doesn’t find a link between the factors it was investigating, the research doesn’t end up getting published. Because of this phenomenon, called <a href="https://doi.org/10.1371/journal.pone.0215052">publication bias</a>, we sent requests for unpublished data to researchers from six different scientific societies. We also contacted all 21 researchers who had authored at least two articles on body positions to inquire whether they had any unpublished studies. Over one-fourth of the effects we analyzed came from unpublished studies.</p>
<p>In the end, our analysis of high- versus low-power poses and upright versus slumped poses was based on 313 effects from 88 studies that included 9,799 participants. </p>
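<p>The core arithmetic of combining effects across studies is inverse-variance weighting: each study's effect counts in proportion to its precision, so large, precise studies dominate the pooled estimate. A minimal fixed-effect sketch (the published review used more elaborate multilevel models, and the numbers below are made up):</p>

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect meta-analytic estimate: weight each study's effect
    by the inverse of its variance, so precise studies dominate."""
    weights = [1 / v for v in variances]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1 / sum(weights))
    return estimate, standard_error

# Three hypothetical studies measured with very different precision.
est, se = pooled_effect([0.3, 0.5, 0.2], [0.04, 0.09, 0.01])
print(round(est, 3), round(se, 3))  # 0.243 0.086
```

<p>Note how the pooled estimate sits closest to the third study's effect (0.2), because its variance is smallest; the pooled standard error is also smaller than any single study's, which is the whole point of meta-analysis.</p>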
<h2>What held up and what didn’t</h2>
<p>Our review examined three types of potential effects power poses and upright positions could have.</p>
<p>First there were self-reported effects, such as feeling powerful, confident and positive. These kinds of effects were statistically significant and robust, meaning they were seen again and again across many studies. People told researchers they felt stronger when they engaged in power poses and upright postures.</p>
<p>Then there were behavioral effects, such as how long participants would stick with a task, whether they exhibited antisocial behavior, and how action-oriented they were. Researchers identified these effects in many studies as well, but the findings were less reliable and more subject to publication bias.</p>
<p>Finally there were physiological effects such as hormone levels, heart rate and skin conductance, which often stands in as a way to measure stress in psychology research studies. In our meta-analysis, these effects were not statistically significant across all the studies. It was in this area that the power pose research didn’t hold up. Simply taking expansive body positions does not influence hormones or other physiological indicators as previously believed.</p>
<p>We found these self-reported and behavioral effects in studies from both Western countries like the U.S., Germany and the U.K. that favor the individual and in Eastern countries like China, Japan and Malaysia that favor the collective. Age and gender did not make a difference with respect to the effects. Nor did it matter whether participants were college students or not. From the available data it is not clear, however, how long such effects last after someone moves out of a particular body position.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="woman seated at her desk smiles with her legs open and arms wide on arm rests" src="https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=399&fit=crop&dpr=1 600w, https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=399&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=399&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=502&fit=crop&dpr=1 754w, https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=502&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/462479/original/file-20220511-15-9qbhu7.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=502&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Taking up space can be an expression of dominance.</span>
<span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/woman-posing-by-her-desk-at-home-office-royalty-free-image/499236621">Lucy Lambriex/The Image Bank via Getty Images</a></span>
</figcaption>
</figure>
<h2>What new experiments can explore</h2>
<p>Unfortunately, many experimental studies in our meta-analysis did not include a control group of participants who adopted a neutral body position. That means we can’t tell for sure whether it is high-power poses and upright postures making people feel more positive and powerful, whether it is the low-power and slumped postures making people feel less positive and powerful, or whether it is some combination of the two. Future studies could clarify that question by including control groups that hold neutral body positions for comparison.</p>
<p>Furthermore, most studies included participants from Western, educated, industrialized, rich and democratic societies – characterized as “WEIRD” by psychology researchers. Effects should also be tested in other populations.</p>
<p>To promote and facilitate further insights on the effects of body positions, we also created an <a href="https://metaanalyses.shinyapps.io/bodypositions/">app</a> allowing researchers to enter new data and download the most recent results. Continuing these investigations is important, because science is an ongoing process that usually does not provide definitive final answers. More evidence accumulates with each new study.</p>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>For a while it was all the rage to adopt Wonder Woman’s famous stance and other body positions that allegedly pumped up your confidence – until more studies of the phenomenon failed to find the connection.Astrid Schütz, Professor of Psychology, University of BambergBrad Bushman, Professor of Communication and Rinehart Chair of Mass Communication, The Ohio State UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1736342022-02-01T13:14:21Z2022-02-01T13:14:21ZDid male and female dinosaurs differ? A new statistical technique is helping answer the question<figure><img src="https://images.theconversation.com/files/443225/original/file-20220128-23-12zgv3p.jpg?ixlib=rb-1.1.0&rect=2%2C14%2C1950%2C1159&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">How can researchers tell if male and female dinosaurs, like the stegosaur, were different?</span> <span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Journal.pone.0138352.g001A.jpg#/media/File:Journal.pone.0138352.g001A.jpg">Susannah Maidment et al. & Natural History Museum, London</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span></figcaption></figure><p>In most animal species, <a href="https://doi.org/10.2307/2407393">males and females differ</a>. This is true for people and other mammals, as well as many species of birds, fish and reptiles. But what about dinosaurs? In 2015, I proposed that variation found in the iconic back plates of stegosaur dinosaurs was <a href="https://doi.org/10.1371/journal.pone.0123503">due to sex differences</a>.</p>
<p>I was surprised by how strongly some of my colleagues <a href="https://doi.org/10.1017/pab.2016.51">disagreed</a>, arguing that differences between sexes, called sexual dimorphism, <a href="https://doi.org/10.2307/2407393">did not exist in dinosaurs</a>.</p>
<p><a href="https://scholar.google.com/citations?user=umU9KBMAAAAJ&hl=en&oi=ao">I am a paleontologist</a>, and the debate sparked by my 2015 paper has made me reconsider how researchers studying ancient animals use statistics. </p>
<p>The limited fossil record makes it hard to declare if a dinosaur was sexually dimorphic. But I and some others in my field are beginning to <a href="https://doi.org/10.1038/d41586-019-00857-9">shift away from traditional black-or-white statistical thinking</a> that relies on p-values and statistical significance to define a true finding. Instead of only looking for yes or no answers, we are beginning to consider the estimated magnitude of sexual variation in a species, the degree of uncertainty in that estimate and how these measures compare to other species. This approach offers a more nuanced analysis to challenging questions in paleontology as well as many other fields of science.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="A very colorful duck standing next to a drab brown duck." src="https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=450&fit=crop&dpr=1 600w, https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=450&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=450&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=566&fit=crop&dpr=1 754w, https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=566&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/443076/original/file-20220127-9640-1ercxvu.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=566&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">In many species, like these mandarin ducks, males (left) and females (right) look very different.</span>
<span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Pair_of_mandarin_ducks.jpg">Francis C. Franklin via WikimediaCommons</a>, <a class="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA</a></span>
</figcaption>
</figure>
<h2>Differences between males and females</h2>
<p><a href="http://dx.doi.org/10.1007/978-3-319-47829-6_433-1">Sexual dimorphism</a> is when males and females of a certain species differ on average in a particular trait – not including their reproductive anatomy. Classic examples are how male deer have antlers and male peacocks have flashy tail feathers, while the females lack these traits.</p>
<p>Dimorphism can also be subtle and unflashy. Often the difference is one of degree, like differences in the average body size between males and females – as in <a href="https://doi.org/10.1007/s12110-012-9130-3">gorillas</a>. In these modest cases, researchers use statistics to determine whether a trait differs on average between males and females.</p>
<h2>The dinosaur dilemma</h2>
<p>Studying sexual dimorphism in extinct animals is fraught with uncertainty. If you and I independently dig up similar fossils of the same species, they are inevitably going to be slightly different. These differences could be due to sex, but they could also be driven by age – <a href="https://www.worldcat.org/title/avian-anatomy-integument/oclc/603445440&referer=brief_results">young birds are fuzzy, adult birds are sleek</a>. They could also be due to genetics unrelated to sex, like eye color in humans.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Two drawings of dinosaurs showing different shaped horns and frills." src="https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=520&fit=crop&dpr=1 600w, https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=520&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=520&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=653&fit=crop&dpr=1 754w, https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=653&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/437461/original/file-20211214-15-1gmw3ot.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=653&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">It’s possible that variation among individual dinosaurs of the same species could be due to sexual dimorphism, but there are rarely good enough samples to assert so using traditional statistics.</span>
<span class="attribution"><span class="source">James Ormiston</span>, <a class="license" href="http://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a></span>
</figcaption>
</figure>
<p>If paleontologists had thousands of fossils to study of every species, the many sources of biological variation wouldn’t matter as much. Unfortunately, the <a href="https://doi.org/10.1002/bies.201700167">ravages of time</a> have left the fossil record painfully incomplete, often with less than a dozen good specimens for large, extinct vertebrate species. Additionally, there is currently no way to identify the sex of an individual fossil except in rare cases where obvious clues exist, like <a href="https://doi.org/10.1126/science.1110578">eggs preserved within the body cavity</a>. </p>
<p>So where does all this leave the debate on whether male and female dinosaurs had differences within traits? On the one hand, birds – which are direct descendants of dinosaurs – <a href="https://doi.org/10.1098/rspb.1998.0308">commonly show sexual dimorphism</a>. So do <a href="https://doi.org/10.18475/cjos.v45i1.a12">crocodilians</a>, dinosaurs’ next closest living relatives. Evolutionary theory also predicts that, since dinosaurs reproduced with sperm and egg, there would be a <a href="https://doi.org/10.1016/j.tree.2011.12.006">benefit to sexual dimorphism</a>.</p>
<p>These things all suggest that dinosaurs likely were sexually dimorphic. But in science you need to be quantitative. The challenge is that there is little in the way of <a href="https://doi.org/10.1017/pab.2016.51">statistically significant</a> analyses of the fossil record to support dimorphism. </p>
<h2>Statistical shifts</h2>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="A line graph showing two peaks." src="https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=315&fit=crop&dpr=1 600w, https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=315&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=315&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=396&fit=crop&dpr=1 754w, https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=396&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/443057/original/file-20220127-6424-dz34sy.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=396&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Very large sex differences can create a bimodal distribution that looks like two distinct groupings of a certain measurement.</span>
<span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Bimodal.png">Maksim via WikimediaCommons</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<p>There are a couple of ways paleontologists could test for sexual dimorphism. They could look to see if there are statistically significant differences between fossils from presumed males and females, but there are very few specimens where researchers <a href="https://doi.org/10.1073/pnas.0708903105">know the sex</a>. Another method is to see whether there are two distinct groupings of a trait, called a bimodal distribution, which could suggest a difference between males and females.</p>
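<p>The bimodality idea can be made concrete by fitting a two-component mixture to a measurement and seeing whether two well-separated means emerge. Below is a minimal sketch using invented measurements, not real fossil data, and a bare-bones expectation-maximization fit:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical measurements (e.g. a bone dimension, in cm) drawn from two
# overlapping groups; invented for illustration, not real fossil data.
data = np.concatenate([rng.normal(50, 4, 200), rng.normal(62, 4, 200)])

def fit_two_gaussians(x, iters=200):
    """Minimal EM fit of a two-component 1-D Gaussian mixture."""
    mu = np.array([x.min(), x.max()])        # spread-out starting means
    sigma = np.array([x.std(), x.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: how responsible is each component for each data point?
        dens = weight * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from those responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        weight = nk / len(x)
    return mu, sigma, weight

mu, sigma, weight = fit_two_gaussians(data)
print("component means:", mu.round(1))
```

<p>If the fitted means sit far apart relative to the fitted spreads, that is evidence of two groupings; with only a handful of fossils, though, such a fit is rarely conclusive.</p>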
<p>To tell whether a perceived difference between two groups is real, scientists have traditionally used a tool called the p-value. A p-value quantifies how likely it is that a difference at least as large as the one observed would arise by random chance alone. If the p-value is low enough, the result is deemed “statistically significant” and considered unlikely to have happened by chance.</p>
<p>But p-values can be heavily influenced by sample size and the design of the study, in addition to the actual degree of sexual dimorphism. Because fossil sample sizes are so small, relying on this statistical technique makes it exceedingly difficult to categorically proclaim which dinosaur species were dimorphic. </p>
<p>The weakness of the black-or-white approach that focuses solely on whether a result is statistically significant has led to hundreds of scientists <a href="https://doi.org/10.1038/d41586-019-00857-9">calling to abandon significance testing with p-values</a> in favor of something called <a href="https://doi.org/10.1111/j.1469-185X.2007.00027.x">effect size statistics</a>. Using this approach, researchers would simply report the measured difference between two groups and the uncertainty in that measurement.</p>
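<p>In practice, “the measured difference and the uncertainty in that measurement” often means an effect size such as Cohen’s d, the difference between group means in units of their pooled standard deviation, reported with a confidence interval. A minimal sketch with invented sample values:</p>

```python
import math

# Hypothetical measurements from two groups (e.g. presumed females / males);
# the numbers are invented for illustration.
group_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3]
group_b = [13.2, 12.9, 14.1, 13.5, 13.8, 12.6]

def cohens_d(a, b):
    """Cohen's d: mean difference in units of the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (mb - ma) / pooled_sd

d = cohens_d(group_a, group_b)
# Approximate standard error of d (a common large-sample formula)
na, nb = len(group_a), len(group_b)
se = math.sqrt((na + nb) / (na * nb) + d ** 2 / (2 * (na + nb)))
print(f"d = {d:.2f}, 95% CI half-width = {1.96 * se:.2f}")
```

<p>Reported this way, a reader sees both the estimated magnitude of the difference and how uncertain that estimate is, rather than a bare yes-or-no verdict.</p>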
<h2>Effect size statistics</h2>
<p>I have begun to apply effect size statistics in <a href="https://doi.org/10.1093/biolinnean/blaa105">my research on dinosaurs</a>. My colleagues and I compared sexual dimorphism in body size between three different dinosaurs: the duck-billed <em>Maiasaura</em>, <em>Tyrannosaurus rex</em> and <em>Psittacosaurus</em>, a small relative of <em>Triceratops</em>. None of these species would be expected to show statistically significant size differences between males and females according to p-values. But that approach does not capture the nature of the variation within these species. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="A cast of a duck billed dinosaur fossil skeleton." src="https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/443567/original/file-20220131-15-3xspkd.jpeg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Using effect size statistics, researchers were able to determine that the duck-billed dinosaur <em>Maiasaura</em> showed a larger amount of dimorphism with the least uncertainty in that estimate compared to other dinosaurs.</span>
<span class="attribution"><a class="source" href="https://en.wikipedia.org/wiki/Maiasaura#/media/File:Maiasaura_peeblesorum_cast_-_University_of_California_Museum_of_Paleontology_-_Berkeley,_CA_-_DSC04688.JPG">Daderot via WikimediaCommons</a></span>
</figcaption>
</figure>
<p>When we instead used effect size statistics, we were able to estimate that male and female <em>Maiasaura</em> show a greater difference in body mass than the other two species, and that we could place higher confidence in this estimate as well. Several characteristics of the data helped reduce the uncertainty. First, we had a large number of <em>Maiasaura</em> fossils from individuals of various ages. These bones fit neatly onto growth trajectories showing how size changes as an individual grows from juvenile to adult, so we could control for differences due to age and focus on differences due to sex.</p>
<p>Additionally, the <em>Maiasaura</em> fossils all come from a <a href="https://doi.org/10.1017/pab.2015.19">single bone bed</a> of individuals that died in the same place at the same time. This means that variation between individuals is likely not due to them being different species from different regions or time periods. </p>
<p>If my colleagues and I had approached the problem expecting a yes or no answer on whether males and females differed in size, we would have completely missed all of these intricacies. Effect size statistics allow researchers to produce much more nuanced and, I think, informative results. It is almost as much a difference in the philosophical approach to science as it is a mathematical one.</p>
<p>Studying dinosaur dimorphism is not the only place p-values create issues. Many fields of science, including <a href="https://theconversation.com/the-replication-crisis-is-good-for-science-103736">medicine and psychology</a>, are having similar <a href="https://doi.org/10.1080/00031305.2018.1543137">debates about issues in statistics</a> and a worrying problem of <a href="https://doi.org/10.1371/journal.pmed.0020124">unrepeatable studies</a>.</p>
<p>Embracing uncertainty in data – rather than looking for black-or-white answers to questions like whether male and female dinosaurs were sexually dimorphic – can help elucidate dinosaur biology. But this shift in thinking may be felt far and wide across the sciences. A careful consideration of problems within statistics could have deep impacts across many fields.</p>
<p class="fine-print"><em><span>Evan Thomas Saitta does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>The lack of large numbers of fossils makes it hard to study sexual dimorphism in dinosaurs. But a new statistical approach offers insight into this question and others across science.Evan Thomas Saitta, Postdoctoral Scholar in Paleontology, University of ChicagoLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1630002021-07-12T01:14:09Z2021-07-12T01:14:09ZStudying social media can give us insight into human behaviour. It can also give us nonsense<figure><img src="https://images.theconversation.com/files/409399/original/file-20210702-15-1muwihy.jpg?ixlib=rb-1.1.0&rect=0%2C0%2C3875%2C2585&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><span class="source">Shutterstock</span></span></figcaption></figure><p>Since the early days of social media, there has been <a href="https://www.nature.com/articles/d41586-020-01747-1">excitement</a> about how data traces left behind by users can be exploited for the study of human behaviour. Nowadays, researchers who were once restricted to surveys or experiments in laboratory settings have access to huge amounts of “real-world” data from social media.</p>
<p>The research opportunities enabled by social media data are undeniable. However, researchers often analyse this data with tools that were not designed for the large, noisy observational datasets found on social media.</p>
<p>We explored problems that researchers might encounter due to this mismatch between data and methods. </p>
<p>What we <a href="https://www.nature.com/articles/s41562-021-01133-5">found</a> is that the methods and statistics commonly used to provide evidence for seemingly significant scientific findings can also seem to support nonsensical claims. </p>
<h2>Absurd science</h2>
<p>The motivation for our paper comes from a series of research studies that deliberately present absurd scientific results. </p>
<p><a href="https://www.psychology.mcmaster.ca/bennett/psy710/readings/BennettDeadSalmon.pdf">One brain imaging study</a> appeared to show the neural activity of a dead salmon tasked with identifying emotions in photos. An <a href="https://www.bmj.com/content/337/bmj.a2533.abstract">analysis of longitudinal statistics from public health records</a> suggested that acne, height, and headaches are contagious. And an <a href="https://link.springer.com/article/10.3758/PBR.17.6.923">analysis of human decision-making</a> seemingly indicated people can accurately judge the population size of different cities by ranking them in alphabetical order.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/one-reason-so-many-scientific-studies-may-be-wrong-66384">One reason so many scientific studies may be wrong</a>
</strong>
</em>
</p>
<hr>
<p>Why would a researcher go out of their way to explore such ridiculous ideas? The value of these studies is not in presenting a new substantive finding. No serious researcher would argue, for example, that a dead salmon has a perspective on emotions in photos.</p>
<p>Rather, the nonsensical results highlight problems with the methods used to achieve them. Our research explores whether the same problems can afflict studies that use data from social media. And we discovered that indeed they do.</p>
<h2>Positive and negative results</h2>
<p>When a researcher seeks to address a research question, the method they use should be able to do two things: </p>
<ul>
<li><p>reveal an effect, when there is indeed a meaningful effect</p></li>
<li><p>show no effect, when there is no meaningful effect. </p></li>
</ul>
<p>For example, imagine you have chronic back pain and you take a medical test to find its cause. The test identifies a misaligned disc in your spine. This finding might be important and inform a treatment plan. </p>
<p>However, if you then discover the same test identifies this misaligned disc in a large proportion of the population who do not have chronic back pain, the finding becomes far less informative for you. </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=338&fit=crop&dpr=1 600w, https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=338&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=338&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=424&fit=crop&dpr=1 754w, https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=424&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/410707/original/file-20210712-70646-efbzq4.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=424&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Like a spinal test that can’t tell the difference between people with back pain and people without, much social media research isn’t using the right tools for the job.</span>
<span class="attribution"><span class="source">Shutterstock</span></span>
</figcaption>
</figure>
<p>The fact that the test fails to distinguish negative cases (no back pain) from positive cases (back pain) does not mean the misaligned disc in your spine is non-existent. This part of the finding is as “real” as any finding. Yet the failure means the result is not useful: “evidence” that is as likely to be found when there is a meaningful effect (in this case, back pain) as when there is none is simply not diagnostic, and, as a result, such evidence is uninformative.</p>
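<p>This notion of diagnosticity can be written as a likelihood ratio: evidence is informative only to the extent that it is more probable when the effect is real than when it is not. A toy calculation, with all numbers invented:</p>

```python
# Toy numbers (invented): how often the spinal test "finds" a misaligned disc.
p_finding_given_pain = 0.80     # flagged in most people WITH chronic back pain
p_finding_given_no_pain = 0.75  # ...but also in most people WITHOUT it

# A likelihood ratio near 1 means the finding barely shifts our beliefs.
likelihood_ratio = p_finding_given_pain / p_finding_given_no_pain
print(f"uninformative test: LR = {likelihood_ratio:.2f}")

# A genuinely diagnostic test flags far fewer people without the condition.
diagnostic_lr = 0.80 / 0.05
print(f"diagnostic test:    LR = {diagnostic_lr:.1f}")
```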
<h2>XYZ contagion</h2>
<p>Using the same rationale, we evaluated commonly used methods for analysing social media data — called “null hypothesis significance testing” and “correlational statistics” — by asking an absurd research question. </p>
<p>Past and current studies have tried to identify what factors influence Twitter users’ decisions to retweet other tweets. This is interesting both as a window into human thought and because resharing posts is a key mechanism by which messages are amplified or spread on social media.</p>
<p>So we decided to analyse Twitter data using the above standard methods to see whether a nonsensical effect we call “XYZ contagion” influences retweets. Specifically, we asked:</p>
<blockquote>
<p>Does the number of Xs, Ys, and Zs in a tweet increase the probability of it being spread?</p>
</blockquote>
<p>Upon analysing six datasets containing hundreds of thousands of tweets, the “answer” we found was yes. For example, in a dataset of 172,697 tweets about COVID-19, the presence of an X, Y, or Z in a tweet appeared to increase the message’s reach by 8%. </p>
<p>Needless to say, we do not believe the presence of Xs, Ys, and Zs is a central factor in whether people choose to retweet a message on Twitter. </p>
<p>However, like the medical test for diagnosing back pain, our finding shows that sometimes, methods for social media data analysis can “reveal” effects where there should be none. This raises questions about how meaningful and informative results obtained by applying current social science methods to social media data really are.</p>
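<p>One way to see how such spurious “effects” surface: with hundreds of thousands of observations, even a negligible correlation, for instance one induced by an incidental confound such as tweet length, yields a vanishingly small p-value. Below is a simulation sketch on synthetic data (not the study’s actual tweet datasets):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 170_000  # on the order of the COVID-19 tweet dataset

# Synthetic tweets: longer tweets contain more of every letter, including
# X, Y and Z, and (in this toy world) earn very slightly more retweets.
length = rng.integers(20, 280, n)               # characters per tweet
xyz_count = rng.binomial(length, 0.02)          # X/Y/Z count scales with length
retweets = rng.poisson(1.5 + 0.001 * length)    # retweets barely track length

r, p = stats.pearsonr(xyz_count, retweets)
print(f"r = {r:.3f}, p = {p:.1e}")
```

<p>The correlation is real in the data but trivially small; with samples this large, “statistically significant” stops being a useful filter on its own.</p>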
<p>As researchers continue to analyse social media data and identify factors that shape the evolution of public opinion, hijack our attention, or otherwise explain our behaviour, we should think critically about the methods underlying such findings and reconsider what we can learn from them.</p>
<h2>What is a ‘meaningful’ finding?</h2>
<p>The issues raised in our paper are not new, and there are indeed many research practices that have been developed to ensure results are meaningful and robust. </p>
<p>For example, researchers are encouraged to pre-register their hypotheses and analysis plans before starting a study to prevent a kind of data cherry-picking called <a href="https://www.wired.com/story/were-all-p-hacking-now/">“p-hacking”</a>. Another helpful practice is to check whether results are stable after removing outliers and controlling for <a href="https://journals.sagepub.com/doi/10.1177/1745691616658637">covariates</a>. Also important are <a href="https://www.sciencedirect.com/science/article/abs/pii/S0010945215000155?via%3Dihub">replication studies</a>, which assess whether the results obtained in an experiment can be found again when the experiment is repeated under similar conditions.</p>
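<p>The logic behind p-hacking is easy to demonstrate: if a researcher tests many unrelated variables and reports only the best p-value, false positives become the norm. A simulation sketch on pure noise, with no real effects anywhere:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 1,000 simulated 'studies', each testing 20 unrelated noise variables
# against a noise outcome and keeping only the best (smallest) p-value.
false_positive_studies = 0
for _ in range(1000):
    outcome = rng.normal(size=50)
    best_p = min(
        stats.pearsonr(rng.normal(size=50), outcome)[1] for _ in range(20)
    )
    if best_p < 0.05:  # at least one 'significant' result by chance alone
        false_positive_studies += 1

# Roughly 1 - 0.95**20, about 64%, of the studies are expected to 'find' something.
print(f"{false_positive_studies / 10:.0f}% of noise-only studies report an effect")
```

<p>Pre-registration blocks exactly this move: the hypothesis and analysis are fixed before the data can suggest which of the twenty tests to report.</p>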
<p>These practices are important, but they alone are not sufficient to deal with the problem we identify. While developing standardised research practices is needed, the research community must first think critically about what makes a finding in social media data meaningful.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/predicting-research-results-can-mean-better-science-and-better-advice-125568">Predicting research results can mean better science and better advice</a>
</strong>
</em>
</p>
<hr>
<p class="fine-print"><em><span>Ulrike Hahn has received funding from the Economic and Social Research Council, NESTA, The Australian Research Council, IARPA, the Leverhulme Trust, the Nuffield Foundation, the Alexander von Humboldt Foundation, and the European Research Council. </span></em></p><p class="fine-print"><em><span>Jason Burton and Nicole Cruz do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Researchers found the letters X, Y, and Z make tweets more shareable. The nonsensical result shows how easily statistics can be misused.Jason Burton, PhD researcher, Birkbeck, University of LondonNicole Cruz, Postdoctoral Research Associate, UNSW SydneyUlrike Hahn, Professor of Psychology, Birkbeck, University of LondonLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1554812021-03-12T06:33:31Z2021-03-12T06:33:31ZBehind a lot of flashy headlines may lie questionable scientific claims - what should people be aware of when reading the news?<figure><img src="https://images.theconversation.com/files/388443/original/file-20210309-19-1o23lcw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><a class="source" href="https://unsplash.com/photos/_Zua2hyvTBk">(Unsplash/Roman Kraft)</a></span></figcaption></figure><p>Traditional and social media play an important role in disseminating scientific breakthroughs to the public. However, we, as an audience, must be cautious about how we consume information from these publicly available sources.</p>
<p>From <a href="https://theconversation.com/pseudoscience-is-taking-over-social-media-and-putting-us-all-at-risk-121062">claims of harmful effects of vaccines</a> to <a href="https://theconversation.com/were-climate-researchers-and-our-work-was-turned-into-fake-news-89999">studies on the extent of climate change</a>, we have learned that behind some news headlines or articles lie either questionable, oversold, or misinterpreted research findings.</p>
<p>So what should readers be aware of when reading news that contains scientific claims?</p>
<h2>A lot of studies don’t hold up to replication</h2>
<p>The first thing readers should understand when weighing research findings in the news is that academic research has a well-known ‘<a href="https://www.nature.com/news/metascience-could-rescue-the-replication-crisis-1.16275">replication crisis</a>’.</p>
<p>This means that many of the studies you read about in the news fail to produce similar outcomes when other scientists try to confirm them.</p>
<p>For instance, a survey by Nature revealed that <a href="https://www.nature.com/news/1.19970">more than 70%</a> of researchers have failed to reproduce another scientist’s findings, and more than 40% have even failed to reproduce their own findings. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=296&fit=crop&dpr=1 600w, https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=296&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=296&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=371&fit=crop&dpr=1 754w, https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=371&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/388482/original/file-20210309-15-oi700w.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=371&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">According to a survey by Nature, more than 70% of researchers have failed to reproduce another scientist’s experiments.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/magnifying-glass-pen-over-graph-on-169007078?src=ZXM3US3bRC1l_2YUkqKtLQ-1-37">(Shutterstock/Portrait Image Asia)</a></span>
</figcaption>
</figure>
<p>Similarly, <a href="https://www.nature.com/articles/483531a">a 2012 study</a> reported that the findings of only 11% of 53 landmark preclinical cancer studies from the previous decade could be replicated, while another paper that <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/ecoj.12461">examined 159 empirical economics studies</a> found that 80% of them had exaggerated their findings.</p>
<p>Factors that may lead to these non-reproducible results include honest human error, poor sampling, “cherry-picking” of scientific findings, and, in rare cases, outright data manipulation.</p>
<p>A <a href="https://theconversation.com/our-survey-found-questionable-research-practices-by-ecologists-and-biologists-heres-what-that-means-94421">survey from the University of Melbourne</a>, Australia, of 800 ecologists and biologists found that 64% of them had at least once failed to report results from their study because the results were not “statistically significant” - that is, the data could not clearly distinguish a real effect from chance.</p>
<h2>The media often feeds on our need for hope</h2>
<p>Although the vast majority of scientific research is reputable and reliable, there is always the potential for error, fraud, or overstatement of findings.</p>
<p>At times, however, the media can overlook these flaws - intentionally or otherwise - particularly when it comes to medical research that offers hope of curing diseases and illnesses.</p>
<p>Let’s recall a breaking news story from 2009 about an Italian researcher, <a href="https://bmcmedethics.biomedcentral.com/articles/10.1186/1472-6939-14-6">Paolo Zamboni</a>, who claimed to have cured his wife’s multiple sclerosis (MS) by “unblocking” the veins in her neck. He challenged the mainstream view of MS as a disorder of the immune system, theorising instead that it was a vascular disease - one that could be cured by clearing blood vessels.</p>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=601&fit=crop&dpr=1 600w, https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=601&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=601&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=756&fit=crop&dpr=1 754w, https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=756&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/388481/original/file-20210309-23-6gzav.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=756&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Paolo Zamboni, professor of vascular surgery at the University of Ferrarra, Italy.</span>
<span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Paolo_Zamboni_image.jpg">(Wikimedia Commons)</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<p>For the media, however, the most appealing part of this research may have been a man’s quest to save his beloved wife. This romance-fuelled medical triumph - which is a popular story for health reports - appeared to restore the hope of many patients around the world.</p>
<p>Sadly, however, Zamboni’s research had a very small sample size and a flawed experimental design. Much of the attention it attracted came from the hype around his romantic story rather than the strength of the supposed medical breakthrough.</p>
<p>Since then, other researchers’ attempts to replicate his findings <a href="https://bmcmedethics.biomedcentral.com/articles/10.1186/1472-6939-14-6">have not been successful</a>, and <a href="https://www.cbc.ca/news/health/multiple-sclerosis-liberation-therapy-clinical-trial-1.4014494">many incidents</a> of patient complications and relapses of the disorder have been reported. </p>
<p>Zamboni’s case, however, was just a small story in the bigger picture of how the media can misinterpret or overstate research. It is common for promising health interventions, initially promoted in the media, to fail replication and never make it into actual clinical practice.</p>
<p>A 2003 study published in the <a href="https://pubmed.ncbi.nlm.nih.gov/12731504/">American Journal of Medicine</a> looked at 101 articles in six major science journals that made novel therapeutic promises. Two decades later, only five of those promises had been licensed for clinical use, and only one had been proven to have a significant health impact.</p>
<h2>There are potential incentives to misreport findings</h2>
<p>Around the world, researchers’ job targets, income, bonuses, and promotions can be <a href="https://theconversation.com/unis-want-research-shared-widely-so-why-dont-they-properly-back-academics-to-do-it-151375">tied to their publications</a>.</p>
<p>On the other hand, many high-impact scientific journals - and consequently the media - can seem more attracted to <a href="https://www.sciencedirect.com/science/article/pii/S0959804907006946?casa_token=HiWmQv07WUkAAAAA:8eUfR_wVLu-aBh4OMsISiy16cAO4xupVvmg8s_ag-s7cIryLsKWF2i3-AtcHAnwzgppRYeWsl9qJCw">‘significant’ or positive results</a>, even though non-‘significant’ results and unsuccessful replications can make substantial contributions to scientific knowledge.</p>
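This selection effect can be made concrete with a minimal simulation (illustrative only, not from the article, and with arbitrary assumed numbers: a small true effect of 0.2 and a rough z-test at p &lt; 0.05). When only "significant" studies are kept, the average published effect ends up well above the true one:

```python
import random
import statistics

random.seed(1)

def run_study(true_effect=0.2, n=30):
    """Simulate one study: n noisy observations around a small true effect."""
    sample = [random.gauss(true_effect, 1.0) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    significant = abs(mean / se) > 1.96  # rough z-test at p < 0.05
    return mean, significant

results = [run_study() for _ in range(5000)]
all_effects = [m for m, _ in results]
# Journals (and the media) keep only the "significant" studies.
published = [m for m, sig in results if sig]

print(f"mean effect, all studies:        {statistics.mean(all_effects):.2f}")
print(f"mean effect, 'significant' only: {statistics.mean(published):.2f}")
```

The filtered average is inflated because, with noisy small samples, a study mostly clears the significance bar when chance happened to exaggerate the effect.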
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=296&fit=crop&dpr=1 600w, https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=296&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=296&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=371&fit=crop&dpr=1 754w, https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=371&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/388483/original/file-20210309-19-18d8oo3.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=371&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Researchers’ job targets, income, bonuses, and promotions are often tied to their publications.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/closeup-old-dirty-school-blackboard-stains-1060705337?src=H_2XPJ_5Q3o7mSADUVReww-1-49">(Shutterstock/Denys Kurbatov)</a></span>
</figcaption>
</figure>
<p>Researchers from the University of California Davis in the US <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1123321/#:%7E:text=US%20researchers%20have%20claimed%20that,other%20statistical%20tests%20were%20used">reviewed 359 studies</a> published in leading medical journals in the 1990s, and stated that most of the studies were “reported in a potentially misleading way, with statistics designed to make the results more positive than if other statistical tests were used”.</p>
<p>Many faculty staff have also heard anecdotal accounts of researchers and PhD students re-framing their data or findings to support their initial hypotheses, or vice versa. They may even delete, add, or alter data to <a href="https://journals.sagepub.com/doi/abs/10.1177/0149206314527133">make their work more publishable</a> and appealing for media coverage. </p>
<p>Every now and then the scientific community catches manipulated studies, and journals then <a href="https://retractionwatch.com/">retract them</a> from publication.</p>
<h2>We should read the news with a critical eye</h2>
<p>Every research study has the potential to improve our understanding of the world we live in.</p>
<p>However, we should be careful of overstated findings, studies that have yet to be replicated, or research that has not been published in credible peer-reviewed sources.</p>
<p>It takes more effort, but readers should be cautious of single studies and instead look at what the broader scientific community says about the topic.</p>
<p>The COVID-19 pandemic highlighted the dangers of <a href="https://www.sciencedirect.com/science/article/pii/S2590061720300569">misinformation</a> and how it can <a href="https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)30461-X/fulltext">spread faster</a> than any natural airborne virus. If the findings we read seem too good to be true, they probably are!</p>
<p class="fine-print"><em><span>Saeed Pahlevansharif has received several research grants for unrelated projects. There is no conflict of interests regarding the publication of this article.</span></em></p>
<p class="fine-print"><em><span>Hassam Waheed has received several research grants for unrelated projects. There is no conflict of interests regarding the publication of this article.</span></em></p>
<p class="fine-print"><em><span>Kelly-Ann Allen has received several research grants for unrelated projects. There is no conflict of interests regarding the publication of this article.</span></em></p>
<p class="fine-print"><em><span>Navaz Naghavi has received several research grants for unrelated projects. There is no conflict of interests regarding the publication of this article.</span></em></p>
<p class="fine-print"><em><span>Nicholas Gamble receives funding from the Australian Research Council.</span></em></p>
Behind a lot of news headlines often lie either questionable, oversold or misinterpreted research findings. So what should readers be aware of when reading news that contain scientific claims?
Saeed Pahlevansharif, Associate Professor, Taylor's University
Hassam Waheed, Taylor's University
Kelly-Ann Allen, Senior Lecturer, School of Education, Monash University
Navaz Naghavi, Lecturer, Taylor's University
Nicholas Gamble, Lecturer, Monash University
Licensed as Creative Commons – attribution, no derivatives.
tag:theconversation.com,2011:article/1322522020-12-27T20:40:45Z2020-12-27T20:40:45Z
Humans learn from mistakes — so why do we hide our failures?
<figure><img src="https://images.theconversation.com/files/374663/original/file-20201214-23-1r705ym.jpg?ixlib=rb-1.1.0&rect=456%2C244%2C2808%2C1380&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><span class="source">pxfuel</span>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span></figcaption></figure><p>A few years ago I had the pleasure of listening to the highly influential legal scholar Cass Sunstein speak in the flesh. Cass wrote the best-selling book <a href="https://www.penguinrandomhouse.com/books/304634/nudge-by-richard-h-thaler-and-cass-r-sunstein/">Nudge</a>, along with his long-time collaborator Richard Thaler.</p>
<p>Thaler subsequently won the <a href="https://www.nobelprize.org/prizes/economic-sciences/2017/popular-information/">Nobel Prize in Economics</a> and Cass went to the White House to head up a team advising the <a href="https://www.nytimes.com/2010/05/16/magazine/16Sunstein-t.html">Obama administration</a>. </p>
<p>It was among the first of what came to be <a href="https://theconversation.com/coronavirus-how-the-uk-government-is-using-behavioural-science-134097">hundreds</a> <a href="https://www.vic.gov.au/behavioural-insights">of</a> <a href="https://behaviouraleconomics.pmc.gov.au/">government</a> <a href="https://www.canada.ca/en/innovation-hub.html">teams</a> around the world using their insights into human behaviour to improve what governments did.</p>
<p>Cass was speaking <a href="https://www.smh.com.au/opinion/just-a-nudge-why-malcolm-turnbull-is-embracing-behavioral-economics-20151127-gl9mld.html">in Canberra</a> and I asked whether he could talk about nudges that hadn’t worked. His initial answer surprised me – he said none came to mind.</p>
<h2>So what is nudging?</h2>
<p>To backtrack, it’s important to understand what a nudge is. The concept is based on the idea that people often act “irrationally”.</p>
<p>By itself this isn’t a particularly useful insight. What is useful is the insight that people behave irrationally in ways we can predict.</p>
<p>Here’s one. We are lazy, so when presented with a plethora of offers about what to buy or sign up to, we often stick with what we’ve got - the “don’t need to think about it” option - even when there are better deals on the table.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/the-psychology-of-christmas-shopping-how-marketers-nudge-you-to-buy-88011">The psychology of Christmas shopping: how marketers nudge you to buy</a>
</strong>
</em>
</p>
<hr>
<p>And we tend to value the present over the future – so while we know we shouldn’t eat junk food, we often prioritise short-term satisfaction over long-term health.</p>
<p>These insights into behavioural regularities allow us to tailor government programs to get better outcomes.</p>
<p>For example, in Britain 80% of people say they are willing to donate an organ when they die, but only 37% put their names on the register. </p>
<p>To bridge this gap the government is <a href="https://www.gov.uk/government/consultations/introducing-opt-out-consent-for-organ-and-tissue-donation-in-england">changing the system</a> so that the default option is to be a donor.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/an-opt-out-system-isnt-the-solution-to-australias-low-rate-of-organ-donation-108336">An opt-out system isn't the solution to Australia's low rate of organ donation</a>
</strong>
</em>
</p>
<hr>
<p>People can still opt-out if they want to – but the simple switch is likely to save as many as 700 lives per year.</p>
<p>We like to behave like those around us. So here in Australia, to help combat the rise of drug-resistant superbugs, the chief medical officer wrote to the <a href="https://behaviouraleconomics.pmc.gov.au/projects/nudge-vs-superbugs-behavioural-economics-trial-reduce-overprescribing-antibiotics">highest prescribers of antibiotics</a> pointing out they weren’t in line with their peers. </p>
<p>It cut the prescribing rate of the highest prescribers by 12% in six months.</p>
<h2>Then why was Cass’ answer surprising?</h2>
<p>I was surprised because nudging promotes rigorous trials, evidence and testing – so it’s hard to believe every proposal would be found to have worked.</p>
<figure class="align-right ">
<img alt="" src="https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=969&fit=crop&dpr=1 600w, https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=969&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=969&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=1218&fit=crop&dpr=1 754w, https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=1218&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/374671/original/file-20201214-13-1i6b8uh.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=1218&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Cass Sunstein at the BETA conference.</span>
<span class="attribution"><a class="source" href="https://behaviouraleconomics.pmc.gov.au/behavioural-exchange-2018/keynote-address-ethics-and-behavioural-insights">BETA</a></span>
</figcaption>
</figure>
<p>In science, experiments frequently throw up unexpected results.</p>
<p>Only publishing the results of successful trials would lead to bulging cabinets of failures from which we would never learn. </p>
<p>Given that failure is one of our most effective teachers, it would be a huge missed opportunity.</p>
<p>And the <a href="https://en.wikipedia.org/wiki/False_positives_and_false_negatives">false positives</a> that would be published along with any genuine positives would inflate the belief that the intervention worked. </p>
<p>Any experiment involving an element of randomness (in the subjects selected or the conditions in which it was conducted) will occasionally report a positive effect that wasn’t really there.</p>
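A minimal sketch (illustrative only, not from the article) makes this concrete: even when an intervention has no effect at all, the conventional p &lt; 0.05 threshold will still declare roughly one experiment in twenty a "success" purely by chance.

```python
import random
import statistics

random.seed(42)

def fake_trial(n=50):
    """One trial of an intervention with NO real effect: both the control
    and treated groups are drawn from the same distribution."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    return abs(diff / se) > 1.96  # "significant" at roughly p < 0.05

trials = 2000
false_positives = sum(fake_trial() for _ in range(trials))
print(f"{false_positives} of {trials} null experiments look 'significant' "
      f"(~{false_positives / trials:.1%})")
```

If only those chance "successes" are written up, the literature fills with effects that vanish on replication.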
<p>This “<a href="https://www.psychologytoday.com/us/basics/replication-crisis">replication crisis</a>” has been recognised as a big problem in psychology and economics, with many previously accepted results being <a href="http://science.sciencemag.org/content/349/6251/aac4716">thrown into doubt</a>.</p>
<p>Thankfully, things are changing for the better. A range of initiatives now encourage the publication of both positive and negative results, and there is far greater awareness of questionable research practices.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/the-replication-crisis-has-engulfed-economics-49202">The replication crisis has engulfed economics</a>
</strong>
</em>
</p>
<hr>
<p>And they are embraced by the Australian government’s own Behavioural Economics Team, <a href="https://behaviouraleconomics.pmc.gov.au/">BETA</a>, with whom I work.</p>
<p>To guard against the publishing of only results that fit a narrative, BETA pre-registers its analysis plan, which means it can’t decide to pick out only the results that fit a particular story once the trial is done. </p>
<p>BETA has also set up an <a href="https://behaviouraleconomics.pmc.gov.au/blog/strengthening-links-academia">external advisory panel of academics</a> (on which I sit) to give independent advice on transparency, trial design and analysis.</p>
<p>It has had some <a href="https://behaviouraleconomics.pmc.gov.au/projects/improving-government-confirmation-processes-using-sms">very</a> <a href="https://behaviouraleconomics.pmc.gov.au/projects/credit-when-its-due">successful</a> <a href="https://behaviouraleconomics.pmc.gov.au/projects/energy-labels-make-cents-randomised-controlled-trial-test-effect-appliance-energy-rating">trials</a>, but also some with surprising results.</p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/k6QAiog1gnw?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
</figure>
<p>When it set out to discover whether a fact sheet enabling households to compare electricity plans would encourage them to switch to better ones it discovered (at least in the experiment conducted) <a href="https://behaviouraleconomics.pmc.gov.au/projects/simplifying-energy-fact-sheets-improve-consumer-understanding">it did not</a>.</p>
<p>When it set out to discover whether removing identifying information from public service job applications would increase the proportion of women and minorities shortlisted for interviews it discovered (at least in the experiment conducted) <a href="https://behaviouraleconomics.pmc.gov.au/projects/going-blind-see-more-clearly-unconscious-bias-australian-public-service-aps-shortlisting">it did not</a>.</p>
<p>These findings give us just as much useful information as the trials that were “successful”. They can help the government design better programs.</p>
<h2>There’s a happy ending to this story</h2>
<p>Back at the conference, after his initial answer Cass reflected further. He did recall some failures, and he talked about the lessons learned. </p>
<p>Since then, he has even published a paper, <a href="https://www.cambridge.org/core/journals/behavioural-public-policy/article/abs/nudges-that-fail/8DE5FFFFB7DA5BE14F8DC1E3D2C0C0AA">Nudges that Fail</a>, that provides insights every bit as good as those from nudges that succeed.</p>
<p>Feel free to check out <a href="https://behaviouraleconomics.pmc.gov.au/projects">BETA’s list</a>, the good and the bad. </p>
<p>It’s important to embrace mistakes, and to make more than a few. It’s the only way to be sure we are really learning.</p>
<p class="fine-print"><em><span>Ben Newell receives funding from the Australian Research Council</span></em></p>
Australia’s behavioral economics unit publishes rather than hides the results of its unsuccessful experiments.
Ben Newell, Professor of Cognitive Psychology, UNSW Sydney
Licensed as Creative Commons – attribution, no derivatives.
tag:theconversation.com,2011:article/1240762019-09-24T20:14:18Z2019-09-24T20:14:18Z
Real problem, wrong solution: why the Nationals shouldn’t politicise the science replication crisis
<p>The <a href="https://campusmorningmail.com.au/news/national-party-wants-independent-agency-to-vet-research/">National Party</a>, Queensland farming lobby group <a href="https://www.abc.net.au/7.30/farmers-fight-tough-new-rules-to-protect-the-great/11526168">AgForce</a>, and MP <a href="https://www.bobkatter.com.au/media/media-releases/view/1029/katter-demands-govt-audit-reef-quality-science-/media-releases">Bob Katter</a> have banded together to propose an “independent science quality assurance agency”.</p>
<p>To justify their position, Liberal-National MP George Christensen and AgForce’s Michael Guerin specifically invoked the “replication crisis” in science, in which researchers in various fields have found it difficult or impossible to reproduce and validate original research findings. Their proposal, however, is not a good solution to the problem. </p>
<p>The more important context is that these politicians and lobbyists are opposed to <a href="https://www.qld.gov.au/environment/agriculture/sustainable-farming/reef/reef-regulations/strengthening-regulations">new laws</a> to curb agricultural runoff onto the Great Barrier Reef that are underpinned by research finding evidence of <a href="https://theconversation.com/cloudy-issue-we-need-to-fix-the-barrier-reefs-murky-waters-39380">harm from poor water quality</a>. Christensen <a href="https://www.facebook.com/gchristensenmp/photos/a.769408183114112/2334140303307551/?type=3&theater">suggests</a> that many scientific papers behind such regulation “have never been tested and their conclusions may be wrong”. But Christensen seems to be targeting specific results he doesn’t like, rather than trying to improve scientific practice in a systematic way.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998">Science is in a reproducibility crisis – how do we resolve it?</a>
</strong>
</em>
</p>
<hr>
<p>In various scientific areas, including psychology and preclinical medicine, <a href="https://www.nature.com/articles/d41586-018-06075-z">large-scale replication projects</a> have failed to reproduce the findings of many original studies. The rates of success differ between fields, but on average only <a href="https://cos.io/about/news/28-classic-and-contemporary-psychology-findings-replicated-more-60-laboratories-each-across-three-dozen-nations-and-territories/">half</a> <a href="https://www.castoredc.com/blog/replication-crisis-medical-research">or</a> <a href="https://www.nature.com/news/over-half-of-psychology-studies-fail-reproducibility-test-1.18248">fewer</a> of published studies were successfully replicated. Clearly there is a problem.</p>
<p>Much of the problem is due to hyper-competitiveness in science, funding shortfalls, publication practices, and the use of performance metrics that privilege quantity over quality. </p>
<p>Scientists themselves have <a href="https://theconversation.com/there-is-a-problem-australias-top-scientist-alan-finkel-pushes-to-eradicate-bad-science-123374">documented the poor practices</a> that underlie this crisis, such as the <a href="https://theconversation.com/our-survey-found-questionable-research-practices-by-ecologists-and-biologists-heres-what-that-means-94421">misuse of statistics</a>, often unwittingly, in ways that bias findings towards attention-grabbing conclusions. These practices distort the evidence available to policy-makers and other researchers. </p>
<p>Scientists have also already produced responses to some problems: <a href="https://www.nature.com/articles/d41586-019-02674-6">reforms in peer review</a>, <a href="https://cos.io/top/">guidelines for methods and statistical reporting</a>, and <a href="https://osf.io/dashboard">new platforms for data sharing</a>. These improvements are possible only by taking the replication crisis seriously. Paying lip service to it so as to attack particular legislation is the opposite of this.</p>
<h2>Making decisions under uncertainty</h2>
<p>Establishing an agency with a mission to adjudicate on hand-picked scientific results would make things worse. </p>
<p>At best, such an agency will be one more review panel. At worst, it will be a bureaucratic front for the political agenda of the day. Either way, it will make scientists even more cautious, and delay the flow of information to policy-makers.</p>
<p>The track records of the lobbyists involved in this latest move suggest they have little genuine interest in improving science. AgForce reportedly <a href="https://www.theguardian.com/australia-news/2019/may/02/agforce-deletes-decades-worth-of-data-from-government-funded-barrier-reef-program">deleted more than a decade’s worth of data meant for a government water quality program</a> before the new runoff regulations took effect.</p>
<p>Exploiting scientific uncertainty has long been a classic tactic of industry lobbyists. It has been used to justify inaction on everything from <a href="https://www.merchantsofdoubt.org/">tobacco</a> to <a href="https://www.thegwpf.com/donna-laframboise-peer-review-why-skepticism-is-essential/">climate change</a>. Local politicians and lobby groups seem to be copying moves from a well-worn overseas playbook in their misuse of the replication crisis.</p>
<p>Scientists can never make pronouncements with the certainty of a politician. But if, as a society, we want to benefit fully from science, we need to accept the idea of scientific uncertainty. The existence of uncertainties does not justify rejection of the best available evidence.</p>
<h2>To defend science we need to improve it</h2>
<p>It is tempting to respond to politically motivated attacks on science by simply pointing to the excellent track record of scientific knowledge, or the good intentions of the vast majority of scientists. </p>
<p>But there is a better response: scientists themselves have been improving science. As advocates of reform, we have been told that pointing out problems helps the anti-science movement. We disagree: being open about our work to improve science is essential for building public trust.</p>
<p>Science is something that humans do. It is self-correcting when, and only when, scientists <a href="https://twitter.com/jamesheathers/status/845696144999137280">correct it</a>. Research is hard work, and we can’t expect scientists never to make errors or to provide complete certainty. But we can expect scientists to create a culture that values detecting and correcting errors.</p>
<p>Admitting errors in one’s own work, finding them in others’ work, reporting them, retracting results when necessary, and correcting the record are activities that should be the most highly regarded of scientific practices. We need to shift the balance of rewards away from rewarding only groundbreaking discoveries, and towards the painstaking work of confirmation.</p>
<p>A cultural shift in this regard is already underway, to better align <a href="https://www.abc.net.au/radionational/programs/bigideas/sharing-science-%E2%80%93-for-the-good-of-all/11330816">scientific practices with scientific values</a>. But there is more to be done, and governments can help.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/scientific-data-should-be-shared-an-open-letter-to-the-arc-9458">Scientific data should be shared: an open letter to the ARC</a>
</strong>
</em>
</p>
<hr>
<p>There are sensible policies to support the open science initiatives that will reduce error production and increase error detection in scientific work. Different fields need different approaches, but here are two ideas.</p>
<p>First, improve funding allocation procedures. Reward self-correcting activities such as replication studies. Don’t require every piece of funded research to be groundbreaking. Don’t rely on flawed metrics. Enforce best-practice data management and open data practices whenever feasible. This can all be done without establishing an inefficient agency whose likely effect is to delay action.</p>
<p>Second, <a href="https://theconversation.com/from-fraud-to-fair-play-australia-must-support-research-integrity-15733">establish a national independent office of research integrity</a> to allow errors in the scientific literature, whether deliberate or accidental, to be corrected in a fair, efficient, and systematic way. Unlike the politicians’ proposal, this would improve the process for all researchers, not just act as a handbrake on research findings that lobbyists don’t like.</p><img src="https://counter.theconversation.com/content/124076/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Martin Bush receives funding from DARPA (US Defense) for a project under the SCORE program, about predicting the likelihood of replication of published studies in social science.</span></em></p><p class="fine-print"><em><span>Alex O. Holcombe has received funding from the Australian Research Council. </span></em></p><p class="fine-print"><em><span>Bonnie Wintle receives funding from a University of Melbourne Research Fellowship (Career Interruptions). She also receives funding from DARPA (US Defense) for a project under the SCORE program, about predicting the likelihood of replication of published studies in social science. </span></em></p><p class="fine-print"><em><span>Fiona Fidler receives funding from the ARC, including a current Future Fellowship about replicability and reproducibility in ecology and environmental science. She also has funding from DARPA (US Defense) for a project under the SCORE program, about predicting the likelihood of replication of published studies in social science.</span></em></p><p class="fine-print"><em><span>Simine Vazire receives funding from the National Science Foundation (USA) and the Templeton Foundation. </span></em></p>Across science, only around half of published results can be successfully replicated. But while this is a serious problem, the proposed public audit looks like a political bid to cast doubt on science.Martin Bush, Research Fellow in History and Philosophy of Science, The University of MelbourneAlex O. 
Holcombe, Professor, School of Psychology, University of SydneyBonnie Claire Wintle, Research fellow, The University of MelbourneFiona Fidler, Associate Professor, School of Historical and Philosophical Studies, The University of MelbourneSimine Vazire, Professor, University of California, DavisLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1037362019-04-08T10:44:52Z2019-04-08T10:44:52ZThe replication crisis is good for science<figure><img src="https://images.theconversation.com/files/258873/original/file-20190213-181604-48h9rx.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Some studies don't hold up to added scrutiny. </span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/magnifying-glass-pen-over-graph-on-169007078?src=ZXM3US3bRC1l_2YUkqKtLQ-1-37">PORTRAIT IMAGES ASIA BY NONWARIT/shutterstock.com</a></span></figcaption></figure><p>Science is in the midst of a crisis: A surprising fraction of published studies fail to replicate when the procedures are repeated. </p>
<p>For example, take the study, published in 2007, which claimed that tricky math problems requiring careful thought <a href="https://doi.org/10.1037/0096-3445.136.4.569">are easier to solve when presented in a fuzzy font</a>. The original, small study found that a fuzzy font improved accuracy, supporting the claim that encountering perceptual challenges can induce people to reflect more carefully.</p>
<p>However, <a href="https://doi.org/10.1037/xge0000049">16 attempts to replicate the result failed</a>, definitively demonstrating that the original claim was erroneous. Plotted together on a graph, the studies formed a perfect bell curve centered around zero effect. As is frequently the case with failures to replicate, of the 17 total attempts, the original had both the smallest sample size and the most extreme result.</p>
<p>The Reproducibility Project, a collaboration of 270 psychologists, has <a href="https://osf.io/ezcuj/">attempted to replicate 100 psychology studies</a>, while <a href="https://doi.org/10.1038/s41562-018-0399-z">a 2018 report</a> examined studies published in the prestigious scholarly journals Nature and Science between 2010 and 2015. These efforts find that about two-thirds of studies do replicate to some degree, but that the strength of the findings is often weaker than originally claimed. </p>
<p>Is this bad for science? It’s certainly uncomfortable for many scientists whose work gets undercut, and the rate of failures may currently be unacceptably high. But, as a psychologist and a statistician, I believe confronting the replication crisis is good for science as a whole.</p>
<h2>Practicing good science</h2>
<p>First, these replication attempts are examples of good science operating as it should. They are focused applications of the scientific method, careful experimentation and observation in the pursuit of reproducible results. </p>
<p>Many people incorrectly assume that, due to the “p<.05” threshold for statistical significance, only 5% of discoveries will prove to be errors. However, 15 years ago, physician John Ioannidis pointed to some fallacies in that assumption, arguing that false discoveries <a href="https://doi.org/10.1371/journal.pmed.0020124">made up the majority of the published literature</a>. Replication efforts are confirming that the false discovery rate is much higher than 5%. </p>
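<p>Ioannidis’s argument can be illustrated with a back-of-the-envelope calculation. The sketch below is illustrative only: the 10% prior and 35% power are assumed values, not figures from his paper. It shows that when true effects are rare and studies are underpowered, false positives can outnumber true discoveries even with a strict p &lt; 0.05 threshold.</p>

```python
def false_discovery_rate(prior_true, power, alpha=0.05):
    """Expected fraction of 'significant' findings that are false positives.

    prior_true: fraction of tested hypotheses that are actually true
    power:      probability a real effect yields p < alpha
    alpha:      significance threshold (false-positive rate under the null)
    """
    false_pos = alpha * (1 - prior_true)  # nulls that pass by chance
    true_pos = power * prior_true         # real effects that are detected
    return false_pos / (false_pos + true_pos)

# Assumed values: only 10% of tested hypotheses are true, 35% power.
# Under these conditions, a majority of "discoveries" are false:
fdr = false_discovery_rate(prior_true=0.10, power=0.35)
print(f"{fdr:.0%}")  # → 56%
```

Note that the false discovery rate here depends on the prior plausibility of the hypotheses and on statistical power, neither of which the p &lt; 0.05 threshold controls.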
<p>Awareness about the replication crisis appears to be promoting better behavior among scientists. Twenty years ago, the cycle for publication was basically complete after a scientist convinced three reviewers and an editor that the work was sound. Yes, the published research would become part of the literature, and therefore open to review – but that was a slow-moving process. </p>
<p>Today, the stakes have been raised for researchers. They know that there’s the possibility that their study might be reviewed by thousands of opinionated commenters on the internet or by a high-profile group like the Reproducibility Project. Some journals now require scientists to make their data and computer code available, which makes it likelier that others will catch errors in their work. What’s more, some scientists can now “preregister” their hypotheses before starting their study – the equivalent of calling your shot before you take it.</p>
<p>Combined with open sharing of materials and data, preregistration improves the transparency and reproducibility of science, hopefully ensuring that a smaller fraction of future studies will fail to replicate. </p>
<p>While there are signs <a href="https://fivethirtyeight.com/features/psychologys-replication-crisis-has-made-the-field-better/">that scientists are indeed reforming their ways</a>, there is still a long way to go. Out of the 1,500 accepted presentations <a href="https://plan.core-apps.com/sbm_annual2019">at the annual meeting for the Society for Behavioral Medicine in March</a>, only 1 in 4 of the authors reported using these open science techniques in the work they presented. </p>
<h2>Improving statistical intuition</h2>
<p>Finally, the replication crisis is helping improve scientists’ intuitions about statistical inference. </p>
<p>Researchers now better understand how weak designs with high uncertainty – in combination with choosing to publish only when results are statistically significant – produce exaggerated results. In fact, it is one of the reasons more than 800 scientists recently argued in favor of abandoning <a href="https://www.nature.com/articles/d41586-019-00857-9">statistical significance testing</a>.</p>
<p>We also better appreciate how isolated research findings fit into the broader pattern of results. In another study, Ioannidis and oncologist Jonathan Schoenfeld <a href="https://doi.org/10.3945/ajcn.112.047142">surveyed the epidemiology literature</a> for studies associating 40 common food ingredients with cancer. There were some broad consistent trends – unsurprisingly, bacon, salt and sugar are never found to be protective against cancer. </p>
<p>But plotting the effects from 264 studies produced a confusing pattern. The magnitudes of the reported effects were highly variable. In other words, one study might say that a given ingredient was very bad for you, while another might conclude that the harms were small. In many cases, the studies even disagreed on whether a given ingredient was harmful or beneficial. </p>
<p>Each of the studies had at some point been reported in isolation in a newspaper or a website as the latest finding in health and nutrition. But taken as a whole, the evidence from all the studies was not nearly as definitive as each single study may have appeared.</p>
<p>Schoenfeld and Ioannidis also graphed the 264 published effect sizes. Unlike the fuzzy font replications, their graph of published effects looked like the tails of a bell curve. It was centered at zero with all the nonsignificant findings carved out. The unmistakable impression from seeing all the published nutrition results presented at once is that many of them might be like the fuzzy font result – impressive in isolation, but anomalous under replication. </p>
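<p>The filtering pattern Schoenfeld and Ioannidis observed is easy to reproduce in simulation. The sketch below uses made-up illustrative values (a small true effect of 0.1, a per-study standard error of 0.15, 10,000 simulated studies): "publishing" only the studies that reach p &lt; 0.05 leaves estimates that exaggerate the true effect several-fold.</p>

```python
import random
import statistics

random.seed(42)
TRUE_EFFECT = 0.10   # assumed small real effect
SE = 0.15            # assumed standard error of each study (low power)

# Each study's estimate is the true effect plus sampling noise:
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(10_000)]

# A study is "significant" (two-sided z-test, p < .05) when |est / SE| > 1.96,
# and only those studies get published:
published = [e for e in estimates if abs(e / SE) > 1.96]

print(f"true effect:           {TRUE_EFFECT}")
print(f"mean published effect: {statistics.mean(published):.2f}")
print(f"share published:       {len(published) / len(estimates):.0%}")
```

Because only estimates far from zero clear the significance bar, the published mean lands well above the true effect, which is the exaggeration described in the text.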
<p>The breathtaking possibility that a large fraction of published research findings might just be statistical flukes is exactly why people speak of the replication crisis. But it’s not really a scientific crisis, because the awareness is bringing improvements in research practice, new understandings about statistical inference and an appreciation that isolated findings must be interpreted as part of a larger pattern.</p>
<p>Rather than undermining science, I feel that this is reaffirming the best practices of the scientific method.</p><img src="https://counter.theconversation.com/content/103736/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Eric Loken does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Rising evidence shows that many psychology studies don’t stand up to added scrutiny. The problem has many scientists worried – but it could also encourage them to up their game.Eric Loken, Assistant Professor of Educational Psychology, University of ConnecticutLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1141612019-04-01T10:40:12Z2019-04-01T10:40:12ZIs it the end of ‘statistical significance’? The battle to make science more uncertain<figure><img src="https://images.theconversation.com/files/266165/original/file-20190327-139371-15nimd0.jpg?ixlib=rb-1.1.0&rect=3%2C0%2C2625%2C1552&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Some scientists think it's time to hang up statistical significance.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/white-robe-hanging-on-door-laboratory-321548114">mariakraynova/Shutterstock.com</a></span></figcaption></figure><p>The scientific world is abuzz following recommendations by two of the most prestigious scholarly journals – <a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913">The American Statistician</a> and <a href="https://www.nature.com/articles/d41586-019-00857-9">Nature</a> – that the term “<a href="https://en.wikipedia.org/wiki/Statistical_significance">statistical significance</a>” be retired.</p>
<p>In their introduction to the special issue of The American Statistician on the topic, the journal’s editors urge “moving to a world beyond ‘p<0.05,’” <a href="http://www.haghish.com/resources/materials/Statistical_Methods_for_Research_Workers.pdf">the famous 5 percent threshold</a> for determining whether a study’s result is statistically significant. If a study passes this test, the probability of observing a result at least as extreme when there is no real effect is less than 5 percent. This has often been understood to mean that the study is worth paying attention to.</p>
<p>The journal’s basic message – but not necessarily the consensus of the 43 articles in this issue, one of which <a href="https://doi.org/10.1080/00031305.2018.1518788">I contributed</a> – was that scientists first and foremost should “embrace uncertainty” and “be thoughtful, open and modest.”</p>
<p>While these are fine qualities, I believe that scientists must not let them obscure the precision and rigor that science demands. Uncertainty is inherent in data. If scientists further weaken the already very weak threshold of 0.05, then that would inevitably make scientific findings more difficult to interpret and less likely to be trusted. </p>
<h2>Piling difficulty on top of difficulty</h2>
<p>In the traditional practice of science, a scientist generates a hypothesis and designs experiments to collect data in support of hypotheses. He or she then collects data and performs statistical analyses to determine if the data did in fact support the hypothesis. </p>
<p>One standard statistical analysis is the <a href="https://en.wikipedia.org/wiki/P-value">p-value</a>. This generates a number between 0 and 1 that measures how incompatible the observed data are with the hypothesis of no effect: smaller values indicate stronger evidence against it.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=396&fit=crop&dpr=1 600w, https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=396&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=396&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=498&fit=crop&dpr=1 754w, https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=498&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/266547/original/file-20190329-70999-ktz14q.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=498&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">A quick guide to p-values.</span>
<span class="attribution"><span class="source">Repapetilto/Wikimedia</span>, <a class="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA</a></span>
</figcaption>
</figure>
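<p>For a concrete sense of what the figure summarizes, here is a minimal sketch of a two-sided z-test in Python. The observed difference and standard error are hypothetical numbers, not taken from any study discussed here.</p>

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z.

    erfc(|z| / sqrt(2)) equals twice the upper-tail probability of a
    standard normal beyond |z|.
    """
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: observed difference 2.0, standard error 0.9
z = 2.0 / 0.9
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.3f}")  # p is below 0.05, so "significant"
```

By convention, a result with p &lt; 0.05 is declared statistically significant; the z-value of 1.96 is exactly where a two-sided p-value crosses that threshold.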
<p>But I worry that abandoning evidence-driven standards for these judgments will make it even more difficult to design experiments, much less assess their outcomes. For instance, how could one even determine an appropriate sample size without a targeted level of precision? And how are research results to be interpreted? </p>
<p>These are important questions, not just for researchers at funding or regulatory agencies, but for anyone whose daily life is influenced by statistical judgments. That includes anyone who takes medicine or undergoes surgery, drives or rides in vehicles, is invested in the stock market, has life insurance or depends on accurate weather forecasts… and the list goes on. Similarly, many regulatory agencies rely on statistics to make decisions every day.</p>
<p>Scientists must have the language to indicate that a study, or group of studies, provided significant evidence in favor of a relationship or an effect. Statistical significance is the term that serves this purpose.</p>
<h2>The groups behind this movement</h2>
<p>Hostility to the term “statistical significance” arises from two groups.</p>
<p>The first is largely made up of scientists disappointed when their studies produce p=0.06. In other words, those whose studies just don’t make the cut. These are largely <a href="https://www.researchgate.net/publication/319880949_Justify_your_alpha">scientists who find the 0.05 standard too high</a> a hurdle for getting published in the scholarly journals that are a major source of academic knowledge – as well as tenure and promotion. </p>
<p>The second group is concerned over the <a href="https://theconversation.com/a-statistical-fix-for-the-replication-crisis-in-science-84896">failure to replicate scientific studies</a>, and they blame significance testing in part for this failure.</p>
<p>For example, <a href="https://doi.org/10.1126/science.aac4716">a group of scientists</a> recently repeated 100 published psychology experiments. Ninety-seven of the 100 original studies reported a statistically significant finding (p<0.05), but only 36 of the repeated experiments also achieved a significant result. </p>
<p>The failure of so many studies to replicate can be partially blamed on publication bias, which results when only significant findings are published. Publication bias causes scientists to overestimate the <a href="https://doi.org/10.4065/75.12.1284">magnitude of an effect</a>, such as the relationship between two variables, making replication less likely.</p>
<p>Complicating the situation even further is the fact that <a href="https://doi.org/10.1073/pnas.1313476110">recent research</a> shows that the p-value cutoff doesn’t provide much evidence that a real relationship has been found. In fact, in replication studies in social sciences, it now appears that p-values close to the standard threshold of 0.05 probably mean that a scientific claim is wrong. It’s only when the p-value is much smaller, maybe less than 0.005, that scientific claims are likely to <a href="https://www.nature.com/articles/s41562-017-0189-z">show a real relationship</a>. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/266166/original/file-20190327-139345-fw20gp.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">What do the data really say?</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/african-american-businessman-using-devices-business-1017688327">fizkes/shutterstock.com</a></span>
</figcaption>
</figure>
<h2>The confusion leading to this movement</h2>
<p>Many nonstatisticians confuse the p-value with the <a href="https://en.wikipedia.org/wiki/Probability">probability</a> that no real effect exists.</p>
<p>Let’s look at an example from the Nature article. <a href="https://doi.org/10.1016/j.ijcard.2014.09.205">Two studies</a> examined the increased risk of disease after taking a drug. Both studies estimated that patients had a 20 percent higher risk of getting the disease if they take the drug than if they didn’t. In other words, both studies estimated the <a href="https://en.wikipedia.org/wiki/Risk_ratio">relative risk</a> to be 1.20. </p>
<p>However, the relative risk estimated from one study was more precise than the other, because its estimate was based on outcomes from many more patients. Thus, the estimate from one study was statistically significant, and the estimate from the other was not.</p>
<p>The authors cite this inconsistency – that one study obtained a significant result and the other didn’t – as evidence that statistical significance leads to misinterpretation of scientific results. </p>
<p>However, I feel that a reasonable summary is simply that one study collected statistically significant evidence and one did not, but the estimates from both studies suggested that relative risk was near 1.2.</p>
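<p>The arithmetic behind this summary is straightforward. The sketch below reproduces the pattern with hypothetical event counts (the actual studies’ data are not given here): both studies estimate a relative risk of 1.20, but only the larger study’s 95% confidence interval excludes 1, so only it is statistically significant.</p>

```python
import math

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """Relative risk and 95% CI, given a events among n1 exposed
    patients and c events among n2 unexposed patients.

    The CI is computed on the log scale using the standard
    large-sample variance of log(RR), then exponentiated.
    """
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts chosen so both studies estimate RR = 1.20:
big = relative_risk_ci(300, 2000, 250, 2000)   # large study
small = relative_risk_ci(30, 200, 25, 200)     # small study
print(f"large study: RR={big[0]:.2f}, 95% CI=({big[1]:.2f}, {big[2]:.2f})")
print(f"small study: RR={small[0]:.2f}, 95% CI=({small[1]:.2f}, {small[2]:.2f})")
```

With these assumed counts, the large study’s interval sits entirely above 1 while the small study’s interval straddles 1: the same estimate, different precision, and hence different significance verdicts.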
<h2>Where to go from here</h2>
<p>I agree with the Nature article and The American Statistician editorial that data collected from all well-designed scientific studies should be made publicly available, with comprehensive summaries of statistical analyses. Along with each study’s p-values, it is important to publish estimates of effect sizes and confidence intervals for these estimates, as well as complete descriptions of all data analyses and data processing. </p>
<p>On the other hand, only studies that provide strong evidence in favor of important associations or new effects should be published in premier journals. For these journals, standards of evidence should be increased by requiring smaller p-values for the initial report of relationships and new discoveries. In other words, make scientists publish results that they’re even more certain about.</p>
<p>The bottom line is that dismantling accepted standards of statistical evidence will decrease the uncertainty that scientists have in publishing their own research. But it will also increase the public’s uncertainty in accepting the findings that they do publish – and that can be problematic.</p><img src="https://counter.theconversation.com/content/114161/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Valen E. Johnson receives funding from the National Institutes of Health to perform biostatistical research on the selection of variables associated with cancer and cancer research. </span></em></p>Two prestigious journals have suggested abandoning the traditional test of the strength of a study’s results. But a statistician worries that this would make science worse.Valen E. Johnson, University Distinguished Professor and Department Head of Statistics, Texas A&M UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1013032018-09-30T15:14:39Z2018-09-30T15:14:39ZOpening up the future of psychedelic science<figure><img src="https://images.theconversation.com/files/237197/original/file-20180919-158228-1p66ki0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">There is a growing research literature suggesting psychedelics hold incredible promise for treating mental health ailments ranging from depression and anxiety to PTSD.</span> <span class="attribution"><span class="source">(Shutterstock)</span></span></figcaption></figure><p>Attempts to replicate classical scientific studies have been failing. These alarming failures have hit psychology, the life sciences and other fields, calling major findings into question. <a href="https://doi.org/10.1038/533452a">Scientists agree</a>: questionable research practices are <a href="https://doi.org/10.1371/journal.pmed.0020124">rife in many disciplines</a>. </p>
<p>We are two psychology PhD students with experience researching mindfulness. We echo the <a href="https://www.scientificamerican.com/article/wheres-the-proof-that-mindfulness-meditation-works1/">scathing criticisms levelled against poorly designed studies within the field of mindfulness research</a>. </p>
<p>As science is only trustworthy when consistent, we need to make sure future work can be replicated. As such, we have decided to spread the word about proper open scientific practice. This is especially important in the nascent interdisciplinary field of psychedelic science, in which we are now conducting research into the practice of “microdosing” substances like LSD (lysergic acid diethylamide) and “magic” mushrooms (psilocybin).</p>
<p>There is a growing research literature suggesting psychedelics hold <a href="https://theconversation.com/the-real-promise-of-lsd-mdma-and-mushrooms-for-medical-science-100579">incredible promise</a> for treating mental health ailments ranging from <a href="https://www.businessinsider.com/psychedelics-trip-therapy-2018-1">depression and anxiety</a> to <a href="https://www.cnn.com/2018/05/01/health/mdma-psychotherapy-ptsd-study/index.html">PTSD</a>. But how do we know for sure?</p>
<p>The way forward for psychedelics is through “open science.” Researchers should pre-register their plans and share their data, <a href="https://osf.io/g5cwy/">as we have in our own research</a>. </p>
<h2>Science must be consistent</h2>
<p>Science needs to have a strong foundation, but right now a lot of the research isn’t replicating. In 2015, the <a href="http://science.sciencemag.org/content/349/6251/aac4716">Reproducibility Project</a> tried to replicate 100 high quality psychology findings. Only <a href="https://www.nature.com/news/over-half-of-psychology-studies-fail-reproducibility-test-1.18248">39 of these findings were replicated</a> — that’s less than half! </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/237196/original/file-20180919-158237-i55cwn.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Mindfulness research lacks active control groups and has inconsistent definitions of mindfulness itself.</span>
<span class="attribution"><span class="source">(Shutterstock)</span></span>
</figcaption>
</figure>
<p>This phenomenon isn’t limited to psychology: findings from disciplines such as biology, medicine and chemistry can be hard to believe. For example, <a href="https://retractionwatch.com/2017/07/31/nearly-500-researchers-guilty-misconduct-says-chinese-govt-investigation/">almost 500 authors</a> were found guilty of misconduct by the Chinese government last year, <a href="https://retractionwatch.com/2018/09/04/cancer-journals-retract-10-papers-flag-8-more-and-apologize-for-the-delay/#more-70872">several cancer research papers</a> have been retracted recently and a recent report indicated that as much as <a href="https://pubs.acs.org/doi/pdf/10.1021/acs.jchemed.7b00907">80 per cent of chemists</a> have trouble replicating findings from the literature.</p>
<p><a href="https://theconversation.com/ca/topics/replicability-8238">Several great pieces</a> on <em>The Conversation</em> have tackled this issue, so there is plenty to review if replicability is new to you. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/why-students-are-the-answer-to-psychologys-replication-crisis-90286">Why students are the answer to psychology's replication crisis</a>
</strong>
</em>
</p>
<hr>
<p>Psychedelic research is an interdisciplinary field combining psychology, biology and medicine and so is an especially important field in which to implement “open science.” </p>
<h2>Open science = rigorous science</h2>
<p>For <a href="https://www.sagepub.com/sites/default/files/upm-binaries/40007_Chapter8.pdf">statistics in science to work properly</a>, scientists need to guarantee that what they have studied is no more and no less than what they intended to study. </p>
<p>Instead of hiding inconvenient results or adding unplanned research conditions, scientists can use open science to demonstrate their integrity. Open science involves pre-registering hypotheses before doing research, and publishing the entire data set once the research is done. </p>
<p><a href="https://osf.io">Pre-registration happens online</a>. The content of the registration is locked and time stamped, then kept confidential until a set date, when it is released for the public to see. This is done so that the researcher can show they did exactly what they planned to do, which is how we all learned we are supposed to do science. Pre-registration is not even difficult, but researchers need to <a href="https://www.youtube.com/watch?v=kzUtpDBo8wk&index=1&list=PLMOU-iLiJIc0amNVabGXJ0liKwIwxqkO8">learn how to do it</a> and adjust.</p>
<p>Once the study has been published, the data set can be made public. This way, the entire scientific community can examine the data, serving at least two purposes. First, the scientific community can verify that the data supports the conclusions made in the study, ensuring no mistakes were made. Second, other scientists can explore for new patterns in the data to create new hypotheses for new studies, moving science forward faster. </p>
<p>Making the data public makes scientists publicly accountable, and is good for the scientific community at large.</p>
<h2>Co-operation over competition</h2>
<p>So far, most psychedelic research has not been pre-registered, which means it should be considered exploratory and, unfortunately, inconclusive. Some findings may have occurred by chance rather than being clearly caused by the substances used, and they need to be replicated by independent labs to ensure they hold up. </p>
<p>A recent call for “<a href="http://chacruna.net/cooperation-over-competition-statement-on-open-science-for-psychedelic-medicines-and-practices/">Cooperation Over Competition</a>” has been made, but its impact remains to be seen. For now, we must take scientists’ findings on psychedelics on faith.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/237226/original/file-20180919-143281-1qtxrlg.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">The way forward is for scientists to share their plans and data.</span>
<span class="attribution"><span class="source">(Shutterstock)</span></span>
</figcaption>
</figure>
<p>Pre-registration is the only way to ensure psychedelic science is conducted with a high level of integrity. Psychedelic science is in its infancy, much as mindfulness research was a few decades ago. We must learn from past mistakes if we do not wish to see the same harsh criticisms levelled at this field in the future.</p>
<p>This will improve and maintain public trust in the scientific endeavour, especially important for these storied substances. As public consumers of science, we should all be critical of new research and remember the <a href="https://www.wired.com/story/sagan-old-interview/">Sagan Standard</a>: “Extraordinary claims require extraordinary evidence.”</p>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>To know the real promise of psychedelic substances like LSD, mushrooms and MDMA, researchers must embrace the principles and practice of ‘open science.’Thomas Anderson, PhD student, University of TorontoRotem Petranker, PhD student in Clinical Psychology, York University, CanadaLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/944212018-04-09T20:05:37Z2018-04-09T20:05:37ZOur survey found ‘questionable research practices’ by ecologists and biologists – here’s what that means<figure><img src="https://images.theconversation.com/files/213332/original/file-20180405-189821-oqdb0h.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Negative results are still useful, and should not be hidden. </span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/closeup-old-dirty-school-blackboard-stains-1060705337?src=H_2XPJ_5Q3o7mSADUVReww-1-49">from www.shutterstock.com </a></span></figcaption></figure><p>Cherry picking or hiding results, excluding data to meet statistical thresholds and presenting unexpected findings as though they were predicted all along – these are just some of the “questionable research practices” implicated in the <a href="https://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998">replication crisis</a> psychology and medicine have faced over the last half a decade or so.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/science-is-in-a-reproducibility-crisis-how-do-we-resolve-it-16998">Science is in a reproducibility crisis – how do we resolve it?</a>
</strong>
</em>
</p>
<hr>
<p>We recently surveyed more than 800 ecologists and evolutionary biologists and found high rates of many of these practices. We believe this to be the first documentation of these behaviours in these fields of science.</p>
<p>Our pre-print <a href="https://osf.io/7qbfv/">results</a> have a certain shock value, and their release attracted a lot of attention on social media.</p>
<ul>
<li><p>64% of surveyed researchers reported they had <em>at least once</em> failed to report results because they were not statistically significant (cherry picking)</p></li>
<li><p>42% had collected more data after inspecting whether results were statistically significant (a form of “<a href="https://theconversation.com/how-we-edit-science-part-2-significance-testing-p-hacking-and-peer-review-74547">p hacking</a>”)</p></li>
<li><p>51% reported an unexpected finding as though it had been hypothesised from the start (known as “HARKing”, or Hypothesising After Results are Known).</p></li>
</ul>
<p>Although these results are very similar to those that have been found in <a href="https://www.psychologicalscience.org/news/releases/questionable-research-practices-surprisingly-common.html">psychology</a>, reactions suggest that they are surprising – at least to some ecology and evolution researchers. </p>
<p>There are many possible interpretations of our results. We expect there will also be many misconceptions about them and unjustified extrapolations. We talk through some of these below. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/how-we-edit-science-part-2-significance-testing-p-hacking-and-peer-review-74547">How we edit science part 2: significance testing, p-hacking and peer review</a>
</strong>
</em>
</p>
<hr>
<h2>It’s fraud!</h2>
<p>It’s not fraud. Scientific fraud involves fabricating data and carries <a href="https://theconversation.com/research-fraud-the-temptation-to-lie-and-the-challenges-of-regulation-58161">heavy criminal penalties</a>. The questionable research practices we focus on are by definition questionable: they sit in a grey area between acceptable practices and scientific misconduct.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/213342/original/file-20180405-189827-g9zda0.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Not crazy. Not kooky. Scientists are just humans.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/crazy-chemistry-professor-injecting-lab-mouse-1017085096?src=52qpgUr9QmdeiZiAUN7-eA-3-51">from www.shutterstock.com</a></span>
</figcaption>
</figure>
<p>We did ask one question about fabricating data, and the answers offered further evidence that fabrication is very rare, <a href="https://theconversation.com/clearing-the-air-why-more-retractions-are-good-for-science-6008">consistent with findings from other fields</a>.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/research-fraud-the-temptation-to-lie-and-the-challenges-of-regulation-58161">Research fraud: the temptation to lie – and the challenges of regulation</a>
</strong>
</em>
</p>
<hr>
<h2>Scientists lack integrity and we shouldn’t trust them</h2>
<p>There are a few reasons why this should not be the take-home message of our paper. </p>
<p>First, reactions to our results so far suggest an engaged, mature scientific community, ready to acknowledge and address these problems. </p>
<p>If anything, this sort of engagement should increase our trust in these scientists and their commitment to research integrity.</p>
<p>Second, the results tell us much more about <a href="https://theconversation.com/publish-or-perish-culture-encourages-scientists-to-cut-corners-47692">structured incentives and institutions</a> than they tell us about individuals and their personal integrity. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/publish-or-perish-culture-encourages-scientists-to-cut-corners-47692">Publish or perish culture encourages scientists to cut corners</a>
</strong>
</em>
</p>
<hr>
<p>For example, these results tell us about the institution of scientific publishing, where negative (not statistically significant) results are all but banished from most journals in most fields of science, and where replication studies are virtually never published because of a relentless focus on novel, “ground-breaking” results. </p>
<p>The survey results tell us about scientific funding, again where “<a href="https://theconversation.com/novelty-in-science-real-necessity-or-distracting-obsession-84032">novel</a>” (meaning positive, significant) findings are valued more than careful, cautious procedures and replication. They also tell us about universities, and about hiring and promotion practices within academic science that focus on publication metrics and overvalue quantity at the expense of quality. </p>
<p>So what do they mean, these questionable research practices admitted by the scientists in our survey? We think they’re best understood as the inevitable outcome of publication bias, funding protocols and an ever-increasing pressure to publish.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/novelty-in-science-real-necessity-or-distracting-obsession-84032">Novelty in science – real necessity or distracting obsession?</a>
</strong>
</em>
</p>
<hr>
<h2>We can’t base important decisions on current scientific evidence</h2>
<p>There’s a risk our results will feed into a view that our science is not policy-ready. In many areas, such as health and the environment, this could be very damaging, even disastrous. </p>
<p>One reason this view is unwarranted is that climate science, for example, is a model-based science, and its models have been independently replicated many times. The same is true of immunisation trials. </p>
<p>We know that any criticism of scientific practice runs a risk in the context of <a href="https://theconversation.com/who-are-you-calling-anti-science-how-science-serves-social-and-political-agendas-74755">anti-science sentiment</a>, but such criticism is fundamental to the success of science. </p>
<p>Remaining open to criticism is science’s most powerful self-correction mechanism, and ultimately what makes the scientific evidence base trustworthy.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/213528/original/file-20180406-125184-lskp91.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Transparency can build trust in science and scientists.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/story-psychologists-office-778256944?src=WvTBg66nmYGI5c9uW4gkaA-7-94">from www.shutterstock.com</a></span>
</figcaption>
</figure>
<h2>Scientists are human and we need safeguards</h2>
<p>This is an interpretation we wholeheartedly endorse. Scientists are human and subject to the same suite of cognitive biases – like <a href="https://theconversation.com/confirmation-bias-a-psychological-phenomenon-that-helps-explain-why-pundits-got-it-wrong-68781">confirmation bias</a> – as the rest of us.</p>
<p>As we learn more about cognitive biases and how best to mitigate them in different circumstances, we need to feed this back into the norms of scientific practice. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/confirmation-bias-a-psychological-phenomenon-that-helps-explain-why-pundits-got-it-wrong-68781">Confirmation bias: A psychological phenomenon that helps explain why pundits got it wrong</a>
</strong>
</em>
</p>
<hr>
<p>The same is true of our knowledge about how people function under different incentive structures and conditions. This is the basis of many of the initiatives designed to make science more open and transparent.</p>
<p>The <a href="https://cos.io/our-products/osf/">open science movement</a> is about developing <a href="https://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198">initiatives</a> to protect against the influence of cognitive bias, and alter the incentive structures so that research using these questionable research practices stops being rewarded. </p>
<p>Some of these initiatives have been enthusiastically adopted by many scientists and journal editors. For example, many journals now publish analysis code and data along with their articles, and many have signed up to <a href="https://osf.io/9f6gx/">Transparency and Openness Promotion (TOP) guidelines</a>. </p>
<p>Other initiatives offer great promise too. For example, <a href="https://www.elsevier.com/reviewers-update/story/innovation-in-publishing/registered-reports-a-step-change-in-scientific-publishing">registered report</a> formats are now offered by some journals, mostly in psychology and medical fields. In a registered report, articles are reviewed on the strength of their underlying premise and approach, before data is collected. This removes the temptation to select only positive results or to apply different standards of rigour to negative results. In short, it thwarts publication bias.</p>
<p>We hope that by drawing attention to the prevalence of questionable research practices, our research will encourage support of these initiatives, and importantly, encourage institutions to support researchers in their own efforts to align their practice with their scientific values.</p>
<p class="fine-print"><em><span>Fiona Fidler receives funding from the ARC and IARPA. She is an ambassador for the Centre for Open Science.</span></em></p><p class="fine-print"><em><span>Hannah Fraser has received funding from the Australian Research Council and National Environmental Research Program. She is an open science ambassador associated with the Centre for Open Science. </span></em></p>Questionable research practices are not fraud, and they’re not cause for panic. But they do give us some hints about how we can make science more robust.Fiona Fidler, Associate Professor, School of Historical and Philosophical Studies, The University of MelbourneHannah Fraser, Postdoctoral Researcher , The University of MelbourneLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/932822018-03-15T05:12:31Z2018-03-15T05:12:31ZSpeaking with: Andrew Leigh on why we need more randomised trials in policy and law<figure><img src="https://images.theconversation.com/files/210870/original/file-20180317-104635-kisec6.jpeg?ixlib=rb-1.1.0&rect=0%2C0%2C1175%2C1177&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><span class="source">AndrewLeigh.com</span>, <span class="license">Author provided</span></span></figcaption></figure><p><a href="https://theconversation.com/randomised-control-trials-what-makes-them-the-gold-standard-in-medical-research-78913">Randomised controlled trials</a> are the gold standard in medical research. Researchers divide participants into two groups using the equivalent of flipping a coin, with one group getting a new treatment and a control group getting either the standard treatment or a placebo. It’s the best way to prove that a new treatment works.</p>
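<p>The “flipping a coin” assignment described above is simple to sketch in code. Here is a minimal illustration in Python (the participant labels, group names and seed are made up for the example, not taken from any study):</p>

```python
import random

def randomise(participants, seed=None):
    """Assign each participant to 'treatment' or 'control':
    the software equivalent of flipping a fair coin per person."""
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    return {p: rng.choice(["treatment", "control"]) for p in participants}

groups = randomise([f"participant_{i}" for i in range(10)], seed=42)
```

<p>Real trials typically add refinements such as block randomisation to keep the groups the same size, but the underlying idea is exactly this per-person coin flip.</p>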
<p>But the benefits of randomised trials aren’t limited to medical applications. Big businesses – like Amazon, Google, <a href="https://theconversation.com/facebook-will-continue-experimenting-on-users-under-closed-guidelines-32510">Facebook</a> and even media organisations – are increasingly <a href="https://theblog.okcupid.com/we-experiment-on-human-beings-5dd9fe280cd5">using randomised trials</a> to test designs and processes that increase their engagement with users and customers. Every time you Google something you’re probably participating in a randomised trial.</p>
<p>And that world of randomisation is the subject of Andrew Leigh’s new book, <a href="https://www.blackincbooks.com.au/books/randomistas">Randomistas: How radical researchers changed our world</a>. Leigh is the current federal member for Fenner, and Labor’s shadow assistant treasurer. But prior to his political life he was a professor of economics at Australian National University.</p>
<p>He spoke with the University of Melbourne’s Fiona Fidler about how we should be using randomised trials more to drive decisions and policy in public life and why we might be missing out on better results in social policy because we’re afraid to test our assertions.</p>
<hr>
<p><em>Andrew Leigh’s <a href="https://www.blackincbooks.com.au/books/randomistas">Randomistas: How radical researchers changed our world</a> is out now from Black Inc books. His podcast on living a healthy, happy and ethical life, The Good Life, is available on <a href="https://itunes.apple.com/au/podcast/the-good-life-andrew-leigh-in-conversation/id1147502226?mt=2">Apple Podcasts</a> or wherever you stream your podcasts.</em></p>
<p><em><a href="https://itunes.apple.com/au/podcast/speaking-with.../id934267338">Subscribe</a> to The Conversation’s Speaking With podcasts on Apple Podcasts, or <a href="http://tunein.com/radio/Speaking-with---The-Conversation-Podcast-p671452/">follow</a> on Tunein Radio.</em></p>
<p><strong>Music</strong></p>
<ul>
<li><a href="http://freemusicarchive.org/music/Blue_Dot_Sessions/The_Contessa/Wisteria">Free Music Archive: Blue Dot Sessions - Wisteria</a></li>
</ul>
<p class="fine-print"><em><span>Fiona Fidler receives funding from the Australian Research Council and IARPA.</span></em></p>Economist, author and MP Andrew Leigh spoke to Fiona Fidler about how we should be using randomised trials more to drive decisions and policy in public life.Fiona Fidler, Associate Professor, School of Historical and Philosophical Studies, The University of MelbourneLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/902862018-02-21T23:58:58Z2018-02-21T23:58:58ZWhy students are the answer to psychology’s replication crisis<figure><img src="https://images.theconversation.com/files/207382/original/file-20180221-132680-1hql3pa.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Shutterstock</span> </figcaption></figure><p>“Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital,” Aaron Levenstein, a <a href="http://www.nytimes.com/1986/07/05/obituaries/prof-aaron-levenstein.html">business professor at Baruch College</a>, once said. </p>
<p>I first heard a version of this quote in an undergraduate social psychology class in 2003. Nearly a decade and a half later, psychology is having a replication crisis — and the “bikini” is largely to blame. </p>
<p>Recently, more than 270 psychologists set out to <a href="http://science.sciencemag.org/content/349/6251/aac4716">repeat 100 experiments</a> to see if they could generate the same results. They successfully replicated only 39 of the 100 studies. </p>
<p>Over several years, <a href="http://www.slate.com/articles/health_and_science/cover_story/2016/03/ego_depletion_an_influential_theory_in_psychology_may_have_just_been_debunked.html">failed attempts to replicate published studies</a> have caused generally accepted bodies of research to be called into question — or rejected outright. </p>
<p>One example is the idea that <a href="https://hbr.org/2016/11/have-we-been-thinking-about-willpower-the-wrong-way-for-30-years">your willpower is a limited resource that, like a muscle, becomes exhausted</a> when it is used. Another is that power posing — <a href="http://journals.sagepub.com/doi/full/10.1177/0956797614553946">standing like a superhero for two minutes</a> — makes you feel bolder, reduces stress hormones and increases testosterone. Both have fallen aside due to failed replications. </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=300&fit=crop&dpr=1 600w, https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=300&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=300&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=377&fit=crop&dpr=1 754w, https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=377&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/206867/original/file-20180219-75997-bvxnza.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=377&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Psychology was wrong about the power pose.</span>
<span class="attribution"><span class="source">(Shutterstock)</span></span>
</figcaption>
</figure>
<p>These aren’t dusty, arcane findings limited to academic journals; a TED talk by social psychologist Amy Cuddy on the effectiveness of power posing has been viewed over 45 million times and is <a href="https://www.ted.com/playlists/171/the_most_popular_talks_of_all">near the top of the list of the most popular TED talks of all time</a>. </p>
<h2>Bad habits</h2>
<p>The “bikini” at the centre of the crisis refers to the way researchers collect and analyze data and report their results. Many important details and decisions are often concealed.</p>
<p>When carrying out experiments, <a href="http://journals.sagepub.com/doi/abs/10.1177/0956797611417632">researchers make decisions</a> about how much data to collect, whether some observations should be excluded from the analysis and what controls, if any, should be included in analyses. </p>
<p>After the data has been collected, researchers have additional, undisclosed, leeway. </p>
<p>They may “torture the data” <a href="https://theconversation.com/one-reason-so-many-scientific-studies-may-be-wrong-66384">until it reaches statistical significance</a> (a cut-off that suggests the real effect may not be zero), a practice called “p-hacking.” </p>
<p>Or they may <a href="https://www.nature.com/articles/s41562-016-0021">engage in the practice of “HARKing,”</a> short for “hypothesizing after results are known.” Creating a hypothesis to confirm a result that has already been found makes it easier to satisfy journal reviewers and editors who are interested in publishing statistically significant results.</p>
<p>In academia, where researchers are often under pressure to “publish or perish” to advance their careers and win grants, amassing publications is the route to success. </p>
<p>All told, this undisclosed flexibility can lead to extremely high rates of <a href="http://journals.sagepub.com/doi/full/10.1177/0956797616658563">false positive results</a>. A false positive is essentially claiming there is an effect when there isn’t one. An example would be concluding that standing up straight increases testosterone levels, when it doesn’t. </p>
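<p>How much can this flexibility inflate false positives? A small simulation makes the point. The sketch below is only an illustration (the sample sizes, number of “peeks” and choice of significance test are arbitrary, not any study’s actual analysis): every simulated experiment is run on pure noise, so any “significant” result is by construction a false positive.</p>

```python
import math
import random

def z_pvalue(sample):
    """Two-sided p-value for 'the mean is zero', assuming unit-variance data."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def experiment(rng, peek=False):
    """One null experiment: the data contain no real effect.
    With peek=True, whenever the result is not yet significant, collect
    10 more observations and test again (up to 5 times) -- the 'collect
    more data after inspecting significance' practice."""
    data = [rng.gauss(0, 1) for _ in range(20)]
    for _ in range(5 if peek else 0):
        if z_pvalue(data) < 0.05:
            break
        data += [rng.gauss(0, 1) for _ in range(10)]
    return z_pvalue(data) < 0.05

rng = random.Random(1)
honest = sum(experiment(rng) for _ in range(2000)) / 2000
hacked = sum(experiment(rng, peek=True) for _ in range(2000)) / 2000
# honest stays near the nominal 5%; hacked is noticeably higher,
# even though no experiment had any real effect to find
```

<p>Testing once keeps the false positive rate at the advertised 5%; repeatedly peeking and topping up the sample pushes it well above that. This is why the undisclosed flexibility matters even when each individual decision seems innocent.</p>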
<h2>A new research culture</h2>
<p>Despite all the upheaval, psychology’s replication crisis may have a silver lining. In a few short years, researchers have proposed many ideas and recommendations for <a href="http://science.sciencemag.org/content/sci/348/6242/1422.full.pdf">reforming research with the goal of improvement</a>. </p>
<p>Journals and granting agencies are demanding more from authors with respect to <a href="http://www.science.gc.ca/eic/site/063.nsf/eng/h_415B5097.html">openness</a> and <a href="https://www.psychologicalscience.org/publications/badges">transparency</a>. There are accessible online repositories, such as <a href="http://github.com">Github</a>, the <a href="http://osf.io">Open Science Framework</a> and <a href="http://opendoar.org">OpenDOAR</a>, that allow researchers to share their raw materials, exact protocols, scripts, data, code, etc. with anyone who has an internet connection. The aim is to essentially have nothing concealed in the scientific process.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=338&fit=crop&dpr=1 600w, https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=338&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=338&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=424&fit=crop&dpr=1 754w, https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=424&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/206869/original/file-20180219-76003-1oake2r.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=424&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Researchers who manipulate their data or engage in poor research practices will wind up with results that can’t be replicated.</span>
<span class="attribution"><span class="source">(Shutterstock)</span></span>
</figcaption>
</figure>
<p>Some journals, such as <a href="http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002456"><em>Psychological Science</em></a>, and recently <a href="http://www.apa.org/news/press/releases/2017/08/open-science.aspx">American Psychological Association journals</a> are encouraging authors to store their data and code in these repositories and to disclose details about data collection decisions before submitting a manuscript for peer review. Researchers can also preregister their hypotheses. But something has been missing. </p>
<h2>The missing link</h2>
<p>While psychological science has been moving toward more open and transparent methods, graduate student training has been largely left out of discussions. </p>
<p>Many of the practices that created the crisis are embedded in our research culture: We do things a certain way because we have always done things this way and other people do too. Much of this culture is assimilated when researchers are in graduate school. </p>
<p>To sustain and maintain the momentum of positive change, it is important for graduate education to keep up with changes in the field. If training fails to keep up, graduate students may leave programs with antiquated ideas and practices. <a href="http://rsos.royalsocietypublishing.org/content/3/9/160384">These ideas and practices can proliferate</a> as students become faculty members, start their own labs and train graduate students in the same manner they were taught.</p>
<p>Part of educating students is ensuring they are aware of the changing cultural landscape, and then explicitly teaching them to follow open and transparent research practices and avoid bad habits. </p>
<h2>Finding the light</h2>
<p>In our department at the University of Guelph, a group of methodologically minded faculty have recognized the importance of tackling this problem head on. Our goal is to create positive change and take steps to avoid history repeating itself with the next generation of researchers. </p>
<p>We created “<a href="https://www.uoguelph.ca/psychology/graduate/thesis-statistics">Statistical methods in theses: Guidelines and explanations</a>” to help students when conducting their thesis research. Students can work through the guidelines with their advisors, allowing them to make better decisions in the planning stages of their research projects.</p>
<p>The document’s rather humble-sounding purpose belies an unintended provocative side. The guidelines identify questionable research practices in order to provide explanations and advice for students who wish to follow open and transparent research practices instead. Because some of the questionable practices it identifies may be standard, previously unquestioned, and sometimes even taught procedures, the document has the potential to be viewed, by some, as extreme. </p>
<p>Culture is not something that can be changed overnight. But with explicit efforts to cultivate a new research culture, change can be targeted and purposeful. </p>
<p>This crisis in psychology makes me think about a line in John Milton’s epic poem, <em>Paradise Lost</em>: “Long is the way and hard, that out of Hell leads up to Light.” </p>
<p>By acting on the crisis, psychology has embarked upon its symbolic journey back to “light.” It will be current and future graduate students who decide how, and where, the journey ends.</p>
<p class="fine-print"><em><span>Jeffrey R. Spence works at the University of Guelph. </span></em></p><p class="fine-print"><em><span>David Stanley is affiliated with University of Guelph. </span></em></p><p class="fine-print"><em><span>Ian Newby-Clark does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Bad research techniques have called into question the results of many psychology studies. Fixing the problem starts with making sure students don’t pick up bad habits.Jeffrey R. Spence, Associate Professor, University of GuelphDavid Stanley, Associate Professor, University of GuelphIan Newby-Clark, Professor, University of GuelphLicensed as Creative Commons – attribution, no derivatives.