A new statistical test lets scientists figure out if two groups are similar to one another. (Image credit: paleontologist natural/shutterstock.com)

The equivalence test: A new way for scientists to tackle so-called negative results

A paleontologist returns to her lab from a summer dig and sets up a study comparing tooth length in two dinosaur species. She and her team work meticulously to avoid biasing their results. They remain blind to the species while measuring, the sample sizes are large, and the data collection and the analysis are rigorous.

The scientist is surprised to find no significant difference in canine tooth length between the two species. She realizes that these unexpected results are important and sends a paper off to the appropriate journals. But journal after journal rejects the paper because it reports no significant difference. Eventually, the scientist gives up, and the paper with its so-called negative results is placed in a drawer and buried under years of other work.

This scenario and many others like it have played out across all scientific disciplines, leading to what has been dubbed “the file drawer problem.” Research journals and funding agencies are often biased toward research that shows “positive” or significantly different results. This unfortunate bias contributes to many other issues in the scientific process, such as confirmation bias, in which data are interpreted incorrectly to support a desired outcome.

A new method: Equivalence

Unfortunately, publication bias issues have been prevalent in science for a long time. Due to the structure of the scientific method, scientists often focus only on differences between groups – like the dinosaur teeth from two different species, or a public health comparison of two different neighborhoods. This leaves studies that focus on similarities completely hidden.

However, researchers running pharmaceutical trials have found a solution to this problem. In these trials, they sometimes use a procedure known as TOST, short for two one-sided tests, to look for equivalence between treatments.

For example, say a company develops a generic drug that is cheaper to produce than the name-brand drug. Researchers need to demonstrate that the new drug functions in a statistically equivalent manner to the name brand before selling it on the market. That’s where equivalence testing comes in. If the test shows equivalence between the effects of the two drugs, then the FDA can approve the new drug’s release on the market.

While traditional equivalence testing is very helpful for preplanned and controlled pharmaceutical tests, it isn’t versatile enough for other types of studies. The original TOST cannot be used to test equivalence in experiments where the same individuals are in multiple treatment groups, nor does it work if the two test groups have different sample sizes.

Additionally, the TOST used in pharmaceutical testing does not typically address multiple variables simultaneously. For example, a traditional TOST would be able to analyze similarities in biodiversity at several river locations before and after a temperature change. However, our new TOST would allow researchers to test for similarities in multiple variables – such as biodiversity, water pH, water depth and water clarity – at all of the river sites simultaneously.

The limitations of the traditional TOST and the pervasiveness of the “file drawer problem” led our team to develop a multivariate equivalence test, capable of addressing similarities in systems with repeated measures and unequal sample sizes.

Our new equivalence test, published in October, flips the traditional null hypothesis framework on its head. Now, rather than assuming similarity, a researcher starts with the assumption that the two groups are different. The burden of proof now lies with evaluating the degree of similarity, rather than the degree of difference.
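For readers who want the flipped framework in symbols, here is one common way to write it for a single variable. The notation is an assumption made for illustration and does not come from our paper: \mu_1 and \mu_2 stand for the two group means, and \delta for the chosen equivalence margin.

```latex
% Traditional difference test: the null hypothesis assumes similarity.
H_0 : \mu_1 - \mu_2 = 0
\qquad \text{versus} \qquad
H_1 : \mu_1 - \mu_2 \neq 0

% Equivalence test: the null hypothesis assumes a difference of at least \delta.
H_0 : |\mu_1 - \mu_2| \geq \delta
\qquad \text{versus} \qquad
H_1 : |\mu_1 - \mu_2| < \delta

% TOST splits the equivalence null into two one-sided nulls,
% both of which must be rejected to conclude equivalence.
H_{01} : \mu_1 - \mu_2 \leq -\delta
\qquad \text{and} \qquad
H_{02} : \mu_1 - \mu_2 \geq \delta
```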

Our test also allows researchers to set their own acceptable margin for declaring similarity. For example, if the margin were set to 0.02, then the results would tell you whether the means of the two groups were similar to within plus or minus 2 percent.
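To make the role of the margin concrete, here is a minimal sketch of a basic, single-variable TOST in Python. It is not our multivariate test. It assumes two independent groups, a large-sample normal approximation, and a margin read as a fraction of the first group's mean. The function name `tost_equivalence` and the tooth-length numbers are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(group_a, group_b, margin, alpha=0.05):
    """Univariate two one-sided tests (TOST) for two independent groups.

    Assumptions (not from the article): a large-sample normal
    approximation, and `margin` read as a fraction of group_a's mean.
    Returns (p_value, is_equivalent).
    """
    diff = mean(group_a) - mean(group_b)
    delta = margin * abs(mean(group_a))  # e.g. 0.02 -> plus or minus 2 percent
    # Standard error of the difference in means (unpooled variances).
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    std_normal = NormalDist()
    # Test 1: null says diff <= -delta; a small p means diff sits above -delta.
    p_lower = 1.0 - std_normal.cdf((diff + delta) / se)
    # Test 2: null says diff >= +delta; a small p means diff sits below +delta.
    p_upper = std_normal.cdf((diff - delta) / se)
    # Equivalence is declared only if BOTH one-sided nulls are rejected.
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Made-up tooth lengths (arbitrary units), for illustration only.
species_a = [10.0 + 0.1 * (i % 5 - 2) for i in range(200)]
species_b = [9.99 + 0.1 * (i % 5 - 2) for i in range(200)]
p, equivalent = tost_equivalence(species_a, species_b, margin=0.02)
print(f"p = {p:.4f}, equivalent within 2 percent: {equivalent}")
```

Note that the starting assumption here is a difference of at least the margin, so a small p-value counts as evidence of similarity. Shrinking the margin makes equivalence harder to declare.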

A step in the right direction

Our modification means that equivalence testing can now be applied across a wide range of disciplines. For example, we used this test to demonstrate equivalent acoustic structure in the songs of male and female eastern bluebirds. Equivalence testing has also already been used in some areas of engineering and psychology.

The method could be applied even more broadly. Imagine a group of researchers who want to examine two different teaching methods. In one classroom there is no technology, and in another all of the students’ assignments are done online. Equivalence testing might help a school district decide whether it should invest more in technology or whether the two methods of teaching are equivalent.

The development of a broadly applicable equivalence test represents what we think will be a huge step forward in scientists’ long struggle to present real and unbiased results. This test provides another avenue for exploration and allows researchers to examine and publish the results from studies on similarities that have not been published or funded in the past.

The prevalence of publication bias, including the file drawer problem, confirmation bias and accidental false positives, is a major stumbling block for scientific progress. In some fields of research, up to half of results are missing from the published literature.

Equivalence testing provides another tool in the toolbox for scientists to present “positive” results. If the scientific community takes hold of this test and utilizes it to its full potential, we think it may help mitigate one of the major limitations in the way science is currently practiced.

This article has been updated to correct the margin of declaring similarity.
