
Coronavirus origins: the debate flares up, but the evidence remains weak

Nearly three years since SARS-CoV-2 first emerged, we’re still not certain where the virus behind COVID-19 came from.

The location of the initial outbreak, close to the Wuhan Institute of Virology, drew suspicion that it may have been a lab leak. But scientists largely came out in favour of a natural spillover from bats to humans, through an intermediate animal host, at the Huanan seafood market located a few kilometres away. To date, though, no immediate ancestor of SARS-CoV-2 has been found in bats or in any other animal that was on sale at the market.

A recent preprint (a study yet to be peer-reviewed) claims to have identified possibly unusual sequence patterns in the SARS-CoV-2 genome. These patterns may indicate the virus was genetically modified in a lab.

It should be emphasised that any realistic lab origin scenario would point to an accidental escape, and not to any nefarious intent. Viruses have no application as bioweapons in the modern world. They’re difficult to produce in large quantities and to deploy. They take days to be effective, and if capable of human-to-human transmission, they’re likely to spread to unintended populations, including friendly forces.

The preprint has been poorly received by most experts in the field, with many reacting to it on social media.

This reception is largely unsurprising. Scientists and members of the wider public often hold strong opinions about the origin of SARS-CoV-2, despite all the available evidence remaining weak and circumstantial. In the absence of strong facts, opinions are bound to be largely based on emotions and group affiliation, particularly when the stakes are considered to be so high.

More about the science

The genomes of all organisms, including SARS-CoV-2, are formed of long stretches of four different nucleotides (A, T, G and C in DNA; in RNA, which makes up the SARS-CoV-2 genome, U takes the place of T). These are the building blocks of RNA and DNA.

Large viral genomes, such as those of coronaviruses, can be cut into smaller pieces, or fragments, that can be mixed and matched to study the effect of different genes and mutations. Scientists might do this, for example, to understand which genes or mutations could increase the risk of a virus spilling over to humans.

The standard way to cut viral genomes into smaller pieces is with restriction enzymes, sometimes called molecular scissors. Restriction enzymes recognise and cut specific sequences of nucleotides (for example, GAATTC). Out of around 3,000 different restriction enzymes, only a fairly small number are commonly used to manipulate viral genomes. Among these are type IIS enzymes.
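As a rough illustration of how recognition sequences are located, the short Python sketch below scans a sequence for a motif on both strands. The GAATTC example comes from the text (it is the EcoRI site); GGTCTC and CGTCTC are the standard recognition sequences of BsaI and BsmBI. The function names and the toy sequence are invented for this example and do not come from the preprint.

```python
# Recognition sequences. The toy genome further down is made up.
SITES = {"EcoRI": "GAATTC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome: str, motif: str) -> list[int]:
    """Return 0-based start positions where the motif occurs on either strand."""
    genome = genome.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] in targets]


toy_genome = "ATGGAATTCAAGGTCTCTTTGAGACGCC"  # hypothetical sequence
for enzyme, motif in SITES.items():
    print(enzyme, find_sites(toy_genome, motif))
```

Both strands are searched because type IIS recognition sequences are not palindromes, so a site on the opposite strand shows up as the reverse complement (GAGACG for BsmBI, for instance).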




The preprint claims that in the SARS-CoV-2 genome, the distribution of some restriction sites (the spots where the genome may have been cut and joined) is “anomalous” and compatible with the virus having been stitched together from multiple smaller fragments using type IIS enzymes called BsaI and BsmBI.

Notably, the restriction sites displayed an excess of silent mutations. These are nucleotide changes that leave the encoded protein unchanged, and so don’t affect the characteristics of the virus. They can be hallmarks of genetic engineering.
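To make the idea of a silent mutation concrete, here is a minimal Python sketch. It uses a small excerpt of the standard genetic code and a made-up six-letter stretch; the helper names are hypothetical. It shows how a single nucleotide change can remove a BsaI site (GGTCTC) without changing the amino acids encoded.

```python
# Excerpt of the standard genetic code, limited to the codons used below.
CODON_TABLE = {"GGT": "Gly", "GGC": "Gly", "GAT": "Asp", "CTC": "Leu"}


def translate(seq: str) -> list[str]:
    """Translate a sequence codon by codon (length must be a multiple of 3)."""
    return [CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3)]


def is_silent(before: str, after: str) -> bool:
    """A change is silent if the encoded amino acids are identical."""
    return translate(before) == translate(after)


original = "GGTCTC"  # reads Gly-Leu and contains a BsaI site
silent = "GGCCTC"    # T changed to C: still Gly-Leu, BsaI site gone
missense = "GATCTC"  # G changed to A: now Asp-Leu, so not silent

print(is_silent(original, silent), "GGTCTC" in silent)
print(is_silent(original, missense))
```

The same logic runs in reverse: a silent change can also create a restriction site, which is why such changes are a convenient tool for engineers and, equally, something that arises naturally.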

A twist

When cutting and stitching together genomes using type IIS enzymes, scientists can seamlessly erase any footprints of restriction sites through a method called “golden gate assembly”.

So for the distribution of type IIS restriction sites in SARS-CoV-2 to be interpreted as a signature of engineering, these sites would need to have been intentionally left in. Although not completely implausible, this isn’t standard practice, and scientists have questioned what the rationale would be for leaving these sites behind.

Questions have also been raised around some of the mathematical metrics on which the authors’ conclusions are based, in particular the presumed maximum length of the individual viral fragments. Meanwhile, the analysis has been criticised because it considered only the two type IIS restriction enzymes commonly used in this context.
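A fragment-length metric of the kind mentioned above can be sketched in a few lines of Python. This is an illustration only, not the preprint's actual calculation: the genome length is rounded to 30,000 nucleotides, and both sets of cut positions are invented.

```python
def fragment_lengths(genome_length: int, cut_positions: list[int]) -> list[int]:
    """Lengths of the pieces obtained by cutting a linear genome at each position."""
    bounds = [0, *sorted(set(cut_positions)), genome_length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


GENOME_LENGTH = 30_000  # rounded; SARS-CoV-2 is close to this size

# Hypothetical cut positions: evenly spread versus clustered at one end.
even_cuts = [5_000, 12_000, 19_000, 26_000]
clustered_cuts = [1_000, 2_000, 3_000, 4_000]

for label, cuts in (("even", even_cuts), ("clustered", clustered_cuts)):
    lengths = fragment_lengths(GENOME_LENGTH, cuts)
    print(label, lengths, "longest:", max(lengths))
```

Evenly spaced sites give a short longest fragment, while clustered sites leave one very long piece. How short the longest fragment must be to count as convenient for assembly is exactly the kind of assumption critics have questioned.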

All of these extremely technical points of contention illustrate the difficulty of formulating satisfying, testable hypotheses for complex questions.

A woman wearing a mask walks past the closed Huanan Seafood Wholesale Market in 2020.
Scientists largely believe SARS-CoV-2 crossed from animals to humans at the Huanan seafood market – but this isn’t settled. EPA-EFE

What are the chances?

The study also explored how easily the distribution pattern of restriction sites observed in SARS-CoV-2 could be generated by chance (as opposed to engineering). The researchers simulated a process of random mutations starting from two close relatives of SARS-CoV-2. The probability of generating the same pattern was low: 0.1% for one relative and 1.2% for the other.
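The general shape of such a simulation can be sketched in Python. This is a simplified toy, not the authors' method: the "relative" is a random sequence, mutations are uniform point substitutions, and the pattern being tested (a longest fragment no greater than a chosen threshold) is an assumption made for illustration. All function names and parameters are hypothetical.

```python
import random

# BsaI and BsmBI recognition sequences plus their reverse complements.
MOTIFS = ("GGTCTC", "GAGACC", "CGTCTC", "GAGACG")


def longest_fragment(seq: str) -> int:
    """Length of the longest stretch between consecutive sites or sequence ends."""
    sites = [i for i in range(len(seq) - 5) if seq[i:i + 6] in MOTIFS]
    bounds = [0, *sites, len(seq)]
    return max(end - start for start, end in zip(bounds, bounds[1:]))


def mutate(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Apply n random point substitutions (a position may be hit more than once)."""
    bases = list(seq)
    for _ in range(n_mutations):
        pos = rng.randrange(len(bases))
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def chance_of_pattern(ancestor, n_mutations, max_longest, trials, seed=0):
    """Fraction of simulated descendants whose longest fragment is <= max_longest."""
    rng = random.Random(seed)
    hits = sum(
        longest_fragment(mutate(ancestor, n_mutations, rng)) <= max_longest
        for _ in range(trials)
    )
    return hits / trials


# Toy run on a random 3,000-letter "relative" (made-up data).
setup_rng = random.Random(1)
ancestor = "".join(setup_rng.choice("ACGT") for _ in range(3_000))
print(chance_of_pattern(ancestor, n_mutations=100, max_longest=1_000, trials=200))
```

The estimate depends heavily on the choices baked in: the starting sequence, the number of mutations, and what counts as "the same pattern". A sketch like this also ignores recombination entirely.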

Again, this analysis has been criticised. Coronaviruses can naturally gain and lose restriction motifs by accumulating mutations, but also through different viral strains exchanging genetic material, a process called genetic recombination.

As coronaviruses undergo frequent genetic recombination, a simulation process using a mix of recombination and mutation events may arguably be better suited to address this question.

This criticism is fair, but partly overlooks the fact that unusual patterns can be informative even if the process that generated them remains unknown. A single black sheep in a flock of 1,000 stands out irrespective of whether its coat colour was caused by an unusual genetic makeup or because it fell in a barrel of tar.




The evidence reported in the preprint is neither conclusive nor final. These findings may turn out to be a fluke, or generated by a flaw in the method. The authors have been largely open about some limitations of their work and have invited comments and criticism.

Even if the findings can be replicated by others, and stand up once additional data has been analysed, this study is unlikely to sway many opinions. At best – or at worst, depending on one’s prior belief – those results will just contribute a speck of additional weak, circumstantial evidence to the debate.

The reception of the work raises difficult questions. Some experts feel it’s unwise to discuss any evidence supporting a lab leak, as this may fuel conspiracy theories. However, a public perception that existing evidence is being censored is even more likely to have this effect. Notably, China has been largely uncooperative in investigations into the origin of the virus.

The nightmare scenario to me would not be the eventual confirmation of an accidental lab leak, but confirmation of a lab leak whose evidence has been aggressively suppressed.
