Genome sequences from the recent German E. coli outbreak have revealed crucial insights into the origins of this deadly strain. The bacterium was found in German bean sprouts, but it didn’t originate in the gut of an animal, as first suspected – it came from humans.
Humans are engaged in a never-ending conflict with bacteria, viruses and other microbes that make us ill. New genes, such as those encoding toxins, can make a bacterium more virulent and cause more serious illness, as we saw in the German outbreak.
Our weapons against these microbes include drugs, vaccines, sanitation and hygiene. But the ability to share genetic information is the number one weapon in the bacterial arsenal.
Understanding bacteria through their DNA
We often hear about human genetics and the human genome project, but bacterial genomics – the study of bacteria through their complete DNA sequence – has been around for a lot longer.
Bacterial genome sequencing gives us a complete picture of all the genes a bacterium has at its disposal, any of which can have a significant impact on human health.
It helps us to track a bacterium’s movements and to identify its weaknesses, allowing us to design new drugs and vaccines.
Much as DNA fingerprinting of humans links individuals to a crime scene, DNA fingerprinting of bacteria can link potential sources of bacteria to specific infections and outbreaks.
By identifying the genes encoded in a particular bacterium’s DNA, we can predict how dangerous it is to human health, how it might spread and what drugs it is likely to be resistant to.
Since bacteria share DNA so readily, we can also examine where the various bits of DNA have come from, which can give us clues about where the bacterium originated.
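To make the fingerprinting idea concrete, here is a minimal Python sketch. It assumes a toy approach that is not taken from the outbreak investigation: each sequence is broken into short overlapping fragments (k-mers), and a patient isolate is matched to whichever candidate source shares the largest proportion of fragments. The sequences, sample names and function names are all made up for illustration; real analyses compare genomes millions of letters long using dedicated tools.

```python
def kmers(seq, k=4):
    """Return the set of all length-k fragments (k-mers) in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Similarity of two k-mer sets: shared fragments / all distinct fragments."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def closest_source(isolate, sources, k=4):
    """Name of the candidate source whose k-mer set best matches the isolate."""
    isolate_kmers = kmers(isolate, k)
    return max(sources, key=lambda name: jaccard(isolate_kmers, kmers(sources[name], k)))


# Made-up toy sequences (assumption: illustration only, not real outbreak data).
patient_isolate = "ATGGCGTACGTTAGCCGATAGGCTA"
candidate_sources = {
    "sprout_sample": "ATGGCGTACGTTAGCCGATAGGCTT",  # differs by one letter
    "other_sample": "TTGACCGGTAAACCGTTTGGGCCAA",   # unrelated sequence
}

best_match = closest_source(patient_isolate, candidate_sources)
print(best_match)
```

The same comparison applied to whole genomes is what lets investigators say that a patient’s infection and a suspected source carry, for practical purposes, the same bacterium.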
Information-sharing in the German E. coli investigation
The ongoing German E. coli outbreak is the first time the technology has been available to study the cause of a foodborne outbreak at the whole genome level.
Rather than having multiple scientists working in silos to come up with their own analyses, experts around the globe shared their data online.
On June 2, BGI (formerly Beijing Genome Institute) in China released the first genome sequence data from a strain associated with the German outbreak.
Within days, genome data from two additional strains was released, and as of June 23 there are nine genomes available.
All of this data was released to the public and deposited in publicly accessible databases, prompting a flurry of analysis by bacterial genomicists all over the world.
Within hours, scientists began posting analyses of the outbreak genomes, providing novel insights into the outbreak strain.
These analysts include many who, like myself, don’t work for any agency with responsibility for outbreak control. This public sharing of results and analyses is called “crowd-sourcing”.
In the German E. coli outbreak, crowd-sourcing allowed scientists to identify the source of the toxins and the antibiotic resistance genes that are problematic for human health.
Lessons from the outbreak
The key puzzle in the German E. coli outbreak strain was its unusual combination of genes.
It carried genes encoding Shiga toxin, which makes it much nastier than a regular diarrhoeal infection. But it didn’t carry other genes typically associated with the symptoms it was causing (genes from enterohaemorrhagic E. coli, or EHEC).
Instead, it carried some genes from enteroaggregative E. coli, or EAEC, which usually causes short bouts of diarrhoea in children.
These genes are associated with “aggregative adhesion”, which makes the bacterium able to stick easily to human and other cells.
But it was unclear whether the outbreak strain was an EHEC that had picked up some aggregative adhesion genes, or an EAEC which had picked up some EHEC genes.
It was even suggested that this was a completely new bacterium, unlike anything seen before.
The crowd-sourced, publicly available analysis of the outbreak strain’s DNA showed it was in fact an enteroaggregative E. coli (EAEC), which had acquired the Shiga toxin genes via a phage (a virus that can integrate its own DNA, including toxin genes, into the bacterial genome).
In fact, its genome is nearly identical to that of an E. coli isolated from a child with diarrhoea a decade ago, except that the new outbreak strain has acquired the toxin. And worryingly, it can resist whole swathes of antibiotics.
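The reasoning behind that conclusion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene labels below are hypothetical category names rather than real gene identifiers, and each genome is reduced to a simple set of the gene categories it carries. The sketch asks two questions: which marker set does the strain carry, and what does it have that the older isolate lacks?

```python
# Hypothetical category labels (assumption: placeholders, not real gene names).
EHEC_MARKERS = {"ehec_typical_genes"}
EAEC_MARKERS = {"aggregative_adhesion"}


def background(genes):
    """Guess the strain's background from which marker set it carries."""
    has_eaec = bool(genes & EAEC_MARKERS)
    has_ehec = bool(genes & EHEC_MARKERS)
    if has_eaec and not has_ehec:
        return "EAEC"
    if has_ehec and not has_eaec:
        return "EHEC"
    return "unclear"


def acquired_genes(newer, older):
    """Gene categories present in the newer strain but absent from the older one."""
    return newer - older


# Toy gene content (assumption: simplified to mirror the description in the text).
older_isolate = {"aggregative_adhesion", "core_genes"}
outbreak_strain = older_isolate | {"shiga_toxin"}

strain_background = background(outbreak_strain)
newly_acquired = acquired_genes(outbreak_strain, older_isolate)
print(strain_background, newly_acquired)
```

In this toy version the outbreak strain comes out as an EAEC whose only new acquisition is the toxin, which is the pattern the crowd-sourced analysis reported.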
While EHEC is common in animals (with outbreaks often traced back to cows), EAEC has only ever been found in humans. This outbreak has therefore been linked to contamination of the food chain with human faeces rather than cow faeces. So hygiene, rather than agricultural practice, is the likely culprit.
But how the sprouts became contaminated is still a mystery. Several of the farm’s workers were sick with the outbreak strain, but it’s difficult to tell whether they became infected at work or if one of them could be the initial source of the contamination.
The way forward
Crowd-sourcing analysis holds great promise for the future understanding of infectious diseases and outbreak investigation.
It’s interesting that while holes have been identified in the sharing of outbreak information across agencies – even within the European Union – the sharing of DNA sequence data was able to happen so rapidly and openly.
But this kind of open analysis raises other issues around ownership of sequence data, and in particular, around authorship of research articles based on the analysis.
The data issues have been clarified by BGI, which has released its data under a “Creative Commons 0” license, enabling completely free use and distribution without attribution.
But the open publication of analysis via blogs and wikis may affect the ability of scientists to publish their more complete analyses later in peer-reviewed scientific journals.
If we are to win the war against infectious disease outbreaks, the genomics and public health communities still have some lessons to learn from the bugs about the free flow of information.