
Genetic detectives: how scientists use DNA to track disease outbreaks

An epidemiologist examines the sample taken from a patient thought to be infected with influenza A (H1N1) virus. Maria Armas/Reuters

They’re the top questions on everyone’s mind when a new disease outbreak happens: where did the virus come from? When did this happen? How long has it been spreading in a particular country or group of people?

These questions have been the foundation of epidemiology, the study of the occurrence and spread of disease, since the days when outbreaks were tracked by hundreds and hundreds of questionnaires linking people with similar symptoms.

John Snow’s map tracking cholera cases to their source. By John Snow, via Wikimedia Commons

John Snow, widely regarded as one of the first epidemiologists (and something of a folk hero in scientific circles), carried out one of the earliest known epidemiologic investigations during a London cholera outbreak in 1854. He went door to door, mapping cases of illness, and ultimately identified a public water pump on Broad Street at the center of the outbreak.

These same fundamentals of so-called “shoe-leather epidemiology” are used every day by health departments, government agencies and research teams around the world to identify and track outbreaks. Thanks to improvements in the speed and cost of DNA analysis, these old methods are increasingly being paired with genomic technology.

Today, genetic sequencing allows us to determine how an infection travels – even tracing it across continents – with incredible precision.

The molecular clock – a stopwatch for infection

Viruses and bacteria carry genetic material (DNA or RNA), which means they can evolve. As viruses and bacteria make copies of themselves, their genetic material changes, because the enzymes that copy DNA and RNA make random errors each time a virus or bacterium replicates.

This evolution is akin to the development of mammals over evolutionary history, but with an important difference. The lifespan of a bacterium or virus is short, and they replicate quickly in astonishing quantities. This means we can observe evolutionary change in as short a span as just a few hours or days.

Because these changes pile up at a fairly steady rate, they act as a molecular clock. Once an infection is transmitted to a new victim, starting a new branch on that infection’s genetic tree, a new clock starts and keeps ticking until the victim’s body defeats the infection or the infection kills the victim.
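To make the clock idea concrete, here is a back-of-the-envelope sketch that assumes a strict clock, meaning every lineage mutates at the same constant rate. Since two sampled infections have each been accumulating changes independently since their common ancestor, the differences per site are divided by twice the rate. The function name and every number below are hypothetical placeholders, not measured values for any real pathogen; real studies fit more flexible clock models with dedicated phylogenetic software.

```python
def years_since_common_ancestor(differences: int, genome_length: int,
                                rate_per_site_per_year: float) -> float:
    """Estimate years back to the most recent common ancestor of two samples
    under a strict molecular clock (hypothetical helper for illustration).

    Mutations accumulate along BOTH branches leading from the ancestor to
    each sample, so the observed distance covers 2 * t years of evolution.
    """
    if genome_length <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("genome_length and rate must be positive")
    distance_per_site = differences / genome_length
    return distance_per_site / (2 * rate_per_site_per_year)


# Hypothetical inputs only: 30 differences across a 10,000-base genome,
# with an assumed rate of 0.001 substitutions per site per year.
print(years_since_common_ancestor(30, 10_000, 1e-3))
```

With these made-up inputs the estimate comes out to about a year and a half; doubling the assumed rate would halve it, which is why a good rate estimate matters so much in practice.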


We can observe this change directly by sequencing infections from different people and comparing how similar or different they are. This is work done by my laboratory and many others around the world. We assume that infections with similar sequences come from the same place at about the same time, which gives clues about when an infection entered a particular area, or how it traveled from one group of people to another.
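A toy version of "comparing how similar or different they are" is to count the positions at which two sequences differ, often called a SNP distance. The sketch below assumes the sequences have already been aligned to the same length and skips positions with an ambiguous base ("N"); the sample names and sequences are invented for illustration, and real pipelines rely on alignment and phylogenetic tools rather than a hand-rolled comparison like this.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ.
    Positions with an ambiguous base ('N') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def distance_matrix(samples: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of samples."""
    return {(p, q): snp_distance(samples[p], samples[q])
            for p, q in combinations(sorted(samples), 2)}


# Invented toy sequences, not real pathogen data.
samples = {
    "patient_A": "ACGTACGTAC",
    "patient_B": "ACGTACGTTC",
    "patient_C": "TCGAACGGAC",
}
for pair, d in distance_matrix(samples).items():
    print(pair, d)
```

Here patient_A and patient_B differ at a single position while patient_C is several changes away from both, the kind of pattern that would suggest A and B share a more recent source.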

For viruses with a very high mutation rate, the detective work gets more complicated. A single person’s infection can evolve to the point where he or she carries many different versions of the pathogen, and only one copy, with one genetic version, may actually infect another person. Investigating these viruses requires the latest whole-genome sequencing technology and bioinformatics, the computational analysis needed to interpret large amounts of sequence data.
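One simple bioinformatics step behind this is measuring how common each version is inside a single patient. The sketch below assumes, unrealistically, that sequencing reads are already aligned to exactly the same stretch of genome and are error-free; it reports positions where a non-majority base appears in at least a chosen fraction of reads. The function name and the 5% cutoff are hypothetical choices for illustration, not a standard threshold.

```python
from collections import Counter


def minor_variants(reads: list[str],
                   min_freq: float = 0.05) -> dict[int, dict[str, float]]:
    """For equal-length reads covering the same positions, return
    {position: {base: frequency}} for non-majority bases seen in at least
    min_freq of reads (a hypothetical cutoff). Assumes no sequencing errors."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("need at least one read; all reads must be the same length")
    result: dict[int, dict[str, float]] = {}
    for pos in range(len(reads[0])):
        counts = Counter(r[pos] for r in reads)
        total = sum(counts.values())
        majority, _ = counts.most_common(1)[0]
        minor = {base: n / total for base, n in counts.items()
                 if base != majority and n / total >= min_freq}
        if minor:
            result[pos] = minor
    return result


# Invented example: 8 reads carry G at position 2, while 2 reads carry T.
reads = ["ACGT"] * 8 + ["ACTT"] * 2
print(minor_variants(reads))
```

A patient like this carries two versions of the virus at once, yet whoever they infect may receive only one of them, which is why a consensus sequence alone can mislead a transmission investigation.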

Tracing outbreaks both old and new

Genetic analysis has led scientists to hypothesize that Zika probably entered Brazil during a 2014 international canoe competition, likely carried by a person from French Polynesia or another Pacific island. Sequencing data also showed that cholera was introduced into Haiti by United Nations peacekeepers after the 2010 earthquake. Tracing how an infection moves into and through a country or a group of people helps public health officials determine which interventions may prevent future spread.

Genomics has even busted myths about disease, such as the oft-repeated story of Patient Zero, the man who purportedly introduced HIV into the United States in the 1980s. Molecular clock calculations have shown this scenario to be incorrect. It turns out HIV had been circulating in the United States since the late 1960s, over a decade before the Centers for Disease Control and Prevention (CDC) identified the first AIDS cases.

One of my favorite examples of the power of genetic detective work is the study of the measles virus before and after vaccination largely eliminated the disease in the United States. Before the measles vaccine was developed, the virus was common, spreading from person to person within the U.S. After the vaccine was introduced in 1963, the virus largely disappeared, with occasional resurgences in areas where vaccination rates are low.

The genetics of the virus tell us that U.S. measles cases after 1963 are mostly due to people with infections traveling to the U.S. from other countries, rather than spreading from person to person within the U.S.

A vial of measles, mumps and rubella vaccine and an information sheet are seen at Boston Children’s Hospital in Boston, Massachusetts, February 26, 2015. Brian Snyder/Reuters

When sequencing technology first became available, it was a long and expensive process that could clarify outbreaks only in hindsight. Now it is inexpensive and fast enough to use while an outbreak is still under way. During the measles outbreak that began at Disneyland in California in December 2014, the CDC quickly determined that it started with a strain similar to measles cases in the Philippines, where measles outbreaks are more common.

We can even use genomics on a case-by-case basis to tell whether one person infected another, because infections on either side of a transmission event will be more similar to each other than to unrelated samples.
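The logic of that comparison can be sketched as a simple check: two samples are consistent with direct transmission if they are within some small number of differences of each other and closer to each other than either is to unrelated background samples. The function name, the SNP cutoff and the toy sequences below are all hypothetical; real investigations combine this kind of genetic evidence with dates, locations and contact information, and genetics alone cannot say who infected whom.

```python
def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def plausible_transmission_pair(a: str, b: str, background: list[str],
                                max_snps: int) -> bool:
    """Return True if a and b are within a (hypothetical) SNP cutoff of each
    other AND closer to each other than either is to any background sample.
    This only flags a pair as consistent with transmission; it proves nothing
    about direction or about intermediate, unsampled cases."""
    d_ab = snp_distance(a, b)
    if d_ab > max_snps:
        return False
    closest_background = min(
        (snp_distance(s, other) for s in (a, b) for other in background),
        default=float("inf"),
    )
    return d_ab < closest_background


# Invented toy data, not real patient sequences.
case_1 = "ACGTACGTAC"
case_2 = "ACGTACGTTC"
unrelated = ["TCGAACGGAC", "ACCTTCGAAG"]
print(plausible_transmission_pair(case_1, case_2, unrelated, max_snps=2))
```

The two cases differ at one position, while the closest unrelated sample is three changes away, so the pair is flagged as a candidate link worth following up with shoe-leather investigation.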

This technology has already been useful in untangling the spread of hospital-associated infections, which occur within notoriously complex webs of connections and risk factors. Whole-genome sequencing allows researchers to trace these infections as they move from one patient to another. Hopefully future developments will allow us to identify and interrupt these chains of transmission in real time.

We still need old-fashioned shoe-leather epidemiology

Even with all of the advances in sequencing over the past decade, our ability to conduct genetic investigations is only as good as our public health infrastructure. The most advanced technologies are useless without ongoing disease surveillance.

A trained, funded and sustainable public health workforce must be in place to identify outbreaks early, collect samples and respond quickly to interrupt transmission. Molecular epidemiology works only to the extent that samples are collected for researchers to use to compare and contrast sequences.

When outbreaks are not identified early, or when the right samples aren’t collected, the investigation will be unable to find links between people in the outbreak and the source of the infection. We need both cutting-edge genetic technology and centuries-old epidemiologic methods to continue to work to stop the spread of infectious diseases.
