
DNA sequencing and big data open a new frontier in the hunt for new viruses




Discovering new viruses has historically been biased towards people and animals that exhibit symptoms of disease – like a cough, fever or skin blister.

But there are two challenges to discovering viruses this way. The first is that spotting symptoms is only the beginning of a long and painstaking process of identifying the virus that caused them. The second, more important challenge is that it leaves out potentially deadly viruses that might emerge through transmission from other species, in which they show no clear signs of infection.

HIV and Ebola are examples of this. Both harm humans but go unnoticed in their native hosts – monkeys and chimps in the case of HIV and possibly bats for Ebola.

One possible solution is DNA sequencing. Recent technologies have made sequencing DNA faster and cheaper. As a result, public genome databases now hold petabytes – a petabyte is a million gigabytes – of DNA sequence data from hundreds of species. This so-called next-generation sequencing has also given birth to metagenomics, where scientists can scoop up a handful of dirt (or anything else) and sequence everything in it.

This means that scientists can now discover new viruses using DNA sequencing. The magic is that they don’t even have to know what the virus looks like, or whether it causes disease. Sequencing is sensitive enough to detect minuscule amounts of viral DNA that happen to be in a host’s blood or tissue sample.

Many scientists sequencing the DNA of animals or plants view this viral DNA as a nuisance – rogue contaminants that need to be filtered from DNA sequencing results.

But we took a different view. What if the viral DNA was a missed opportunity? So we set out to test the idea that massive online DNA databases could be used as a resource to discover viruses – even if the data had not been explicitly collected for that purpose.

In our study we examined 50 genomes of fish and uncovered viral DNA in 15 fish species, including Atlantic salmon and rainbow trout. We did this by applying a data mining approach. Our aim was to identify novel members of a group of herpesviruses – alloherpesviruses – that infect fish.
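To give a rough feel for what mining genome data for viral DNA can look like, here is a minimal Python sketch. It is an illustration only, not the pipeline used in the study, which the article does not describe. It flags host sequences that share short exact fragments (k-mers) with a known viral reference. The function names, thresholds and sequences are all made up for this example. Real searches rely on dedicated similarity-search tools that tolerate mutations and check both DNA strands, which this toy version does not.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_for_viral_hits(
    host_contigs: Dict[str, str],
    viral_reference: str,
    k: int = 8,           # hypothetical fragment length
    min_shared: int = 3,  # hypothetical threshold for calling a hit
) -> List[str]:
    """Return names of host contigs that share at least
    `min_shared` exact k-mers with the viral reference."""
    viral_kmers = kmers(viral_reference, k)
    hits = []
    for name, seq in host_contigs.items():
        shared = kmers(seq, k) & viral_kmers
        if len(shared) >= min_shared:
            hits.append(name)
    return hits


# Made-up toy sequences, not real fish or virus data.
viral_reference = "ATGGCGTACGTTAGCCTAGGATCC"
host_contigs = {
    "contig_1": "TTTTTTTTTTTTTTTTTTTTTTTT",  # no viral-like fragment
    "contig_2": "CCCCGCGTACGTTAGCCTAGGGGG",  # contains a viral-like stretch
}

hits = screen_for_viral_hits(host_contigs, viral_reference)
print(hits)
```

In this toy case only the second sequence is flagged, because it contains a stretch copied from the made-up viral reference. Any such hit would still need the kind of follow-up checks described later in the article before it could be called a virus.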

Our findings have opened up entire new frontiers of understanding about these viruses in fish. To date only a handful have been identified, despite the massive growth of the aquaculture industry and the potential hazards that an emerging infectious disease would pose. These include economic loss, threats to food supply chains and danger to fish in the wild.

It is yet to be seen whether any of the viral sequences identified in the 15 fish species are capable of forming infectious viral particles, or whether they cause disease. But it’s a start. A major advantage of already knowing the genome sequences of these potential pathogens is that they can be used to help identify the cause of disease.

New families of viruses

The key to our approach was to combine evolutionary biology with techniques that are used to analyse huge quantities of DNA sequence data. This strategy has emerged from the new field of paleovirology – the study of viruses that have integrated into the DNA of their hosts, sometimes millions of years ago.

We first recognised the potential of this approach in a study we did in 2014. We went looking for ancient viruses in the Philippine tarsier and found two modern viruses as well as an ancient herpesvirus. We realised that the technique could be applied to discovering new viruses.

In our latest study we built on the technique to look for novel alloherpesviruses. In fact, we found more than we bargained for. Instead of novel alloherpesviruses, we uncovered a lineage of unusual viruses that may even be a new viral family related to alloherpesviruses.

To confirm that this was not simply a lab contaminant or a data processing error, we visited a local supermarket and sushi house for extra samples. Lab work confirmed that the viral fragments were also present in these commercial samples.

Identifying the disease

Using this approach to identify novel viruses is not yet common practice. But our study demonstrates the value of the data we already have.

There are still gaps in our knowledge: for example, are any of the viral sequences identified in the 15 fish species capable of forming infectious viral particles? And do they cause disease?

Even though we can’t yet answer these questions, knowing about the viruses is useful for two reasons. Firstly, we now know about a range of new viruses, which could prove useful to the fishing industry. And secondly, while an infectious virus may not cause disease in its natural host fish, there is a risk of transmission to other species. As the fish farming industry continues to grow there is a real possibility of transfer to either other farmed fish or wild populations. The risk of transmission to humans, however, is far lower.

Beyond this study, we can hunt for novel viruses in a range of different species. One strategy might be to start with culprits that we already know harbour transmissible disease. Bats and rodents, for example, are notorious for being reservoirs of infectious disease that they are seemingly immune to. Insects such as mosquitoes are also carriers of viral diseases (such as Zika) that harm humans.

This development now gives us the scope to apply our approach to uncover other viruses before the next outbreak even happens.