Sharing is caring: we need open access to genetic information

Technical, financial and legal barriers stop the sharing of vital information in medical research. Frans de Waal / Wikimedia Commons

A paper published today in Science Translational Medicine calls for the open sharing of clinical trial data among the medical research community. Researchers argue data sharing would lead to faster, more trustworthy evidence for many pressing health problems.

But data sharing is not just important for clinical trials. The free and open sharing of information on genetic variations and their effects on patients is the only way that we can provide any semblance of genetic health care.

Last week was the ninth anniversary of the completion of the Human Genome Project. Almost a decade on, how are we placed to make good on the many promises that were made to justify this unprecedented human endeavour?

In total, the Human Genome Project cost US$3 billion and took over a decade. It involved thousands of scientists from institutions all over the world working in massive sequencing factories.

Today, a complete human genome can be sequenced in a few days for a few thousand dollars on a single machine.

Unfortunately, as this story on The Conversation recently pointed out, the ability to quickly and cheaply sequence DNA is not enough. The shortcoming we now face is our inability to understand the clinical significance of the many genetic changes we can uncover.

The task of interpreting the effect of any DNA change is the job of molecular pathologists and medical geneticists, who have a number of methods at their disposal. But the most effective way is to demonstrate that other patients with the same genetic variation develop the same condition.

This kind of information can only come from other pathologists and medical geneticists. But, in the vast majority of cases, it’s not being shared effectively.

The traditional method of sharing scientific information, the scientific paper, doesn’t measure up to this task. Researchers can’t write papers fast enough to keep pace.

Journals are only willing to devote page space to the most interesting cases. And even if enough papers could be published, there wouldn’t be enough time to read them all. Free-text papers also don’t easily support the discovery and re-use of information.

These shortcomings have stimulated the growth of online genetic variation databases. Databases are seen as a better choice because they are scalable and the information they hold is structured and more easily searched.

These databases also provide a review function for the information. As new information is added to the database, it is checked by the database curator for accuracy and consistency. This is important: recent research has found that over 12% of sequence changes classified as “disease causing” in the scientific literature are incorrectly classified. In my own research, I have come across reports that the error rate in the literature for some genes is as high as 40%.

Too few databases exist to cover all 23,000 genes in the genome and none of those that do exist can be considered complete. And while it’s true that we don’t know everything about our genes and how they cause disease, this is not why these databases are incomplete. They are incomplete because we aren’t sharing information.

The difficulties facing sharing fall into three categories.

The first is purely technical: how to structure the data, how to transfer it effectively, how to find the time to upload it, and similar issues. While these challenges are, well, challenging, they can be overcome with a bit of effort.

Tools already exist that can integrate with existing lab or clinic information systems and mediate transfer to international data repositories.

The next difficulty is financial. Setting up and maintaining these databases, while not expensive, does require money. Funding is often difficult to obtain as databases don’t easily fit into traditional funding models. They’re not quite research and not quite clinical practice. They lie somewhere between the two.

The last is the most challenging of the three: legal barriers. From a clinical perspective, the most useful information to have on hand is patient-level information. That’s information on specific individuals with a particular genetic variant and how it affected them. The sharing of such information is difficult to achieve without breaching privacy legislation.

Even stripping the data of all identifying information before any sharing takes place may not make sharing possible. The only way to ensure that information can be shared is to gain informed consent from the patient. But this is a time-consuming process that’s usually not at the forefront of the treating clinician’s mind.

Despite these constraints, sharing of this vital information does occur on quite a large scale. Efforts are also underway to reduce the obstacles to sharing.

The biggest among them is the Human Variome Project, which was initiated in Melbourne in 2006. The Human Variome Project is an international pool of scientists and health-care professionals working to ensure that all information on genetic variation can be collected, curated, interpreted and shared freely and openly.

The Project establishes and maintains standards, systems and infrastructure for data sharing; provides education and training to scientists, doctors and the general public; and helps individual countries build their capacity in medical genetics.

More than ever before, we’re living in a world where our individual genetic makeup will determine the course of the medical treatment we may undergo.

We may all be the eventual recipients of possibly life-altering medical interventions based on the insights collected from someone else’s unique genetic sequence. Without the free and open sharing of information on genetic variations, we may be withholding treatment from people who are already suffering.