It’s a cruel disease which dramatically shortens life expectancy. One in 25 Europeans carries the cystic fibrosis gene and, in the UK, about 10,400 people currently have the condition. But people are living longer and longer thanks to improvements in screening at birth, early treatment and medication.
One of the most important things is for patients to be put on the medication that will most effectively inhibit the progression of the disease – there are a number of different options available. Understanding the progression of the disease over time – and how this is influenced by medication and environmental factors – is vital to improving patient prognosis. But different people respond differently to drugs, inhabit different environments and the timescale of progression is long.
So the question is, how can doctors predict which patient will respond best to which treatment? Unfortunately this remains quite challenging. Curiously though, we have developed a new technique based on astrophysical research, published in PLOS One, that may be a game changer.
The key to any research on the treatment of diseases is big data. In common diseases, such as cancer, high-quality datasets for many patients over time can be constructed within one administrative system – such as a regional or national health authority. These can be efficiently anonymised so that researchers can study the progression of individuals or groups of patients over time using anonymous identifiers.
The challenge with cystic fibrosis, as a rare disease, is how to assemble large samples of patients. Countries hold their own datasets of patients, but the sample sizes are small. The European Cystic Fibrosis Society Patient Registry, which today includes data from more than 42,000 people, was established to merge national data sets. But the data quality is not always uniform and – with the anonymisation required to share data – often the vital threads that linked the different data for a single patient over time were broken. Without anonymous identifiers it is impossible to carry out longitudinal studies following groups of patients over time.
Telescopes versus clinics
Working with my student – and then colleague – Peter Hurley, who has cystic fibrosis, we realised that as astrophysicists familiar with analysing the data of hundreds of thousands of galaxies, we might have a potential solution. In astronomy, one of our challenges is to link a celestial object such as a distant galaxy in an image taken by one telescope with the same object in another image from a different telescope.
This is harder than you might think, as some telescopes are much lower resolution than others. A bright “blob” that you think is a galaxy in one image might overlap with many galaxies in a higher resolution image. Also, different galaxies have different colours, so may or may not be visible to different telescopes (they typically only detect specific wavelengths of light). We match galaxies using a computational framework which calculates the probability that a pair of celestial objects in two images is actually the same object and compares this with an alternative hypothesis that they might be two different objects that just happen to be close.
For example, given an observed separation and the known positional errors of the objects in our catalogues we can calculate the probability that the true separation is zero (they are the same). Likewise, knowing the density of galaxies, we can calculate how likely it is that one would happen, by chance, to lie within a circle whose radius is that observed separation.
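To make the two competing hypotheses concrete, here is a minimal sketch in Python. It is not the code from our paper: it assumes circular Gaussian positional errors, randomly scattered (Poisson) background objects and an arbitrary 50:50 prior, and the function names are invented for illustration.

```python
import math

def chance_probability(separation, density):
    """Probability that at least one unrelated object lies within a circle
    of radius `separation`, assuming objects are scattered at random
    (Poisson) with the given surface density."""
    return 1.0 - math.exp(-math.pi * separation**2 * density)

def likelihood_ratio(separation, sigma_a, sigma_b, density):
    """How much better "same object" explains the observed offset than
    "unrelated object that happens to be nearby".
    Assumes circular Gaussian positional errors in both catalogues."""
    sigma_sq = sigma_a**2 + sigma_b**2  # errors add in quadrature
    same = math.exp(-separation**2 / (2.0 * sigma_sq)) / (2.0 * math.pi * sigma_sq)
    return same / density

def match_probability(separation, sigma_a, sigma_b, density, prior=0.5):
    """Posterior probability of a true match; `prior` is an assumed value."""
    odds = likelihood_ratio(separation, sigma_a, sigma_b, density) * prior / (1.0 - prior)
    return odds / (1.0 + odds)

# Illustrative numbers only: separations and errors in arcseconds,
# density in objects per square arcsecond.
close = match_probability(0.5, 1.0, 1.0, density=0.001)
far = match_probability(5.0, 1.0, 1.0, density=0.001)
print(f"close pair: {close:.3f}, distant pair: {far:.3f}")
```

With these made-up inputs, the close pair comes out as a likely match and the distant pair as a likely coincidence. The cut-off between the two depends on both the positional errors and how crowded the sky is.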
We therefore hoped that by using the same techniques we might be able to link different records that belong to the same patient. The obvious way to link objects in two astronomical catalogues is by position. One object would be linked to an object at the same position in the second catalogue. By analogy we could link two records in a patient registry if they had the same age, gender, genetic or other characteristics.
But this is often not enough. Two different galaxies can have similar positions and two different patients can have the same age, gender and genotype. To refine the matches, we need to consider more characteristics. But these may vary – for example, a galaxy’s brightness may be different depending on which telescope measures it. And a patient’s weight may change from one clinic visit to another. In astronomy we solve this by having a model of how galaxy light varies as seen through different telescopes.
A key factor of our method was that we included an analogous model for how patients’ body mass index was expected to vary over time. Body mass index appears in every patient record and we know how it varies with age and progression of cystic fibrosis.
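As a rough illustration of the analogy, the sketch below scores whether two registry records could belong to the same patient. It is a simplified stand-in, not our published algorithm: the `Record` fields, the simple drift-plus-scatter model for body mass index and every numerical value are assumptions chosen only to show the idea.

```python
import math
from dataclasses import dataclass

@dataclass
class Record:
    birth_year: int
    sex: str
    genotype: str
    year: int    # year of the clinic visit
    bmi: float   # body mass index at that visit

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def link_score(a, b, drift_per_year=0.0, scatter_sd=1.0, pop_mean=21.0, pop_sd=3.5):
    """Likelihood ratio for "same patient" versus "two different patients".
    All default parameter values are hypothetical placeholders."""
    # Fixed characteristics play the role of sky position: they must agree.
    if (a.birth_year, a.sex, a.genotype) != (b.birth_year, b.sex, b.genotype):
        return 0.0
    gap = b.year - a.year
    # Same patient: BMI is expected to drift from its earlier value,
    # with scatter that grows with the time between visits.
    expected = a.bmi + drift_per_year * gap
    sd = scatter_sd * math.sqrt(max(abs(gap), 1))
    same = gaussian_pdf(b.bmi, expected, sd)
    # Different patient: BMI is just a draw from the whole population.
    chance = gaussian_pdf(b.bmi, pop_mean, pop_sd)
    return same / chance

first = Record(1995, "F", "F508del/F508del", 2008, 19.0)
similar = Record(1995, "F", "F508del/F508del", 2009, 19.4)
different = Record(1995, "F", "F508del/F508del", 2009, 25.0)
print(link_score(first, similar), link_score(first, different))
```

A score above one favours linking the two records and a score below one favours treating them as different patients. Here the record with a nearly unchanged body mass index scores well above one, while the record with a large jump scores far below it.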
New answer book
Working with Anil Mehta at the University of Dundee, who had been instrumental in establishing the European Cystic Fibrosis Society Patient Registry, we tested the algorithm on a subset of the data that came from Denmark. This was believed to be the best data set – the records belonging to any one patient could be reliably linked through an anonymous identifier that was provided. This meant we had an answer book.
We matched the records blindly without using this identifier and then checked our answers. We were reasonably happy with our results. However, on closer inspection, it transpired that many of the cases where we appeared to be wrong were actually clerical errors arising in the manual entry of the anonymous patient identifiers into the Danish database. So, our matches were correct but the answer book was wrong. That meant that in the European Cystic Fibrosis Society Patient Registry, thanks to the algorithm derived from astrophysics, there is a complete and accurate version of the Danish dataset showing the progress of patients year on year which can be matched against their medication or any other factor desired.
We published our method in a paper so that others working on patient registry databases, particularly those for rare diseases, could apply this technique to link records for patients where such links don’t already exist or to check and clean their data sets. We hope to do further work with the European Cystic Fibrosis Society Patient Registry to provide links for all their records, and are awaiting approval for this.
This will, we hope, offer some encouragement to people with cystic fibrosis and other rare diseases that researchers may soon have a better chance of finding factors and medicines that can improve their quality of life.