tag:theconversation.com,2011:/us/topics/google-flu-trends-7632/articlesGoogle Flu Trends – The Conversation2015-03-04T10:54:01Ztag:theconversation.com,2011:article/377412015-03-04T10:54:01Z2015-03-04T10:54:01ZDigital epidemiology: tracking diseases in the mobile age<figure><img src="https://images.theconversation.com/files/73502/original/image-20150302-15965-9tr044.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">It must be the flu, right?</span> <span class="attribution"><a class="source" href="http://www.shutterstock.com/s/flu+computers/search.html?page=2&thumb_size=mosaic&inline=232546609">Woman via Shutterstock</a></span></figcaption></figure><p>Being stuck in bed, waiting for the flu to run its course, is pretty unpleasant. And it’s also really boring. What else is there to do but search for symptoms online, and read entries about the flu on Wikipedia or WebMD or post messages on Facebook and Twitter about how sick you are?</p>
<p>A lot of people get the flu every year and many of them do exactly that: they search for relevant information, and share their misery with the rest of us. The consequence is remarkable: a description of their symptoms, time-stamped and perhaps even geo-tagged, is online. Which means that the internet has a rather detailed picture of the health of the population, coming from digital sources, through all of our connected devices, including smartphones.</p>
<p>This is <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002616">digital epidemiology</a>: the idea that the health of a population can be assessed through digital traces, in real time. </p>
<p>It has the potential to be a powerful boon for traditional epidemiology. Researchers have already started to develop methods and strategies for using digital epidemiology to support infectious disease monitoring and surveillance or understand attitudes and concerns about infectious diseases. But much more needs to be done to integrate digital epidemiology with existing practices, and to address ethical concerns about privacy. By 2020, there are projected to be <a href="http://www.ericsson.com/news/1872291">6.1 billion smartphone subscriptions</a>, so it is high time to get serious about digital epidemiology.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=306&fit=crop&dpr=1 600w, https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=306&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=306&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=384&fit=crop&dpr=1 754w, https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=384&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/73685/original/image-20150303-31863-1s9lk8m.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=384&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Projected smartphone subscriptions 2014 to 2020.</span>
<span class="attribution"><a class="source" href="https://www.flickr.com/photos/ericsson_images/16009981084/in/set-72157650997471052">Ericsson</a>, <a class="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">CC BY-NC-ND</a></span>
</figcaption>
</figure>
<h2>Digital epidemiology goes mainstream: Google Flu Trends</h2>
<p><a href="https://www.google.org/flutrends/us/">Google Flu Trends</a> was one of the first popular examples of digital epidemiology. Launched in 2008 to help predict flu epidemics, it was based on a very simple idea: when people come down with the flu, they will often turn to the internet and search for information about their symptoms.</p>
<p>In 2009, researchers from Google and the US Centers for Disease Control and Prevention (CDC) published a paper with the apt title “<a href="http://static.googleusercontent.com/media/research.google.com/en/us/archive/papers/detecting-influenza-epidemics.pdf">Detecting influenza epidemics using search engine query data</a>,” outlining a method for using search queries to recognize flu outbreaks. </p>
<p>For many years, Google Flu Trends has served as a prime example of digital epidemiology. It embodies both the opportunities and the challenges the field faces. While it has undoubtedly popularized the idea of using digital data to derive epidemiological insights, Google Flu Trends has also demonstrated that this is no easy task. </p>
<p>For starters, <a href="http://dash.harvard.edu/bitstream/handle/1/12016836/The%20Parable%20of%20Google%20Flu%20%28WP-Final%29.pdf">its estimates were not always very accurate</a>. Indeed, during the 2012-2013 flu season in the northern hemisphere, it overestimated the flu prevalence by up to 100% (relative to CDC numbers). And the estimates cannot be reproduced easily – Google controls access to Google data, of course. </p>
<p>For this reason alone, many researchers have in the past few years turned to alternative data sources. Twitter has been a particularly popular source, because tweets are public by default, and because Twitter data can be accessed by anyone. </p>
<h2>Twitter and Wikipedia are becoming data sources for digital epidemiology</h2>
<p>For instance, a study from 2011 used data from Twitter to measure public interest and concern about the influenza H1N1 virus and <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0019467">to track disease activity</a>. Another study from 2014 showed that incorporating data from Twitter into CDC influenza-like illness models can <a href="http://currents.plos.org/outbreaks/article/twitter-improves-influenza-forecasting/">reduce forecasting errors</a>. Twitter has also been used to <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002199">assess health sentiments</a> such as those about vaccination, and to <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4013443/">monitor drug safety</a>.</p>
<p>And Wikipedia access logs – openly accessible data about how often certain Wikipedia pages were accessed on the web – have recently provided a rich data source for <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003892">disease monitoring and forecasting</a>. Research suggests that examining Wikipedia access logs could support traditional disease surveillance for <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003581">influenza</a>. </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=451&fit=crop&dpr=1 600w, https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=451&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=451&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=567&fit=crop&dpr=1 754w, https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=567&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/73499/original/image-20150302-15950-reom2w.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=567&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">A doctor uses a smartphone to conduct an eye exam in Kenya, October 29, 2013.</span>
<span class="attribution"><span class="source">Noor Khamis/Reuters</span></span>
</figcaption>
</figure>
<h2>The doctor is in your pocket: epidemiology goes mobile</h2>
<p>But it’s not just publicly accessible data from Twitter and Wikipedia that have been harnessed for epidemiology. Anonymized mobile phone data have provided unparalleled insights into how the movement of people affects disease dynamics. </p>
<p>For example, cell phone data have been used to measure how human travel patterns <a href="http://www.wellcome.ac.uk/News/2012/News/WTP040436.htm">spread malaria</a> and to rapidly <a href="http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001083">estimate population movements</a> during disasters and outbreaks, such as the earthquake and subsequent cholera outbreak in Haiti in 2010. </p>
<p>Apps that allow the self-diagnosing of diseases are not too far away. With the help of a small attachment, a smartphone can already be turned into a <a href="http://news.sciencemag.org/health/2015/02/lab-chip-turns-smart-phones-mobile-disease-clinics">mobile clinic</a> able to diagnose multiple infectious diseases in minutes. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=340&fit=crop&dpr=1 600w, https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=340&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=340&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=427&fit=crop&dpr=1 754w, https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=427&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/73691/original/image-20150303-31848-ncqev9.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=427&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Map generated by more than 250 million public tweets (collected from Twitter.com) with high-resolution location information, broadcast between March 2011 and January 2012.</span>
<span class="attribution"><a class="source" href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002616">Salathé et al.</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<h2>Traditional + digital = a better picture</h2>
<p>Public health is traditionally based on data collected from health-care providers, who collect data from sick patients. This produces a very patchy picture. It only includes those populations who have access to health care or who decide to go to the doctor in the first place. And it mostly includes information about <a href="http://www.nlm.nih.gov/medlineplus/ency/article/001929.htm">reportable diseases</a>, missing out on a huge array of other illnesses. Last but not least, it largely misses out on information about health behaviors, sentiments and opinions. </p>
<p>Digital epidemiology can add more information to that picture and fill in some of the blanks. Of course, digital epidemiology won’t capture the entire population. But neither do traditional ways of gathering epidemiological data. With the vast majority of the world getting online, populations who slipped under the radar of public health will become more visible, which is crucial in a world where diseases anywhere today are diseases everywhere tomorrow. And it will also enable us to fulfill the mantra of <a href="http://www.ted.com/talks/larry_brilliant_wants_to_stop_pandemics">“early detection, early response”</a> by building digital warning systems designed to stop pandemics in their tracks.</p>
<h2>Don’t forget privacy and surveillance</h2>
<p>Digital epidemiology faces <a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003904">ethical challenges</a> about surveillance and privacy as well. Ill health is stigmatized – socially and economically – in all societies. And people are <a href="http://www.pewinternet.org/2014/11/12/public-privacy-perceptions/">more and more concerned</a> about surveillance and information privacy. As digital epidemiology grows, we need to keep these ethical considerations at the forefront.</p>
<p class="fine-print"><em><span>Marcel Salathé does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Ever tweet about being sick? Or look up your symptoms online? Researchers are using this information to monitor illnesses and attitudes about health in real time.Marcel Salathé, Assistant Professor of Biology and Adjunct Faculty of Computer Science and Engineering, Penn StateLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/285292014-06-27T14:24:32Z2014-06-27T14:24:32ZGoogle’s Larry Page wants to save 100,000 lives but big data isn’t a cure all<figure><img src="https://images.theconversation.com/files/52462/original/j29gpjjg-1403873793.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Machines lack the human touch needed in healthcare.</span> <span class="attribution"><span class="source">Shutterstock robot surgery</span></span></figcaption></figure><p>Talking up the power of big data is a real trend at the moment and Google founder Larry Page took it to new levels this week by proclaiming that <a href="http://www.nytimes.com/2014/06/26/technology/personaltech/a-reach-too-far-by-google.html">100,000 lives could be saved next year alone</a> if we did more to open up healthcare information.</p>
<p>Google, likely the biggest data owner outside the NSA, is evidently carving a place for itself in the big data vs life and death debate but Page might have been a little more modest, given that Google’s massive <a href="https://theconversation.com/googles-flu-fail-shows-the-problem-with-big-data-19363">Flu Trends programme ultimately proved unreliable</a>. Big data isn’t some magic weapon that can solve all our problems and whether Page wants to admit it or not, it won’t save thousands of lives in the near future. </p>
<h2>Big promises</h2>
<p>Saving lives by analysing healthcare data has become a major human ambition, but to say this is a tricky task would be an enormous understatement.</p>
<p>In the UK, the government has just produced a <a href="https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/323967/Consultation_document.pdf">consultation on introducing regulations</a> for protecting this kind of information alongside care.data, a huge scheme aiming to make health records available to researchers and others who could work with it.</p>
<p>Given the ongoing <a href="https://theconversation.com/outdated-laws-put-your-health-data-in-jeopardy-22465">care.data debacle</a>, this is a broadly sensible document and a promising start for consultation. In particular, it identifies different levels of data. <a href="https://theconversation.com/time-for-some-truth-about-who-is-feeding-off-our-nhs-data-23998">Data that could be used to identify</a> an individual person should not be shared in the same way as other types of data.</p>
<p>But, like Page, the UK government is also presenting a false vision for big data. It has said that review after review has found that a failure to share information between healthcare workers has led to child deaths. It’s an emotive admission but rather beside the point from a big data perspective. </p>
<p>It is indeed entirely credible that many tragic failures within the NHS might have been prevented by someone sharing the right information with the right person. Sharing is essential, but when the NHS talks about sharing, it means linking and sharing large medical databases between organisations. Surely no case review has ever claimed that the mere existence of a larger database of information would have got the right knowledge to the right person. </p>
<p>Medical data sharing may be a good thing in many ways, but unfortunately there is no clear case yet that automated analysis of data prevents child deaths and other tragedies. It is <a href="http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html">only big data</a>, not magic. Preventing child deaths appears to be brought in as emotional blackmail, expected to trump the valid concerns over the NHS’ big data plans.</p>
<h2>Big disappointments</h2>
<p>The fact is, we are not as advanced as we would like to believe. This month, 60 years after Alan Turing died, his test for recognising “true” artificial intelligence made the news again. One in three human test subjects mistook a computer programme called <a href="http://mashable.com/2014/06/12/eugene-goostman-turing-test/">Eugene Goostman</a> for a 13-year-old Ukrainian boy. But Eugene didn’t really pass the test. The programme was simply good at playing the game and <a href="https://theconversation.com/eugene-the-turing-test-beating-teenbot-reveals-more-about-humans-than-computers-27775">relied heavily</a> on the fact that a 13-year-old probably wouldn’t know the answers to many of the questions. </p>
<p>The programme fell back on the same tactics used some 42 years ago by <a href="https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/classics/parry/">Parry</a>, a programme that tricked people into thinking it was a paranoid schizophrenic, and the even earlier <a href="http://www.cse.buffalo.edu/%7Erapaport/572/S02/weizenbaum.eliza.1966.pdf">Eliza programme</a> which had proved hard to distinguish from a real <a href="http://www.goodtherapy.org/person_centered.html">Rogerian therapist</a>. So much for progress.</p>
<p>The research field of artificial intelligence – or more modestly, machine learning – has been active for 60 years and passing the Turing test is its original Holy Grail. And many of the brightest minds in computer science have worked in this area. Computing power has been increasing exponentially over that time and the web provides a massive amount of samples of human communication to learn from. The fact that we have made such slow progress despite all these developments shows just how hard it is to turn vast amounts of data into human intelligence.</p>
<h2>Be wary of big claims</h2>
<p>This should teach us to be wary of anyone who makes bold claims about the potential of big data. Google Flu Trends sought to derive information about the spread of illness by gathering data when people searched for terms like “flu”. But we’ve seen time and time again that machines don’t understand humans and can’t mimic real human qualities.</p>
<p>A prime example can be found outside healthcare. It’s now broadly accepted that in the course of its surveillance programmes, the NSA had obtained information that might have prevented 9/11, but <a href="https://www.schneier.com/blog/archives/2014/01/debunking_the_n.html">failed to join the dots</a>.</p>
<p>Edward Snowden’s revelations made it clear that the NSA and GCHQ are collecting large “haystacks” of communications data. The intelligence services have made various claims that the analysis of this prevented serious terrorist attacks, but these claims <a href="http://www.newamerica.net/publications/policy/do_nsas_bulk_surveillance_programs_stop_terrorists">have not stood up to detailed scrutiny</a>. Given the amount of computing power the NSA possessed, even before the internet age, it must have been applying machine learning techniques to its bulk data for at least 30 years. Still, no evidence has been presented of any significant needles being found as a result – at least not any that is available to the public.</p>
<p>This all goes to show that using machine learning to process vast amounts of data, such as the information held in healthcare databases, won’t save lives on its own. The kind of human insight needed to put the information to proper use still can’t be replicated by computers, even after decades of trying. </p>
<p>Doctors need to be able to ask the right questions and use their unique human qualities to make life changing decisions for their patients. Similarly, researchers still need to formulate their hypotheses and ask the medical databases targeted questions. They are not machines, and we should be grateful for that.</p>
<p class="fine-print"><em><span>Eerke Boiten is a senior lecturer in the School of Computing at the University of Kent, and Director of the University's interdisciplinary Centre for Cyber Security Research. He receives funding from EPSRC for the CryptoForma Network of Excellence on Cryptography and Formal Methods. He is a member of BCS and board member of its specialist group on Formal Aspects of Computer Science. He is also a director (governor) of The John of Gaunt School, a Community Academy.</span></em></p>Talking up the power of big data is a real trend at the moment and Google founder Larry Page took it to new levels this week by proclaiming that 100,000 lives could be saved next year alone if we did more…Eerke Boiten, Senior Lecturer, School of Computing and Director of Interdisciplinary Cyber Security Centre, University of KentLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/193632013-10-24T18:05:58Z2013-10-24T18:05:58ZGoogle’s flu fail shows the problem with big data<figure><img src="https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">A tower of used books</span> </figcaption></figure><figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=502&fit=crop&dpr=1 754w, https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=502&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/33468/original/96pwkgqt-1382430504.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=502&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Is more data better data?</span>
<span class="attribution"><span class="source">Jorge Royan</span></span>
</figcaption>
</figure>
<p>When people talk about ‘big data’, there is an <a href="http://www.ft.com/cms/s/2/afc1c178-8045-11e2-96ba-00144feabdc0.html">oft-quoted</a> example: a proposed <a href="http://www.bbc.co.uk/learningzone/clips/using-patterns-in-google-searches-to-predict-flu-outbreaks/13094.html">public health tool</a> called <a href="http://www.google.org/flutrends/">Google Flu Trends</a>. It has become something of a pin-up for the big data movement, but it might not be as effective as many claim.</p>
<p>The idea behind big data is that large amounts of information can help us do things which smaller volumes cannot. Google first outlined the Flu Trends approach in a 2008 paper in the journal <a href="http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html">Nature</a>. Rather than relying on disease surveillance used by the US Centers for Disease Control and Prevention (CDC) – such as visits to doctors and lab tests – the authors suggested it would be possible to predict epidemics through Google searches. When suffering from flu, many Americans will search for information related to their condition.</p>
<p>The Google team collected more than 50 million potential search terms – all sorts of phrases, not just the word “flu” – and compared the frequency with which people searched for these words with the number of reported influenza-like cases between 2003 and 2006. This data revealed that out of the millions of phrases, there were 45 that provided the best fit to the observed data. The team then tested their model against disease reports from the subsequent 2007 epidemic. The predictions appeared to be pretty close to real-life disease levels. Because Flu Trends would be able to predict an increase in cases before the CDC, it was trumpeted as the arrival of the big data age.</p>
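<p>The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's actual method: the weekly series are synthetic, the three search terms and the helper names (<code>pearson</code>, <code>rank_terms</code>, <code>fit_line</code>) are hypothetical, and a plain correlation ranking followed by a straight-line fit stands in for whatever fitting the team used. The first year of data is used to pick terms and fit the line, and the second year is held back to test the predictions.</p>

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_terms(term_series, ili, top_k):
    """Return the top_k terms whose search frequency tracks the ILI series best."""
    ranked = sorted(
        term_series,
        key=lambda term: pearson(term_series[term], ili),
        reverse=True,
    )
    return ranked[:top_k]


def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic data (an assumption for illustration): two years of weekly
# influenza-like illness (ILI) rates with a winter peak, plus three
# made-up search terms, one of which is unrelated to flu.
random.seed(0)
WEEKS = 104
TRAIN = 52
ili = [2.0 + 1.5 * math.sin(2 * math.pi * w / 52) + random.gauss(0, 0.1)
       for w in range(WEEKS)]
term_series = {
    "flu symptoms": [10 * v + random.gauss(0, 1) for v in ili],
    "fever remedy": [6 * v + random.gauss(0, 2) for v in ili],
    "cat videos": [random.gauss(50, 5) for _ in range(WEEKS)],
}

# Step 1: pick the best-fitting terms using the training year only.
train_terms = {term: series[:TRAIN] for term, series in term_series.items()}
selected = rank_terms(train_terms, ili[:TRAIN], top_k=2)


def combined(start, stop):
    """Average frequency of the selected terms for weeks start..stop-1."""
    return [sum(term_series[t][w] for t in selected) / len(selected)
            for w in range(start, stop)]


# Step 2: fit a line on the training year, then predict the held-out year.
slope, intercept = fit_line(combined(0, TRAIN), ili[:TRAIN])
predicted = [slope * x + intercept for x in combined(TRAIN, WEEKS)]
holdout_mae = sum(abs(p - a) for p, a in zip(predicted, ili[TRAIN:])) / (WEEKS - TRAIN)

print("selected terms:", selected)
print("mean absolute error on held-out year:", round(holdout_mae, 3))
```

<p>In this toy setup the held-out year has the same seasonal shape as the training year, so the fit looks good. A model chosen this way has only ever seen seasons like the ones it was fitted on.</p>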
<p>Between 2003 and 2008, flu epidemics in the US had been strongly seasonal, appearing each winter. However, in 2009, the first cases (as reported by the CDC) appeared around Easter. Flu Trends had already made its predictions when the CDC data was published, but it turned out that the Google model <a href="http://dx.doi.org/10.1371/journal.pone.0023610">didn’t match reality</a>. It had substantially underestimated the size of the initial outbreak.</p>
<p>The problem was that Flu Trends could only measure what people search for; it didn’t analyse why they were searching for those words. By removing human input, and letting the raw data do the work, the model had to make its predictions using only search queries from the previous handful of years. Although those 45 terms matched the regular seasonal outbreaks from 2003–8, they didn’t reflect the pandemic that appeared in 2009. </p>
<p>Six months after the pandemic started, Google, which now had the benefit of hindsight, updated its model so that it matched the 2009 CDC data. Despite these changes, the updated version of Flu Trends ran into difficulties again last winter, when it <a href="http://www.nature.com/news/when-google-got-flu-wrong-1.12413">overestimated</a> the size of the influenza epidemic in New York State. The incidents in 2009 and 2012 raised the question of how good Flu Trends is at predicting future epidemics, as opposed to merely finding patterns in past data.</p>
<p>In a new analysis, published in the journal <a href="http://dx.doi.org/10.1371/journal.pcbi.1003256">PLOS Computational Biology</a>, US researchers report that there are “substantial errors in Google Flu Trends estimates of influenza timing and intensity”. This is based on a comparison of Google Flu Trends predictions and the actual epidemic data at the national, regional and local level between 2003 and 2013.</p>
<p>Even when search behaviour was correlated with influenza cases, the model sometimes misestimated important public health metrics such as peak outbreak size and cumulative cases. The predictions were particularly wide of the mark in 2009 and 2012:</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=447&fit=crop&dpr=1 600w, https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=447&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=447&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=561&fit=crop&dpr=1 754w, https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=561&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/33467/original/phtf5w9c-1382430136.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=561&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Original and updated Google Flu Trends (GFT) model compared with CDC influenza-like illness (ILI) data.</span>
<span class="attribution"><span class="source">PLOS Computational Biology 9:10</span></span>
</figcaption>
</figure>
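<p>To see how an estimate can be correlated with the real epidemic and still miss on these metrics, consider a small Python sketch. The two weekly series below are invented for illustration and are not taken from the study, and the function names are hypothetical. The estimated curve rises and falls in step with the reference curve, yet its peak size and cumulative total are both too high.</p>

```python
def peak_week(series):
    """Index of the week with the highest value."""
    return max(range(len(series)), key=series.__getitem__)


def relative_error(estimate, reference):
    """Signed error of the estimate as a fraction of the reference value."""
    return (estimate - reference) / reference


# Invented weekly influenza-like illness rates (percent of doctor visits).
# These numbers are assumptions for illustration, not real CDC or GFT data.
cdc_ili = [1.0, 2.0, 4.0, 3.0, 1.5]
gft_ili = [1.2, 3.0, 8.0, 5.0, 2.0]

peak_error = relative_error(max(gft_ili), max(cdc_ili))
cumulative_error = relative_error(sum(gft_ili), sum(cdc_ili))
peak_week_shift = peak_week(gft_ili) - peak_week(cdc_ili)

print(f"peak size error: {peak_error:+.0%}")
print(f"cumulative error: {cumulative_error:+.0%}")
print(f"peak timing shift: {peak_week_shift} weeks")
```

<p>Reporting peak size, cumulative cases and peak timing separately makes it clear which aspect of an outbreak an estimate gets wrong, which a single correlation figure would hide.</p>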
<p>Although they criticised certain aspects of the Flu Trends model, the researchers think that monitoring internet search queries might yet prove valuable, especially if it were linked with other surveillance and prediction methods. </p>
<p>Other researchers have suggested that further sources of digital data – from Twitter feeds to mobile phone GPS – have the potential to be <a href="http://dx.doi.org/10.1371/journal.pcbi.1002616">useful tools for studying epidemics</a>. As well as helping to analyse outbreaks, such methods could allow researchers to analyse human movement and the spread of public health information (or misinformation).</p>
<p>Although much attention has been given to web-based tools, there is another type of big data that is already having a huge impact on disease research. Genome sequencing is enabling researchers to piece together <a href="http://www.thelancet.com/journals/laninf/article/PIIS1473-3099%2812%2970268-2/abstract">how diseases transmit</a> and <a href="http://www.plospathogens.org/article/info%3Adoi%2F10.1371%2Fjournal.ppat.1000918">where they might come from</a>. Sequence data can even reveal the existence of a new disease variant: earlier this week, <a href="http://news.sciencemag.org/health/2013/10/first-new-dengue-virus-type-50-years">researchers announced a new type of dengue fever virus</a>.</p>
<p>There is little doubt that big data will have some important applications over the coming years, whether in medicine or in other fields. But advocates need to be careful about what they use to illustrate the ideas. While there are plenty of successful examples emerging, it is not yet clear that Google Flu Trends is one of them.</p>
When people talk about ‘big data’, there is an oft-quoted example: a proposed public health tool called Google Flu Trends. It has become something of a pin-up for the big data movement, but it might not…Adam Kucharski, Research Fellow in Mathematical Epidemiology, London School of Hygiene & Tropical MedicineLicensed as Creative Commons – attribution, no derivatives.