<p>Information Storage and Retrieval – The Conversation</p>
<p>The libraries of the future will be made of DNA (5 January 2018)</p><figure><img src="https://images.theconversation.com/files/197638/original/file-20171204-22977-17hjfs8.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><span class="source">Jezper/Shutterstock.com</span></span></figcaption></figure><p>Around <a href="http://www.internetlivestats.com/twitter-statistics/">6,000 tweets</a> are sent every second. In the time it takes to read this sentence, some 42,000 tweets will have been sent. At an average of <a href="http://www.independent.co.uk/life-style/gadgets-and-tech/news/twitter-character-limit-update-tweets-expanded-140-280-english-japanese-app-a7968961.html">34 characters per tweet</a>, that’s 1,428,000 characters.</p>
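<p>The numbers in this paragraph are consistent with one another. A few lines of Python, using only the figures quoted above, make the arithmetic explicit: 42,000 tweets at 6,000 a second implies about seven seconds of reading time, and 42,000 tweets at 34 characters each gives the 1,428,000 characters cited.</p>

```python
# Figures quoted in the paragraph above; nothing else is assumed.
TWEETS_PER_SECOND = 6_000
AVG_CHARS_PER_TWEET = 34
TWEETS_WHILE_READING = 42_000

# How long "reading this sentence" takes, implied by the two tweet counts.
seconds_to_read = TWEETS_WHILE_READING / TWEETS_PER_SECOND
# Characters carried by those tweets at the average length.
total_chars = TWEETS_WHILE_READING * AVG_CHARS_PER_TWEET

print(f"{seconds_to_read:.0f} seconds")   # 7 seconds
print(f"{total_chars:,} characters")      # 1,428,000 characters
```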
<p>The website <a href="http://worldwidewebsize.com/">Worldwidewebsize</a> publishes a daily estimate of the size of the internet. On the day of writing, it stood at 4.59 billion pages and a billion websites. This is the “indexed” internet; it doesn’t include the “dark web” or private databases.</p>
<p>The size of the web is measured in two ways. The first is “content”: storage capacity was <a href="https://www.livescience.com/54094-how-big-is-the-internet.html">estimated</a> in 2014 at 10<sup>24</sup> bytes, or a million <a href="https://en.wikipedia.org/wiki/Exabyte">exabytes</a>. The second is “traffic”, measured in <a href="https://en.wikipedia.org/wiki/Zettabyte">zettabytes</a>. Annual global traffic recently <a href="https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html">passed</a> one zettabyte, the equivalent of 250 billion DVDs.</p>
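<p>The unit conversions can be checked the same way. This sketch assumes decimal (SI) prefixes, so that an exabyte is 10<sup>18</sup> bytes and a zettabyte 10<sup>21</sup> bytes; binary conventions would shift the figures slightly. On that assumption, 10<sup>24</sup> bytes is indeed a million exabytes, and 250 billion DVDs per zettabyte works out at 4 GB per disc, close to the 4.7 GB capacity of a single-layer DVD.</p>

```python
# Assumption: decimal (SI) byte prefixes.
EXABYTE = 10**18
ZETTABYTE = 10**21

stored_content_bytes = 10**24            # the 2014 content estimate
exabytes = stored_content_bytes // EXABYTE
print(f"{exabytes:,} exabytes")          # 1,000,000 exabytes

dvds_per_zettabyte = 250 * 10**9         # the DVD comparison for one zettabyte
bytes_per_dvd = ZETTABYTE / dvds_per_zettabyte
print(f"{bytes_per_dvd / 10**9:.1f} GB per DVD")  # 4.0 GB per DVD
```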
<p>More conventionally, the <a href="https://www.theguardian.com/books/2014/oct/22/uk-publishes-more-books-per-capita-million-report">UK published</a> 184,000 books in 2013 – globally, the largest number <a href="https://en.wikipedia.org/wiki/Books_published_per_country_per_year#cite_note-publishingtechnology.com-2">per inhabitant</a>. Add the growing number of ways in which a human being can be measured as data – DNA sequencing, online family trees, genetic coding, bank accounts, online information of all kinds – and the volume of scientific data being produced and read <a href="https://skatelescope.org/news/raeng-grant-to-engage-with-ska-engineering/">around the world</a>, and the amount of information in the world is staggering. Even the amount of storage most people need for photos and documents has grown hugely in the past few years.</p>
<p>As a species, we are producing information at a <a href="https://www.youtube.com/watch?v=iIKPjOuwqHo">massive rate</a>. The “reading” of the mass of data has led to new predictive models for <a href="https://www.amazon.co.uk/Big-Data-Revolution-Transform-Think/dp/1848547927">social interaction</a>. Businesses and governments are scrambling to make use of this data as human beings seem ever more readable, manageable and – possibly – controllable through the comprehension and manipulation of information.</p>
<p>But just how might all this information be stored? At present, we have physical libraries, physical archives and bookshelves. The internet itself is “stored” on hard-disk servers around the world, which consume enormous amounts of power to keep cool. Online infrastructure is expensive, energy-hungry and vulnerable (see <a href="https://en.wikipedia.org/wiki/Live_Free_or_Die_Hard#Plot">Die Hard 4.0</a> for a dramatisation of that vulnerability), and its longevity is <a href="http://uk.businessinsider.com/facebook-fires-four-more-shots-into-the-server-market-2017-3?r=US&IR=T">also limited</a>.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=420&fit=crop&dpr=1 600w, https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=420&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=420&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=528&fit=crop&dpr=1 754w, https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=528&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/197639/original/file-20171204-22962-16lpmyk.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=528&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Data centres such as this one may soon be a thing of the past.</span>
<span class="attribution"><span class="source">Gorodenkoff/Shutterstock.com</span></span>
</figcaption>
</figure>
<h2>Libraries of the future</h2>
<p>The future of information storage may sound dull, but it is a crucial issue for anyone interested in the way that societies remember. A good example is family history, where public archives, such as census records and tax information, are increasingly accessed online. Millions of users around the world use subscription sites such as Ancestry or Findmypast to access this public information and to create their family trees using online software. This proliferation of information raises ethical issues about access (public records being used by private companies to make a profit) and about how this data is stored, managed and used.</p>
<p>We all have a stake in the way that libraries and archives might work in the future, how they might be configured, and what might be stored – and why. Do we really need to store every tweet ever sent? Making any kind of choice over what to store – what to collect, commemorate, archive – provokes a complex discussion. Technologies for accessing – “reading” – information need to be future-proofed somehow, or we will end up with huge amounts of information that cannot be used.</p>
<p>So: what to do? There are wide-ranging discussions at present, from what information to store (including various <a href="https://ntrs.nasa.gov/search.jsp?R=20170004513">biobanks</a> full of <a href="http://www.croptrust.org/main/content/svalbard-global-seed-vault">biological specimens</a>), to how to store it, to where to store it (the Arctic, <a href="http://www.sciencedirect.com/science/article/pii/S009457651630100X">various locations in space</a>, under water). Most of these discussions are taking place within scientific communities, though some <a href="https://dspace.mit.edu/handle/1721.1/110132">technology companies</a> are involved. Those who have spent years thinking about memory, commemoration and archiving – historians and librarians – are often on the fringes of the discussion.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=420&fit=crop&dpr=1 600w, https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=420&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=420&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=528&fit=crop&dpr=1 754w, https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=528&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/197640/original/file-20171204-22996-1yxss3e.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=528&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Stored information, old style.</span>
<span class="attribution"><span class="source">By kurbanov/Shutterstock.com</span></span>
</figcaption>
</figure>
<h2>Nanocrystals and DNA</h2>
<p>Various organisations are exploring physical ways of storing humanity’s information. Storage on nickel disks (read by microscope) and laser-written barcodes on silica glass have both been suggested. Highly experimental – and at present energy-hungry – <a href="https://www.nature.com/articles/natrevmats201670">nanotechnology</a> aims to write information at the near-molecular level (although “write” is very much an out-of-date word here). Nanotechnological storage would be “read” through sophisticated microscopy, and sometimes relies on chemical change or quite complicated processes, such as nanocrystals converting infrared radiation into something “visible”. The more baroque storage proposals range from a flash-memory data vault on the Moon to <a href="http://www.timecapsuletomars.com/">private companies sending digital content</a> to Mars and <a href="http://www.keo.org/uk/pages/faq.html#q3">satellites orbiting the Earth</a>.</p>
<p>But most of the activity at present seems to be biological. Various scientists have begun to explore the possibility of using DNA to store <a href="http://www.nature.com/nature/journal/v494/n7435/abs/nature11875.html?foxtrotcallback=true">information</a>, an approach called Nucleic Acid Memory (NAM).</p>
<p>This would involve “translating” the data into the letters G, A, T and C, which stand for the four bases of DNA (guanine, adenine, thymine and cytosine). DNA strands carrying that sequence would then be synthesised, and could be translated back into the “original” by sequencing them. Researchers recently stored archival-quality versions of music by <a href="https://pitchfork.com/news/miles-davis-tutu-is-one-of-the-first-songs-to-be-encoded-in-dna/">Miles Davis and Deep Purple</a>, as well as <a href="http://www.bbc.co.uk/news/av/science-environment-40585302/movie-encoded-into-the-dna-of-bacteria">a short GIF</a>, in DNA form.</p>
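<p>A minimal sketch of the “translation” step, assuming the simplest possible code: every two bits of data map to one base, so each byte becomes four letters. The particular mapping (00→A, 01→C, 10→G, 11→T) is an arbitrary, hypothetical choice and should not be read as the scheme used in the studies linked above; published approaches typically use more elaborate codes, for example to avoid long runs of the same base, and add redundancy to cope with errors in synthesis and sequencing.</p>

```python
# Hypothetical 2-bit mapping; any one-to-one assignment would do.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of DNA letters, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate an error-free strand back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)           # CAGACGGC
print(decode(strand))   # b'Hi'
```

<p>Decoding only works this simply because the strand is assumed to come back from sequencing without any errors.</p>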
<p>DNA is durable and increasingly easy to produce and read. It can keep for thousands of years in the right storage conditions. It can be stored anywhere that is dark, dry and cold, and would arguably take up very little room.</p>
<p>Much of this technology is in its infancy, but developments in nanotechnology and DNA sequencing suggest that we will see applied results within years. Wider questions remain about the ethics of collection and about how far these processes will become mainstream. Print, and to a certain extent digital media, have become common and reasonably <a href="https://books.google.co.uk/books/about/The_Printing_Press_as_an_Agent_of_Change.html?id=0-FThHK2DNMC&redir_esc=y">democratic</a> ways of transmitting and storing information. It remains to be seen whether future storage and writing will be as easy to access, and who will control humanity’s information and memory in the coming decades and centuries.</p>
<p class="fine-print"><em><span>Jerome de Groot receives funding from AHRC.</span></em></p>
<p>Jerome de Groot, Senior Lecturer, University of Manchester. Licensed as Creative Commons – attribution, no derivatives.</p>
<p>Crisis management: using Twitter and Facebook for the greater good (21 July 2011)</p><figure><img src="https://images.theconversation.com/files/2410/original/aapone-20110310000304501823-australia-weather-floods-original.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Organising the information gathered during crises is key to better responses.</span> <span class="attribution"><span class="source">Lindsay Hallam/AFP</span></span></figcaption></figure><p>With new technology come new ways of communicating with one another in times of crisis. Platforms such as Twitter and Facebook allow important information to be <a href="http://www.youtube.com/watch?v=_R1I67KSyBg">shared widely and instantaneously</a>.</p>
<p>But new technology is needed to extract and preserve what these new media produce, so that crisis managers, crisis communicators and other key decision-makers can better manage a response.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=209&fit=crop&dpr=1 600w, https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=209&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=209&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=263&fit=crop&dpr=1 754w, https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=263&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/2406/original/tweet3.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=263&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption"></span>
</figcaption>
</figure>
<p>How can this type of information be ordered and made available to those who need it? Currently, the answer is “not very easily”. The <a href="http://mashable.com/2009/05/17/twitter-hashtags/">hashtag</a>, used on Twitter to group messages on a common point of interest, does some of this work, but it is imperfect. Because anyone can use a given hashtag, the resulting stream mixes relevant and irrelevant messages indiscriminately.</p>
<p>Crisis management relies on making efficient and effective use of past experience, experts’ tacit knowledge and the information gathered on a given crisis.</p>
<p>The objective of <a href="http://www.nicta.com.au/">my team at NICTA</a> is to develop a search engine that captures information on humanitarian crises in social media.</p>
<p>So what will our search engine be able to do?</p>
<p>1) It will identify tweets that carry crisis information, using rule-based filtering techniques. </p>
<p>2) After this filtering phase, the remaining messages will be classified according to well-defined crisis management categories: “economy”, “food”, “environment”, “personal”, “community”, “political”, “health” and “unknown”. </p>
<p>3) It will create time trends of messages in each category, map the messages to their geographic location, and identify whether each author represents the government, media or laypeople.</p>
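<p>The three steps can be pictured with a short Python sketch. Everything in it is a hypothetical stand-in: the keyword lists, the <code>Tweet</code> record and the function names are invented for illustration, and step 2 uses simple keyword matching in place of the machine learning classifiers the team describes. Mapping messages to locations and identifying author types (the rest of step 3) are left out.</p>

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Hypothetical rules, for illustration only; a real system would need far richer ones.
CRISIS_TERMS = {"flood", "earthquake", "evacuation", "damage", "aftershock"}
CATEGORY_TERMS = {
    "economy": {"insurance", "business", "jobs", "prices"},
    "food": {"food", "water", "groceries"},
    "environment": {"river", "rain", "liquefaction"},
    "personal": {"family", "safe", "friends"},
    "community": {"volunteers", "donate", "shelter"},
    "political": {"premier", "minister", "council"},
    "health": {"hospital", "injured", "doctor"},
}


@dataclass
class Tweet:
    """Hypothetical minimal message record."""
    text: str
    created_at: datetime


def tokens(text: str) -> set:
    """Lower-cased words with '#' and surrounding punctuation stripped."""
    return {word.strip("#.,!?:;'\"()") for word in text.lower().split()}


def is_crisis_related(tweet: Tweet) -> bool:
    """Step 1: rule-based filter; keep tweets that mention any crisis term."""
    return bool(tokens(tweet.text) & CRISIS_TERMS)


def classify(tweet: Tweet) -> str:
    """Step 2 (keyword stand-in): category with most keyword hits, else 'unknown'."""
    words = tokens(tweet.text)
    best, best_hits = "unknown", 0
    for category, terms in CATEGORY_TERMS.items():
        hits = len(words & terms)
        if hits > best_hits:  # ties keep the earlier category
            best, best_hits = category, hits
    return best


def hourly_trends(tweets) -> Counter:
    """Step 3 (time trends only): count filtered tweets per (hour, category)."""
    counts = Counter()
    for tweet in tweets:
        if is_crisis_related(tweet):
            hour = tweet.created_at.replace(minute=0, second=0, microsecond=0)
            counts[(hour, classify(tweet))] += 1
    return counts


sample = [
    Tweet("Flood rising fast, family safe at evacuation centre", datetime(2011, 1, 11, 9, 15)),
    Tweet("Insurance claims for flood damage expected to soar", datetime(2011, 1, 11, 9, 40)),
    Tweet("Lovely sunny day at the beach", datetime(2011, 1, 11, 10, 5)),
]
for (hour, category), n in sorted(hourly_trends(sample).items()):
    print(hour.strftime("%Y-%m-%d %H:00"), category, n)
```

<p>Stripping the leading “#” means a hashtag such as #flood is treated like the plain word, one simple way of folding hashtag use into the filtering rules; the third sample message mentions no crisis term and is dropped in step 1.</p>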
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=184&fit=crop&dpr=1 600w, https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=184&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=184&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=232&fit=crop&dpr=1 754w, https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=232&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/2407/original/tweet2.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=232&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption"></span>
</figcaption>
</figure>
<p>So, where is our search engine up to? </p>
<p>So far, we have introduced a conceptual model of our engine for information retrieval on Twitter from a human security viewpoint. The model pinpoints the added value of social media for crisis management and outlines real-life case studies for our technologies.</p>
<p>We have also defined the information search task. The definition covers both the machine learning methods used to implement the technologies and the guidelines for gathering expert-annotated data to validate them.</p>
<p>This involved a crisis management expert in our team developing an annotation guideline and training two other team members to perform this manual classification task on a pilot data set of approximately 350 messages. </p>
<p>We have created two new data sets and their expert annotations.</p>
<p>The first data set includes over 50,000 Twitter messages and user profiles on the <a href="http://en.wikipedia.org/wiki/February_2011_Christchurch_earthquake">Christchurch earthquake in February of this year</a>.</p>
<p>The second data set includes nearly 50,000 Twitter messages on the recent <a href="http://en.wikipedia.org/wiki/2010%E2%80%932011_Queensland_floods">Queensland floods</a>.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=253&fit=crop&dpr=1 600w, https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=253&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=253&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=318&fit=crop&dpr=1 754w, https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=318&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/2408/original/tweet1.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=318&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption"></span>
</figcaption>
</figure>
<p>The expert annotations provided model solutions to our content classification task with categories of “economy”, “food”, “environment”, “personal”, “community”, “political”, “health” and “unknown”. </p>
<p>Then, the two team members who had been working with the crisis management expert annotated the final set of 1,000 messages. Their almost perfect agreement on the classification task demonstrated the high quality of our task definition, annotation guidelines and expert-led training.</p>
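<p>The article does not say which agreement statistic was used. A common choice for two annotators is Cohen’s kappa, which corrects the raw agreement rate for the agreement expected by chance; on the widely used Landis and Koch scale, values from 0.81 to 1.00 are described as “almost perfect”. The sketch below assumes that statistic, and the ten labels in the example are invented for illustration, not drawn from the team’s data.</p>

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's overall label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        # Both annotators used one identical label throughout; kappa is
        # undefined, so treat this as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented labels for ten messages, for illustration only.
annotator_1 = ["economy"] * 4 + ["community"] * 3 + ["health"] * 2 + ["unknown"]
annotator_2 = ["economy"] * 4 + ["community"] * 3 + ["health", "unknown", "unknown"]

print(f"{cohens_kappa(annotator_1, annotator_2):.2f}")  # 0.86
```

<p>On these invented labels, nine agreements out of ten give a kappa of about 0.86, inside the “almost perfect” band.</p>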
<p>We have also seen positive early results on the automated task. We are currently improving our methods for identifying social media messages relating to our categories of “food”, “environment”, “personal”, “political” and “health”, although the very limited number of relevant messages in our data sets is making the classification task more difficult.</p>
<p>On the “community” category, classification performance already looks promising. Finally, and most importantly, on the most prevalent category, “economy”, our performance is excellent, as is our ability to distinguish messages by government, media and laypeople.</p>
<p>Could this technology be used to great effect in future crisis management? We believe it could, and will keep working to ensure that happens.</p>
<p><em>For more information about computer-based decision support and Twitter, watch the recent video lectures by <a href="http://leifhanlen.wordpress.com/2011/07/04/">Aapo Immonen</a> and <a href="http://leifhanlen.wordpress.com/2011/07/14/nicta-seminar-information-retrieval-from-social-media-during-humanitarian-crises/">Karl Kreiner</a>.</em></p>
<p class="fine-print"><em><span>Hanna Suominen works for NICTA, National ICT Australia, and holds an adjunct research fellow position at the Australian National University. She also receives funding from the Academy of Finland (decision number 136653).</span></em></p>
<p>Hanna Suominen, Machine learning researcher, Data61. Licensed as Creative Commons – attribution, no derivatives.</p>