<p>Data analysis – The Conversation</p>
<h1>Robo-advisers are here – the pros and cons of using AI in investing</h1>
<p><time datetime="2024-03-13T12:28:20Z">13 March 2024</time></p>
<figure><img src="https://images.theconversation.com/files/580679/original/file-20240308-28-55toe3.jpg?ixlib=rb-1.1.0&rect=59%2C0%2C7951%2C4345&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">shutterstock</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/smart-businessman-hand-close-nft-financial-2074315681">thinkhubstudio/Shutterstock</a></span></figcaption></figure>
<p>Artificial intelligence (AI) is <a href="https://www.ft.com/content/6766a3bd-1cec-4e88-9f51-5ed93b39528c">shaking up</a> the way we invest our money. Gone are the days when complex tools were reserved for the wealthy or financial institutions. </p>
<p>AI-powered <a href="https://www.investopedia.com/best-robo-advisors-4693125">robo-advisers</a>, such as <a href="https://www.betterment.com/">Betterment</a> and <a href="https://investor.vanguard.com/advice/robo-advisor">Vanguard</a> in the US, and finance app <a href="https://www.revolut.com/en-HU/news/revolut_launches_robo_advisor_in_eea_to_automate_investing/">Revolut</a> in Europe, are now democratising investment. These tools are making professional financial insight and portfolio management available to everyone. But although there are plenty of advantages to using robo-advisers, there are downsides too. </p>
<p>From the 1990s, <a href="https://arxiv.org/pdf/2104.05413.pdf">AI’s role</a> in this sector was typically confined to algorithmic trading and quantitative strategies. These rely on advanced mathematical models to predict stock market movements and trade at lightning speed, far exceeding the capabilities of human traders. </p>
<p>But that laid the groundwork for more advanced applications. And AI has now <a href="https://www.weforum.org/agenda/2017/09/robots-could-plan-your-retirement-financial-advice/">evolved</a> to handle data analysis, predict trends and personalise investment strategies. Unlike traditional investment tools, robo-advisers are more <a href="https://www2.deloitte.com/us/en/insights/industry/financial-services/financial-services-industry-predictions/2023/democratize-financial-services.html">accessible</a>, making them ideal for a new generation of investors. </p>
<p>A survey published in 2023 showed that there has been a particular <a href="https://www.investopedia.com/study-affluent-millennials-are-warming-up-to-robo-advisors-4770577">surge</a> in young people using robo-advisers. Some 31% of gen Zs (born after 2000) and 20% of millennials (born between 1980 and 2000) are using robo-advisers. </p>
<p>Another <a href="https://www.magnifymoney.com/news/robo-advisor-survey/">survey</a> from 2022 found that 63% of US consumers were open to using a robo-adviser to manage their investments. In fact, projections indicate that assets managed by robo-advisers will reach <a href="https://www.statista.com/outlook/fmo/wealth-management/digital-investment/robo-advisors/worldwide">US$1.8 trillion</a> (£1.4 trillion) globally in 2024. </p>
<p>This trend reflects not only changing investor preferences but also how the financial industry is adapting to technology.</p>
<h2>Tailored advice</h2>
<p>AI can <a href="https://www.ftadviser.com/your-industry/2023/07/17/can-generative-ai-truly-replace-a-financial-adviser/">tailor</a> investment advice to a person’s preferences. For example, for investors who want to prioritise ethical investing in environmental, social and governance stocks, AI can tailor a strategy without the need to pay for a financial adviser. </p>
<p>AI can <a href="https://www.sciencedirect.com/science/article/pii/S0275531923000077">analyse</a> news and social media to understand market trends and offer insights into potential market movements. Portfolios built by robo-advisers may also be <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/poms.14029">more resilient during market downturns</a>, effectively managing risk and protecting investments.</p>
<p>Robo-advisers can offer certain <a href="https://www.ft.com/content/6694bb4a-a585-496a-b7f3-d1841984f9b3">features</a> like reduced investment account minimums and lower fees, which make services more accessible than in the past. Other features such as <a href="https://corporatefinanceinstitute.com/resources/wealth-management/robo-advisors/">tax-loss harvesting</a>, a strategy of selling assets at a loss to reduce taxes, and <a href="https://corporatefinanceinstitute.com/resources/wealth-management/robo-advisors/">periodic rebalancing</a>, which involves adjusting the proportions of different types of investments, make professional investment advice accessible to a wider audience.</p>
<p>These types of innovations are particularly beneficial for people in underserved communities or with limited financial resources. This has the <a href="https://www.brookings.edu/articles/robo-advice-an-effective-tool-to-reduce-inequalities/">potential</a> to improve financial literacy through empowering people to make better financial decisions. </p>
<h2>AI’s multifaceted role</h2>
<p>AI’s impact on investment fund management goes way beyond robo-advisers, however. Fund managers are using AI algorithms in a variety of ways. </p>
<p>In terms of data analysis, AI can sift through vast amounts of market data and historical trends to identify <a href="https://doi.org/10.1016/j.frl.2022.102941">ideal assets</a> and adjust portfolios in real time as markets fluctuate. AI is also used to <a href="https://www.sciencedirect.com/science/article/pii/S0378426621002466">improve risk management</a> by analysing complex data and making sophisticated decisions. </p>
<p>By using AI in this way, <a href="https://doi.org/10.1016/j.jedc.2022.104438">traders</a> can react and make faster decisions, which maximises efficiency. Other mundane tasks like <a href="https://ieeexplore.ieee.org/document/9315986">compliance monitoring</a> are increasingly automated by AI. This frees fund managers up to focus on more strategic decisions. </p>
<figure class="align-center ">
<img alt="A close up of a pair of hands holding a mobile phone with pound coins superimposed onto the foreground." src="https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=350&fit=crop&dpr=1 600w, https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=350&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=350&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=440&fit=crop&dpr=1 754w, https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=440&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/580727/original/file-20240308-24-xg6lqw.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=440&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">While AI is democratising investing, that comes with challenges.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/double-exposure-uk-stock-graphic-close-792232471">Loch Earn/Shutterstock</a></span>
</figcaption>
</figure>
<h2>What are the disadvantages?</h2>
<p>One of the biggest concerns regarding AI in this sector is that easy access to advanced investment tools may lead some people to overestimate their abilities and take too many financial risks. The sophisticated algorithms used by robo-advisers can be opaque, which makes it <a href="https://www.lseg.com/en/insights/data-analytics/how-might-ai-impact-investment-management">difficult</a> for some investors to fully understand the potential risks involved. </p>
<p>Another concern is how the evolution of robo-advisers has outpaced the implementation of <a href="https://fastercapital.com/content/Regulatory-Compliance-in-B2B-Robo-Advisors--Navigating-the-Legal-Landscape.html#Challenges-and-Opportunities">laws and regulations</a>. That could expose investors to financial risks and a lack of legal protection. This is an issue yet to be adequately addressed by financial authorities. </p>
<p>Looking ahead, the future of investment probably lies in a hybrid model. Combining the precision and efficiency of AI with the experience and oversight of human investors is vital.</p>
<p>Ensuring that information is accessible and transparent will be crucial for <a href="https://www.turing.ac.uk/sites/default/files/2021-06/ati_ai_in_financial_services_lores.pdf">fostering</a> a more informed and responsible investment landscape. By harnessing the power of AI responsibly, we can create a financial future that benefits everyone.</p>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
<p>Robo-advisers and AI are making investing accessible to everyone, but there are also risks to consider.</p>
<p>Laurence Jones, Lecturer in Finance, Bangor University<br>Heather He, Lecturer in Data Science/Analytics, Bangor University</p>
<p>Licensed as Creative Commons – attribution, no derivatives.</p>
<h1>Technology to protect South Africa’s oceans: experts find that a data-driven monitoring system is paying off</h1>
<p><time datetime="2024-03-11T13:07:21Z">11 March 2024</time></p>
<figure><img src="https://images.theconversation.com/files/577893/original/file-20240226-24-qjmkpc.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">A fishing boat launching into South African waters at dawn.</span> <span class="attribution"><span class="source">Justin Klusener Photos</span></span></figcaption></figure>
<p>Nine years ago South Africa put in place an innovative information management system designed to monitor and protect its seas. The country is surrounded by the Atlantic and Indian oceans on its southern, eastern and western borders. </p>
<p>The oceans are an <a href="https://www.tandfonline.com/doi/abs/10.1080/19480881.2015.1066555">important source of income and employment</a>. The ocean economy <a href="https://www.dffe.gov.za/sites/default/files/docs/publications/oceans-economy-summary-progress-report-June2019.pdf">contributed about R110 billion</a> (around US$5.7 billion) to South Africa’s GDP in 2010. A 2019 government report <a href="https://www.dffe.gov.za/sites/default/files/docs/publications/oceans-economy-summary-progress-report-June2019.pdf">projected</a> that, by 2033, this would rise to R177 billion (US$9.2 billion), as well as creating just over one million jobs. The main sectors in ocean industries are maritime transport, fisheries and aquaculture, mineral resource exploitation and tourism. The potential for economic growth is also reflected in the country’s <a href="https://www.gov.za/sites/default/files/gcis_document/201706/saoceaneconomya.pdf">Operation Phakisa Oceans Economy plan</a>.</p>
<p>But, while the sheer extent of its maritime domain presents many opportunities, it also comes with governance challenges. It’s hard to monitor and plan for ocean-related economic development and conservation.</p>
<p>That’s where the National Oceans and Coastal Information Management System (<a href="https://ocims.environment.gov.za/About.html">OCIMS</a>) comes in. It was conceptualised within the country’s Department of Forestry, Fisheries and the Environment in 2012 and officially launched in 2015 in partnership with the Council for Scientific and Industrial Research (CSIR). </p>
<p>While the system is tailored to South Africa’s national priorities, it was inspired by other mature ocean information systems around the world, such as those in <a href="https://imos.org.au/">Australia</a> and the <a href="https://coastwatch.noaa.gov/cwn/index.html">US</a>.</p>
<p>The system brings ocean observations made by various national agencies into one platform. The major users are also partners who contribute to the system by sharing data and expertise.</p>
<p>For example, data capture apps on the system are used to share measurements made on aquaculture farms and inform users on the potential risk of <a href="https://oceanservice.noaa.gov/facts/redtide.html">red tides</a> (a common name for harmful algal blooms). Boat-based whale watching operators contribute their marine species sightings data towards biodiversity assessments. All this data can be analysed by scientists and their findings used to advise on policy options or compliance and enforcement actions.</p>
<p>In a <a href="https://doi.org/10.1016/j.jenvman.2024.120255">recent paper</a> we looked at how the system emerged and why it’s been important for the protection of the country’s oceans. We found that it was providing value for money: it helped mitigate environmental or security risks, resulting in significant cost savings for the public and private sectors. It also promoted dialogue across government departments, non-profit organisations and the private sector. This facilitates a coordinated approach to ocean governance.</p>
<p>The approach taken to establish the system could benefit other countries looking to build their own ocean and coastal system knowledge platforms.</p>
<h2>Data-driven</h2>
<p>As the COVID pandemic demonstrated, informed decisions cannot occur without access to data. Historical and operational data provides situational awareness, informs policy and supports long-term planning and management.</p>
<p>To this end, the Department of Forestry, Fisheries and the Environment, working with the <a href="https://www.saeon.ac.za/">South African Environmental Observations Network</a>, created the <a href="https://data.ocean.gov.za/">Marine Information Management System</a>. It’s an essential component of the overall OCIMS system. It preserves, discovers and disseminates long-term data. It is internationally accredited and bound by best international standards and practices. </p>
<p>The system also makes data more accessible by providing <a href="https://ica-abs.copernicus.org/articles/6/275/2023/ica-abs-6-275-2023.pdf">user-specific data capture applications</a>, complemented by data visualisation platforms such as webmaps and dashboards. </p>
<h2>Supporting decisions</h2>
<p>Another of the system’s aims is to provide tools for supporting decisions. Such tools can be used for coordination and response (for example, monitoring <a href="https://ica-abs.copernicus.org/articles/6/275/2023/ica-abs-6-275-2023.pdf">avian influenza</a>). They can also be used in compliance and enforcement initiatives, such as tracking vessels.</p>
<p>The Fisheries and Aquaculture tool, for instance, supports both the public and private sectors by providing warnings on potentially harmful algal blooms, a phenomenon that can threaten aquaculture farms or affect fish and lobster populations. It detects algal blooms through satellite observations; this satellite data is complemented by information from those in the field, combining to create an active, interactive decision-making tool.</p>
<p>Then there’s the Integrated Vessel Tracking tool. It monitors vessels’ movements and is used daily by the institutions mandated to enforce security at sea, such as intelligence services and the navy, to detect or intercept illegal activities at sea. <a href="https://issafrica.org/research/books-and-other-publications/south-africas-maritime-domain-awareness-a-capability-baseline-assessment">Researchers say</a> the tool has worked to prevent illegal fishing and marine pollution. It’s also been instrumental in the interception of drug-loaded vessels.</p>
<h2>Collaboration</h2>
<p>All of these successes have been made possible by secure, sustained funding by the South African government. That has instilled a sense of security in collaborators and partners; they provide invaluable co-funding, expertise and data, saving money and building resilience into the system.</p>
<p>Some of the system’s tools have been shared with other countries in the <a href="https://marcosio.org/">southern African</a> and Indian Ocean regions. </p>
<p>As the project’s visibility increases, new opportunities for collaborations are emerging. Government departments, non-profit organisations and the private sector are coming forward with offers to share data. Academic scientists are also writing the system into their proposals.</p>
<p>One of the main lessons emerging from our research, which may be of interest to other countries wanting to launch similar initiatives, is that it’s crucial to involve a system’s major users in development from the start. Formalised stakeholder interactions ensure that the system directly responds to major user needs. That makes it immediately relevant and useful.</p>
<p class="fine-print"><em><span>Marjolaine Krug works for the South African Department of Forestry, Fisheries and the Environment, Oceans and Coast Branch.
The OCIMS is funded through Operation Phakisa Marine Protection Services and Ocean Governance workstream and is a partnership between the Department of Forestry, Fisheries and the Environment, the Department of Science and Innovation, the Council for Scientific and Industrial Research, the South African Environmental Observation Network and the South African Weather Services. </span></em></p><p class="fine-print"><em><span>Ashley Naidoo was the Chief Director for the Oceans and Coasts Science Programs at the Department of Forestry Fisheries and the Environment until January 2024.</span></em></p><p class="fine-print"><em><span>Lauren Williams works for the Department of Forestry, Fisheries and the Environment (South Africa). She has been involved in the development of the Oceans and Coastal Information Management System (OCIMS) since its inception. </span></em></p>
<p>South Africa’s ocean information management system is helping to mitigate security and environmental risks.</p>
<p>Marjolaine Krug, Senior Scientific Advisor, University of Cape Town</p>
<p>Licensed as Creative Commons – attribution, no derivatives.</p>
<h1>How governments handle data matters for inclusion</h1>
<p><time datetime="2024-02-23T13:48:31Z">23 February 2024</time></p>
<figure><img src="https://images.theconversation.com/files/576859/original/file-20240220-30-3ger1q.jpg?ixlib=rb-1.1.0&rect=1785%2C0%2C3779%2C3704&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Do you feel included in how government handles and uses data?</span> <span class="attribution"><a class="source" href="https://newsroom.ap.org/detail/Biden/1577ded6699c49ea835bbf2ee5fbb3a7/photo">AP Photo/Patrick Semansky</a></span></figcaption></figure>
<p>Governments increasingly rely on large amounts of data to provide services ranging from <a href="https://doi.org/10.1177/1461444820902682">mobility</a> and <a href="https://www.epa.gov/outdoor-air-quality-data/air-data-basic-information">air quality</a> to <a 
href="https://doi.org/10.1080/01442872.2020.1724928">child welfare</a> and <a href="https://doi.org/10.1177/1473225419883706">policing programs</a>. While governments have always relied on data, their increasing use of algorithms and <a href="https://www.oecd.org/gov/innovative-government/working-paper-hello-world-artificial-intelligence-and-its-use-in-the-public-sector.htm">artificial intelligence</a> has fundamentally changed the way they use data for public services.</p>
<p>These technologies have the potential to improve the effectiveness and efficiency of public services. But if data is not handled thoughtfully, it can lead to inequitable outcomes for different communities because data gathered by governments can <a href="https://doi.org/10.1002/9781119815075.ch46">mirror existing inequalities</a>. To minimize this effect, governments can make inclusion an element of their data practices. </p>
<p>To better understand how data practices affect inclusion, we – scholars of <a href="https://scholar.google.com/citations?hl=en&user=sRReVx0AAAAJ&view_op=list_works&sortby=pubdate">public affairs</a>, <a href="https://scholar.google.com/citations?hl=vi&user=d1PUVQgAAAAJ&view_op=list_works&sortby=pubdate">policy</a> and <a href="https://scholar.google.com/citations?hl=en&user=Uhk-JAcAAAAJ&view_op=list_works&sortby=pubdate">administration</a> – break down <a href="https://doi.org/10.1111/puar.13585">government data practices</a> into four activities: data collection, storage, analysis and use. </p>
<h2>Collection</h2>
<p>Governments collect data about all manner of subjects via surveys, registrations, social media and in <a href="https://www.trafficengland.com/">real time</a> via mobile devices such as sensors, cellphones and body cameras. These datasets provide opportunities to shape social inclusion and <a href="https://www.census.gov/about/what/data-equity.html">equity</a>. For example, open data can be used as a spotlight to <a href="https://doi.org/10.1111/cag.12608">expose</a> <a href="https://www.nytimes.com/interactive/2023/02/12/upshot/child-maternal-mortality-rich-poor.html">health disparities</a> or inequalities in <a href="https://www.nytimes.com/interactive/2023/11/06/business/economy/commuting-change-covid.html">commuting</a>. </p>
<p>At the same time, we found that poor-quality data can worsen inequalities. Data that is incomplete, outdated or inaccurate can result in the underrepresentation of vulnerable groups because they may not have access to the technology used to collect the data. Also, government data collection might lead to <a href="https://www.latimes.com/california/story/2022-08-17/lapd-adopts-new-rules-for-obtaining-using-t">oversurveillance</a> of vulnerable communities. Consequently, some people may <a href="https://doi.org/10.1177/0003122417725865">choose to avoid</a> contributing data to government institutions.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="A city map with numerous small red, orange and yellow squares" src="https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=498&fit=crop&dpr=1 600w, https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=498&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=498&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=625&fit=crop&dpr=1 754w, https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=625&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/576861/original/file-20240220-28-vgmlvt.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=625&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Predictive policing is an example of government use of data that researchers have found can be biased and inaccurate.</span>
<span class="attribution"><a class="source" href="https://commons.wikimedia.org/wiki/File:Criminaliteits_Anticipatie_Systeem.png">Arnout de Vries/Wikimedia</a></span>
</figcaption>
</figure>
<p>To foster inclusive practices, government practitioners could work with citizens to develop inclusive data collection protocols.</p>
<h2>Storage</h2>
<p>Data storage refers to where and how data is stored by the government, such as in databases or cloud data storage services. We found that government decisions about access to stored data and data ownership might lead to <a href="https://doi.org/10.1111/puar.13615">administrative exclusion</a>, meaning unintentionally restricting citizen access to benefits and services. For example, administrative registration errors in applications for services and the difficulty citizens experience when they attempt to correct errors in stored data can lead to differences in how governments treat them and even a loss of public services. </p>
<p>We also found that personal data might be stored with cloud vendors in data warehouses <a href="https://doi.org/10.1177/2053951720912775">outside the influence of the government organizations</a> that initially created and collected the data. While governments are typically required to follow rigorous data collection practices, data storage companies do not necessarily need to comply with the same standards. </p>
<p>To overcome this problem, governments can set transparency and accountability requirements for data storage that foster inclusion.</p>
<h2>Analysis</h2>
<p>One important way governments analyze data to extract information is by using algorithms. For example, <a href="https://doi.org/10.1177/1354856520933838">predictive policing</a> uses algorithms to predict where crime will occur.</p>
<p>A key question is who is conducting the analysis. Those who might be providing data, such as citizens or civil society organizations, are less likely to analyze the data. Citizens may not have the <a href="https://doi.org/10.1080/23251042.2016.1220849">skills, expertise or the tools</a> to do so. Often, external experts conduct the analysis, and they might be unaware of the historical context, culture and local conditions of the data. In that way, data may also construct and reinforce inequalities.</p>
<p>To foster inclusion, governments could diversify and increase the training of the teams who perform the analyses and write the algorithms so that they can interpret data within its larger historical and political context.</p>
<h2>Using the data</h2>
<p>Finally, governments are using the results of data analysis to inform public service provision. For example, data-driven visualizations, such as maps, might be used to <a href="https://nij.ojp.gov/topics/articles/crime-mapping-crime-forecasting-evolution-place-based-policing">make decisions about where to direct police officers</a>. However, this might also lead to <a href="https://doi.org/10.1177/0003122417725865">disproportionate surveillance</a> of different groups.</p>
<p>Another issue is “<a href="https://doi.org/10.1080/17579961.2021.1898299">function creep</a>.” Data might be collected for one purpose but is often eventually used for other purposes or by other government agencies, possibly leading to misuse of data and the reproduction of inequalities.</p>
<p><a href="https://doi.org/10.1002/asi.24639">Digital literacy programs</a> for both government professionals and the public can facilitate a better understanding of how data is visualized and used.</p>
<h2>Building inclusion into the process</h2>
<p>It is important to highlight that these activities – collection, storage, analysis and use – are linked. Inequalities in the early stages may eventually lead to inequitable outcomes in the form of policies, decisions and services. </p>
<p>Additionally, we found a conundrum: On the one hand, the invisibility of vulnerable groups in data collection can result in inequalities. Therefore, different groups should be included in the activities of the data process. On the other hand, this can also be problematic because digital footprints can lead to oversurveillance of the same groups.</p>
<p>Reconciling these conflicting concerns requires an <a href="https://dialnet.unirioja.es/servlet/articulo?codigo=7400442">ethical reflection</a>: pausing before embracing data and reflecting on its purpose, limitations and long-term implications for inclusion. </p>
<p>The four activities are a repeated rather than linear process in which governments, citizens and third parties embrace <a href="https://doi.org/10.1111/puar.13585">inclusive data strategies</a>. This means looking at what was created, including diverse voices and understanding the analysis, results and consequences of decisions. And it means consistently changing aspects of the process that do not foster inclusion.</p>
<p class="fine-print"><em><span>Suzanne J. Piotrowski has received funding from the National Science Foundation and the Open Government Partnership.</span></em></p><p class="fine-print"><em><span>Gregory Porumbescu has received external funding from the National Science Foundation and the New Jersey Office of the Secretary of Higher Education.</span></em></p><p class="fine-print"><em><span>Erna Ruijer does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
<p>Governments can exclude certain groups of people in policies and services not only by the type of data they collect but also how they collect, store, analyze and use the data.</p>
<p>Suzanne J. Piotrowski, Professor of Public Affairs and Administration, Rutgers University - Newark<br>Erna Ruijer, Assistant Professor of Governance, Utrecht University<br>Gregory Porumbescu, Associate Professor of Public Affairs and Administration, Rutgers University - Newark</p>
<p>Licensed as Creative Commons – attribution, no derivatives.</p>
<h1>AI threatens to add to the growing wave of fraud but is also helping tackle it</h1>
<p><time datetime="2023-08-10T12:25:02Z">10 August 2023</time></p>
<figure><img src="https://images.theconversation.com/files/541723/original/file-20230808-19-q8t3ng.jpg?ixlib=rb-1.1.0&rect=0%2C24%2C5452%2C3812&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">The government, banks and other financial organisations are now dealing with fraud by using increasingly sophisticated detection methods.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/internet-fraud-darknet-data-thiefs-cybercrime-1716862513">Maksim Shmeljov/Shutterstock</a></span></figcaption></figure>
<p>There were <a 
href="https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/articles/natureoffraudandcomputermisuseinenglandandwales/yearendingmarch2022">4.5 million</a> reported incidents of fraud in the UK in 2021/22, up 25% on the year before. It is a growing problem which costs billions of pounds every year. </p>
<p>The COVID pandemic and the cost of living crisis have created <a href="https://www.bbc.co.uk/news/business-55769991">ideal conditions</a> for fraudsters to exploit the vulnerability and desperation of many households and businesses. And with the use of AI increasing in general, we will likely see a further increase in <a href="https://www2.deloitte.com/uk/en/blog/auditandassurance/2023/generative-ai-and-fraud-what-are-the-risks-that-firms-face.html">new types of fraud</a>. AI is probably also contributing to the increased frequency of fraud we are seeing today. </p>
<p>Already, the ability of AI to absorb personal data, such as emails, photographs, videos and <a href="https://www.cbsnews.com/news/scammers-ai-mimic-voices-loved-ones-in-distress/#:%7E:text=Artificial%20intelligence%20is%20making%20phone,mounting%20losses%20due%20to%20fraud.">voice recordings</a> to imitate people is proving to be a new and unprecedented challenge. </p>
<p>But there is also an upside. The government, banks and other financial organisations are now fighting back with increasingly sophisticated fraud-detection methods. AI and machine learning models could be a <a href="https://www.weforum.org/agenda/2023/04/as-generative-ai-gains-pace-industry-leaders-explain-how-to-make-it-a-force-for-good/">part of the solution</a> to deal with the increasing complexity, sophistication and prevalence of such scams.</p>
<p>The rising gap between prices and people’s incomes appears to have made people more <a href="https://www.citizensadvice.org.uk/about-us/about-us1/media/press-releases/over-40-million-targeted-by-scammers-as-the-cost-of-living-crisis-bites/">receptive</a> to scams which offer grants, rebates and support payments. </p>
<p>Fraudsters often target individuals by posing as genuine organisations. Examples include pretending to be your bank or posing as the government telling you that you are eligible for a lucrative scheme, in order to steal your identity details and then money. </p>
<p>This follows a dramatic rise in recent years of fraudulent applications to government and regional support packages, mainly implemented in response to the pandemic. Here fraudsters often pose as fake businesses to secure multiple loans or grants. </p>
<p>One of the <a href="https://www.manchestereveningnews.co.uk/news/greater-manchester-news/man-who-pretended-greggs-bakery-27251086">most outlandish examples</a> of this was a Luton man who posed as a Greggs bakery to swindle three local authorities in England out of almost £200,000 worth of COVID small business grants.</p>
<p>The hurried rollout of such schemes for faster economic impact made it difficult for officials to effectively review applications. The UK government’s Department for Business and Trade now <a href="https://www.bbc.co.uk/news/business-59504943">estimates</a> that 11% of such loans, roughly £5 billion, were fraudulent. By March 2022 only £762 million <a href="https://www.gov.uk/government/publications/hmrc-issue-briefing-tackling-error-and-fraud-in-the-covid-19-support-schemes/tackling-error-and-fraud-in-the-covid-19-support-schemes">had been recovered</a>.</p>
<h2>Fraud detection</h2>
<p>Over the past few years, complex mathematical models combining traditional statistical techniques and machine learning analysis have shown promise in the <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/acfi.12742">early detection</a> of financial statement fraud. This typically involves companies misrepresenting their finances to deceive investors into believing they are more profitable than they really are.</p>
<p>One of the breakthroughs has been the incorporation of both financial and non-financial information into data analysis systems. For example, the risk of fraud decreases if there is <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/acfi.12742">better corporate governance</a> and a lower proportion of directors who are also executives. </p>
<p>In a small business context, we can think about this as promoting transparency and making sure that important positions do not have sole authority to make significant decisions. </p>
<p>Such data analytics models can be used to rank applications in terms of potential fraud risk, so that the riskiest applications get additional scrutiny by government officials. We are now starting to see implementations of such systems to tackle <a href="https://www.theguardian.com/society/2023/jul/11/use-of-artificial-intelligence-widened-to-assess-universal-credit-applications-and-tackle">universal credit</a> fraud, for example.</p>
<p><a href="https://www.ft.com/content/0dca8946-05c8-11e8-9e12-af73e8db3c71">Banks, financial services providers</a> and <a href="https://www.ft.com/content/d3bd46cb-75d4-40ff-a0cd-6d7f33d58d7f">insurers</a> are developing machine-learning models to detect financial fraud too. A Bank of England survey published in October 2022 <a href="https://www.bankofengland.co.uk/report/2022/machine-learning-in-uk-financial-services">revealed</a> that 72% of financial services firms are already testing and implementing them. </p>
<p>We are also seeing new collaborations in the industry, with the likes of Deutsche Bank partnering with chip maker Nvidia to <a href="https://www.db.com/news/detail/20221207-deutsche-bank-partners-with-nvidia-to-embed-ai-into-financial-services">embed AI</a> into their fraud detection systems.</p>
<h2>Risks of AI systems</h2>
<p>However, the advent of new automated AI systems brings with it worries about potential unintended biases within them. In a <a href="https://www.bbc.co.uk/news/uk-politics-66133665">recent trial</a> of a new AI fraud detection system by the Department for Work and Pensions, campaign groups were worried about potential biases. </p>
<p>A common issue that needs to be overcome with such systems is that they work for the majority of people, but are often biased against minority groups. This means if left unadjusted they are disproportionately more likely to flag applications from ethnic minorities as risky.</p>
<p>But AI systems should not be used as a fully automated process for detecting fraud and accusing people of it, but rather <a href="https://www.ft.com/content/2df33fc5-981a-4952-8dc6-d4eee7343acc">as a tool</a> to assist assessors. They can help auditors and civil servants, for example, to identify cases where greater scrutiny is required and to reduce processing time.</p><img src="https://counter.theconversation.com/content/210663/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Adrian Gepp has received funding from the Accounting and Finance Association of Australia and New Zealand. He is also affiliated with the Association of Certified Fraud Examiners. </span></em></p><p class="fine-print"><em><span>Laurence Jones does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Fraud was up 25% in the UK in 2021/22.Laurence Jones, Lecturer in Finance, Bangor UniversityAdrian Gepp, Professor of Data Analytics, Bangor UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/2029352023-04-05T12:25:12Z2023-04-05T12:25:12ZOne way to speed up clinical trials: Skip right to the data with electronic medical records<figure><img src="https://images.theconversation.com/files/519333/original/file-20230404-15-mnljji.jpg?ixlib=rb-1.1.0&rect=0%2C0%2C2463%2C1216&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">It takes around 17 years for medical research to translate into clinical practice.</span> <span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/touching-base-with-the-medical-community-royalty-free-image/1167942823">shapecharge/E+ via Getty Images</a></span></figcaption></figure><p>Scientific knowledge, as measured by numbers of papers published, has been estimated to <a href="https://doi.org/10.1057/s41599-021-00903-w">double every 17.3 years</a>. However, it takes an <a href="https://doi.org/10.1258%2Fjrsm.2011.110180">average of about 17 years</a> for health and medical research – going from basic lab studies on cell cultures and animals to clinical trials in people – to result in actual changes patients see in the clinic.</p>
<p>The typical process of medical research is generally <a href="https://theconversation.com/90-of-drugs-fail-clinical-trials-heres-one-way-researchers-can-select-better-drug-candidates-174152">not well equipped</a> to respond effectively to quickly evolving pandemics. This has been especially evident for the COVID-19 pandemic, in part because the virus that causes COVID-19 mutates frequently. Scientists and public health officials are often left <a href="https://theconversation.com/18-months-of-the-covid-19-pandemic-a-retrospective-in-7-charts-166881">continually scrambling</a> to develop and test new treatments to match emerging variants. </p>
<p>Fortunately, scientists may be able to bypass the typical research timeline and study treatments and interventions as they are used in the clinic nearly in real time by leveraging a common source of existing data – electronic medical records, or EMRs.</p>
<p>We are a team composed of an <a href="https://scholar.google.com/citations?user=0BCX1qIAAAAJ&hl=en">epidemiologist</a>, <a href="https://scholar.google.com/citations?user=LNTsvI8AAAAJ&hl=en">pharmacist</a> and <a href="https://profiles.dom.pitt.edu/card/faculty_info.aspx/Marroquin5220">cardiologist</a> at the University of Pittsburgh Medical Center. During the COVID-19 pandemic, we realized the need to quickly study and disseminate accurate information on the most effective treatment approaches, especially for patients at high risk of hospitalization and death. In our <a href="https://doi.org/10.7326/M22-1286">recently published research</a>, we used EMR data to show that early treatment with one or more of five different monoclonal antibodies substantially reduced the risk of hospitalization or death compared with delayed or no treatment. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Two surgeons reviewing medical records in front of computer screens" src="https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/519338/original/file-20230404-18-reloqo.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">EMRs contain a wealth of clinical data that could be used for research.</span>
<span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/nurses-plan-surgical-paperwork-royalty-free-image/140175520">Reza Estakhrian/The Image Bank via Getty Images</a></span>
</figcaption>
</figure>
<h2>Using EMR data for research</h2>
<p>In the U.S., health care systems typically use EMR systems for documenting patient care and for administrative purposes like billing. While data collection is not uniform, these systems typically contain <a href="https://www.cms.gov/medicare/e-health/ehealthrecords">detailed records</a> that can include sociodemographic information, medical history, test results, surgical and other procedures, prescriptions and billing charges.</p>
<p>Unlike <a href="https://worldpopulationreview.com/country-rankings/countries-with-single-payer">single-payer health care systems</a> that integrate data into a single EMR system, such as in the U.K. and in Scandinavian countries, many large health care systems in the U.S. collect patient data using <a href="https://www.definitivehc.com/blog/most-common-inpatient-ehr-systems">multiple EMR systems</a>. </p>
<p>Having multiple EMR systems adds a layer of complexity to using such data to conduct scientific research. To address this, the University of Pittsburgh Medical Center developed and maintains a clinical data warehouse that compiles and harmonizes data across the seven different EMR systems its 40 hospitals and outpatient clinics use.</p>
<h2>Emulating clinical trials</h2>
<p><a href="https://doi.org/10.1007%2Fs00392-016-1025-6">Using EMR data for research</a> is not new. More recently, researchers have been looking into ways to use these large health data systems to <a href="https://doi.org/10.1093/aje/kwv254">emulate randomized controlled trials</a>, which are considered the gold standard study design yet are often costly and take years to complete.</p>
<p>Using this emulation framework, our team used the EMR data infrastructure at our institution to <a href="https://doi.org/10.7326/M22-1286">evaluate five different monoclonal antibodies</a> for which the Food and Drug Administration granted emergency use authorization to treat COVID-19. Monoclonal antibodies are human-made proteins designed to prevent a pathogen – in this case the virus that causes COVID-19 – from entering human cells, replicating and causing serious illness. Initially the authorizations were based on clinical trial data. But as the virus mutated, subsequent evaluations based on cell culture studies suggested a loss of effectiveness. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Close-up of health care provider accessing medical record on tablet" src="https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/519339/original/file-20230404-473-eyylv1.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">EMR data could be used to confirm that the results of cell culture studies would apply in the clinic.</span>
<span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/photo/nurse-using-portable-computer-royalty-free-image/104509052">Solskin/DigitalVision via Getty Images</a></span>
</figcaption>
</figure>
<p>We wanted to confirm that the findings of cell-based studies applied to actual patients. So we evaluated anonymous clinical data from 2,571 patients treated with these monoclonal antibodies within two days of COVID-19 infection, matching them with data from 5,135 patients with COVID-19 who were eligible for but either did not receive these treatments or received them three or more days after infection. </p>
<p>We found that overall, people who received monoclonal antibodies within two days of a positive COVID-19 test reduced their risk of hospitalization or death by 39% compared with those who did not receive the treatment or received delayed treatment. In addition, patients with compromised immune systems reduced their risk of hospitalization or death by 55%, regardless of their age.</p>
<p>Our near-real-time analysis of COVID-19 patients treated with monoclonal antibodies during the pandemic confirmed the findings of the cell culture studies. Our findings suggest that by using data in this way, researchers may be able to evaluate treatments in times of urgency without having to perform clinical trials.</p>
<h2>Appropriate EMR data use</h2>
<p>Many health care institutions have EMR systems that researchers can harness to rapidly answer important research questions as they arise. However, because this clinical data is not specifically collected for research purposes, researchers need to <a href="https://doi.org/10.1146/annurev-publhealth-032315-021353">carefully design their studies</a> and use rigorous data validation and analysis. They also need to take great care to harmonize data from different EMR systems, select appropriate patient samples and minimize all sources of potential bias. </p>
<p>New pandemics and significant public health challenges are likely to emerge abruptly and in unpredictable ways. Given the treasure trove of data routinely collected across U.S. health care systems, we believe that careful use of these data can help answer urgent health questions in ways that are representative of who’s actually receiving care.</p><img src="https://counter.theconversation.com/content/202935/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Erin McCreary has served on scientific advisory boards for Shionogi, Inc and Merck.</span></em></p><p class="fine-print"><em><span>Kevin Kip and Oscar Marroquin do not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>In health care crises, researchers can avoid waiting for clinical trial results by using data from health care systems to analyze the effectiveness of treatments for COVID-19 and other illnesses.Kevin Kip, Vice President of Clinical Analytics, University of PittsburghErin McCreary, Clinical Assistant Professor of Medicine, University of PittsburghOscar Marroquin, Associate Professor of Medicine, Epidemiology and Clinical and Translational Science, University of PittsburghLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1978652023-02-09T13:34:03Z2023-02-09T13:34:03ZData from New Jersey is a warning sign for young sports bettors<figure><img src="https://images.theconversation.com/files/508402/original/file-20230206-27-cp4rbb.jpg?ixlib=rb-1.1.0&rect=41%2C53%2C3593%2C2488&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Fans celebrate at the William Hill Sports Book in Atlantic City, N.J.</span> <span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/news-photo/fans-gather-at-william-hill-sports-book-at-ocean-resort-news-photo/1127223046?phrase=sports%20book%20new%20jersey&adppopup=true">Lisa Lake/Getty Images for William Hill US</a></span></figcaption></figure><p>When the Philadelphia Eagles and Kansas City Chiefs take the field for Super Bowl LVII, a record-breaking 50 million bettors are expected to have <a href="https://www.americangaming.org/new/record-50-million-americans-to-wager-16b-on-super-bowl-lvii/">US$16 billion</a> of their own skin in the game, 
according to the American Gaming Association. </p>
<p>In January 2023, Ohio and Massachusetts launched legal sports betting, joining Washington D.C. and <a href="https://www.americangaming.org/research/state-gaming-map/">34 other states</a> that have passed laws since the Supreme Court overturned a federal ban in 2018. State legislatures have generally been eager to capitalize on the tax windfalls from sports betting and get their slice of <a href="https://www.americangaming.org/resources/aga-commercial-gaming-revenue-tracker/">the billions</a> wagered annually. Voters are also <a href="https://www.washingtonpost.com/sports/2022/07/08/legal-sports-betting-support-americans/">increasingly supportive of legalization</a>. </p>
<p>Here in New Jersey, sports betting, both online and in person, has been legal since June 2018. The state is the only jurisdiction that requires yearly evaluations of the relationship of online gambling and sports wagering to problem gambling. </p>
<p>The Center for Gambling Studies at Rutgers University, which I direct, <a href="https://socialwork.rutgers.edu/centers/center-gambling-studies/research-publications">conducts those annual evaluations</a> using data from all sports bets placed in New Jersey since 2018. Our findings suggest that the nation’s love affair with sports betting may be having unintended consequences.</p>
<h2>Sports betting tied to poor mental health</h2>
<p>In a forthcoming statewide gambling prevalence study, we found that those wagering on sports in New Jersey were more likely than others who gamble to have high rates of problem gambling and problems with drugs or alcohol, and to experience mental health problems like anxiety and depression. Most alarming, findings suggest that about 14% of sports bettors reported thoughts of suicide, and 10% said they had made a suicide attempt.</p>
<p>A small group of bettors seem to be most at risk. About 5% of all sports bettors placed nearly half of all bets and spent nearly 70% of the money. That means the people losing the most money are the most essential to operator profits.</p>
<p><a href="https://socialwork.rutgers.edu/centers/center-gambling-studies/research-publications">The fastest-growing group of sports bettors in New Jersey</a> are young adults, ages 21 to 24. Most have placed in-game bets, and about 19% spent half of their money betting during games, <a href="https://theconversation.com/sports-betting-apps-notifications-and-leaderboards-encourage-more-and-more-wagers-a-psychologist-who-treats-gambling-addictions-explains-why-some-people-get-hooked-198358">when emotions and impulsive spending are highest</a>. </p>
<p>Although regulators require operators to allow bettors to set limits – on losses, deposits or time spent gambling – only about 1% of young bettors use any of the safeguards, less than any other age group. Since about <a href="https://socialwork.rutgers.edu/centers/center-gambling-studies/research-publications">70% of the sports bets we analyzed</a> were losing bets, most of these young players could find themselves losing more money than they can afford. </p>
<h2>A vulnerable population</h2>
<p>It is possible, then, that states could unwittingly be introducing a cohort of young people to problem gambling and a lifetime of negative consequences. </p>
<p>That’s because the younger that people <a href="https://doi.org/10.1016%2Fj.jpsychires.2012.02.007">start gambling</a>, the more activities they bet on. And the more frequently they bet, the more likely they are to develop serious gambling problems. Studies suggest that <a href="https://doi.org/10.1007/s10899-017-9726-y">those who gamble as young adults</a> have higher-than-average rates of problem gambling.</p>
<p>The danger is compounded by the easy access afforded by tablets and mobile phones, which eliminate most barriers to gambling even for those who are underage. Children who are exposed to the unrelenting parade of gambling ads <a href="https://doi.org/10.1111/1753-6405.12728">report they remember</a> both the products and the betting terms from those ads, and some teens say <a href="https://doi.org/10.1556/2006.7.2018.128">they intended to gamble as a result</a>. If <a href="https://doi.org/10.1016/j.addbeh.2022.107460">parents or other household members also gamble</a>, those children may later develop not only gambling problems, but also problems with drugs and alcohol. </p>
<h2>Few regulatory measures in place</h2>
<p>In the U.S., the Marlboro Man can no longer gallop across <a href="https://www.ftc.gov/legal-library/browse/statutes/federal-cigarette-labeling-advertising-act">the nation’s television airwaves</a>. Alcohol ads <a href="https://alcohol.org/laws/marketing-to-the-public/">can’t contain</a> statements that are misleading, patently false or target those who are underage.</p>
<p>However, there are currently no such federal guidelines for gambling ads. Major League Baseball, which banned Pete Rose and locked him out of the Hall of Fame for gambling, openly sanctions <a href="https://www.forbes.com/sites/maurybrown/2021/08/10/why-nearly-all-mlb-ballparks-will-have-a-sportsbook-attached-to-it-in-the-future/?sh=52ba50cb36d8">sports books attached to stadiums</a> and <a href="https://www.forbes.com/sites/christianred/2021/01/12/major-league-baseball-teams-and-a-new-revenue-stream-online-gaming-business-partners/?sh=a0866755ef95">partnerships with gambling operators</a>. The same goes for the NFL and most of its teams, with former stars like Eli Manning <a href="https://giantswire.usatoday.com/2022/08/23/see-it-new-york-giants-legend-eli-manning-appears-new-caesars-ad-with-brothers/">encouraging betting</a> in ads and Pro Bowl wide receiver Davante Adams becoming the <a href="https://www.actionnetwork.com/news/davante-adams-likely-first-active-nfl-player-with-gambling-related-sponsor">first active player</a> with a gambling sponsor.</p>
<figure class="align-center ">
<img alt="Man holding betting slip." src="https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/508401/original/file-20230206-29-rmkopb.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">A man holds a betting slip on the first day of legal sports betting in New Jersey on June 14, 2018.</span>
<span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/news-photo/professional-odds-maker-stu-feiner-holds-up-a-betting-slip-news-photo/974402508?phrase=sports%20gambling%20new%20jersey&adppopup=true">Dominick Reuter/AFP via Getty Images</a></span>
</figcaption>
</figure>
<p>Those who recognize they have a gambling problem also have no assurances that they can find help. </p>
<p>Gambling treatment services <a href="https://naadgs.org/wp-content/uploads/2022/06/NAADGS_2021_Survey_of_Publicly_Funded_Problem_Gambling_Services_in_the_United_States_v2.pdf">vary by state</a>, from specially trained, culturally competent counselors in a few states to a total lack of services in others. Most children and teens receive no education in schools about problem gambling as they do for drugs and alcohol. Some universities <a href="https://www.nytimes.com/2022/11/20/business/caesars-sports-betting-universities-colleges.html">are openly partnering with gambling companies</a> and sponsoring esports competitions, which invite underage betting.</p>
<p>The federal government is noticeably silent on a glamorized addiction. Nationally, there are no federal policies, prohibitions or federally funded research or <a href="https://int.nyt.com/data/documenttools/naadgs-analysis-of-problem-gambling-funding-july-2022/521f7652c06a6d4d/full.pdf">prevention programs</a>, despite all the revenue generated by taxes on gambling winnings.</p>
<p>Internationally, gambling-related abuses and tragedies have led countries <a href="https://theconversation.com/40-years-of-legal-sports-betting-in-australia-points-to-risks-for-us-gamblers-and-tips-for-regulators-194993">like Australia and the U.K.</a> to enact new regulations and significant penalties for operators. The U.K., for example, <a href="https://doi.org/10.1089/glr2.2022.0020">requires operators to conduct affordability checks</a> on patrons to ensure they can afford their losses and prohibits gambling advertising by athletes, celebrities or social media influencers who appeal to children and teens.</p>
<p>I think it’s only a matter of time before similar proposals make their way to the U.S. In the meantime, however, millions of people in more than half the country will legally lay their hard-earned money on the line for a chance to win big on Sunday.</p>
<p>Hopefully, they can afford to lose.</p><img src="https://counter.theconversation.com/content/197865/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Lia Nower has been a member of advisory boards, and has conducted research and grant reviews for U.S. and international governments, government-related agencies, private firms, and industry operators. These include New Jersey's Division of Gaming Enforcement & Division of Mental Health and Addiction Services, Ohio's Department of Mental Health and Addiction, Camelot (United Kingdom), Crown Casino (Australia), the British Columbia Lottery Corporation (Canada), Churchill Downs (U.S.), Aristocrat Leisure (Australia), the New York Council on Problem Gambling, Publiedit (Italy) and the National Council on Problem Gambling (U.S.).</span></em></p>Researchers who analyzed every sports bet placed online since 2018 found that young adults are the fastest-growing group of bettors, with more than 70% of them placing in-game bets.Lia Nower, Professor and Director, Center for Gambling Studies, Rutgers UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1946012022-11-18T16:05:50Z2022-11-18T16:05:50ZWhat the world would lose with the demise of Twitter: Valuable eyewitness accounts and raw data on human behavior, as well as a habitat for trolls<figure><img src="https://images.theconversation.com/files/495986/original/file-20221117-15-s4q1at.jpg?ixlib=rb-1.1.0&rect=0%2C0%2C6000%2C3997&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Twitter itself produces a lot of data that's available nowhere else.</span> <span class="attribution"><a class="source" href="https://www.gettyimages.com/detail/news-photo/the-twitter-logo-is-seen-in-this-photo-illustration-in-news-photo/1244760225">STR/NurPhoto via Getty Images</a></span></figcaption></figure><p>What do a cybersecurity researcher building a system to generate alerts for detecting <a href="https://ieeexplore.ieee.org/abstract/document/7752338">security threats and vulnerabilities</a>, a wildfire watcher who <a 
href="https://www.wired.com/story/california-fire-twitter/">tracks the spread of forest fires</a>, and public health professionals <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5308155/">trying to predict enrollment</a> in health insurance exchanges have in common? </p>
<p>They all rely on analyzing data from Twitter. </p>
<p>Twitter is a microblogging service, meaning it’s designed for sharing posts of short segments of text and embedded audio and video clips. The ease with which people can share information among millions of others worldwide on Twitter has made it very popular for real-time conversations. Whether it is people tweeting about their favorite sports teams, or organizations and public figures using Twitter to reach a mass audience, Twitter has been part of the collective record for over a decade. </p>
<p>The Twitter <a href="https://blog.twitter.com/en_us/a/2015/full-archive-search-api">archives allow for instant and complete access</a> to every public tweet, which has positioned Twitter both as an archive of collective human behavior and as a credentialing and fact-checking service on a global scale. As a <a href="https://scholar.google.com/citations?user=JpFHYKcAAAAJ&hl=en">researcher who studies social media</a>, I believe that these functions are very valuable for academics, policymakers and anyone using aggregate data to obtain insights into human behavior. </p>
<p>The proliferation of <a href="https://www.wired.com/story/twitter-blue-check-verification-buy-scams/">scams and brand impersonators</a>, the <a href="https://www.washingtonpost.com/technology/2022/11/14/twitter-fake-eli-lilly/">hemorrhaging of advertisers</a>, and <a href="https://www.nytimes.com/2022/11/17/technology/twitter-elon-musk-ftc.html">disarray within the company</a> call the future of the platform <a href="https://mashable.com/article/elon-musk-twitter-future">into question</a>. If Twitter were to go under, the loss would reverberate around the world. </p>
<h2>Analyzing human behavior</h2>
<p>With its massive trove of tweets, Twitter has provided new ways to quantify public discourse and new tools to map aggregate perceptions, and offers a window into large-scale human behavior. Such <a href="https://library.oapen.org/viewer/web/viewer.html?file=/bitstream/handle/20.500.12657/51412/9781003024583_10.4324_9781003024583-8.pdf">digital traces</a> or records of human activity allow researchers in fields ranging from social sciences to healthcare to analyze a variety of phenomena. </p>
<p>From open source intelligence to citizen science, Twitter has not only been a digital public square, but has also allowed researchers to infer attitudes that are difficult to detect through methods from traditional field research. For example, people’s willingness to pay for policies and services that address climate change has traditionally been measured through surveys of subjective well-being. Twitter sentiment data gives researchers and policymakers <a href="https://doi.org/10.1016/j.jpubeco.2020.104161">another tool for assessing these attitudes</a> in order to take more meaningful action on climate change. </p>
<p>Researchers in public health have found an association between tweeting about <a href="https://doi.org/10.2196%2F17196">HIV and incidence of HIV</a>, and have been able to <a href="https://doi.org/10.1371/journal.pone.0219550">measure sentiment at the neighborhood level</a> to assess the overall health of the people in those neighborhoods. </p>
<h2>Place and time</h2>
<p>Geotagged data from Twitter helps in a variety of fields such as <a href="https://doi.org/10.1371/journal.pone.0131469">urban land use</a> and <a href="https://doi.org/10.1080/24694452.2017.1421897">disaster resilience</a>. Being able to identify the locations for a set of tweets allows researchers to correlate information in the tweets with times and places – for example, correlating tweets and ZIP codes to <a href="https://theconversation.com/matching-tweets-to-zip-codes-can-spotlight-hot-spots-of-covid-19-vaccine-hesitancy-169596">identify hot spots of vaccine hesitancy</a>.</p>
<p>Twitter has been invaluable in the field of <a href="https://doi.org/10.1109/ICCCN49398.2020.9209602">open source intelligence</a> (OSINT), particularly for tracking down war crimes. OSINT uses crowdsourcing to identify the locations of photos and videos. In Ukraine, <a href="https://www.grid.news/story/global/2022/04/11/in-ukraine-war-crimes-are-being-captured-on-social-media/">human rights investigators have focused on using Twitter and TikTok</a> to search for evidence of abuses. </p>
<p>Open source intelligence has also been helpful for cutting through the fog of war. For example, OSINT analysts were quick to provide evidence that the missile that exploded in Przewodow, Poland, near the Ukrainian border on Nov. 15, 2022, was likely an S-300 antiaircraft missile and was unlikely to be a ballistic or cruise missile fired by Russia.</p>
<p><div data-react-class="Tweet" data-react-props='{"tweetId":"1592629251161075712"}'></div></p>
<h2>Credentialing and verification</h2>
<p>Although misinformation has been <a href="https://doi.org/10.1126/science.aap9559">disseminated far and wide on Twitter</a>, the platform also serves a role as a global verification mechanism. First, vast numbers of people use Twitter and other social media platforms. With crowdsourcing writ large, social media assumes the role of an authoritative information provider, reducing some of the uncertainty people face in searching for new information. The platforms perform a credentialing role that some scholars refer to as “<a href="https://www.google.com/books/edition/Media_Technologies/zeK2AgAAQBAJ">public relevance algorithms</a>,” in that they have replaced dedicated business or technical expertise in identifying what people need to know. </p>
<p>Another way has been official credentialing. Prior to Elon Musk’s takeover, Twitter’s verification method provided public figures with a blue check mark on their profiles, which served as a shortcut in establishing whether a source of a tweet <a href="https://www.politico.com/news/2022/11/13/washington-gets-increasingly-freaked-out-by-twitter-00066647">was who the person claimed to be</a>. </p>
<p>While problems such as <a href="https://doi.org/10.1126/science.aau2706">fake news</a>, <a href="https://doi.org/10.2105/AJPH.2020.305940">misinformation</a> and <a href="https://aclanthology.org/W18-5110/">hate speech</a> exist, the credentialing ability coupled with the vast number of people who use the platform in real time made Twitter a provider of credible information and a <a href="https://doi.org/10.1177/1940161214540942">fact-checker</a>. </p>
<h2>The digital public square</h2>
<p>Twitter’s dual role in fostering real-time communication and acting as an arbitrator of authoritative information is of crucial interest to academics, journalists and government agencies. During the pandemic, for example, many public health <a href="https://doi.org/10.2196/24883">agencies turned to Twitter</a> to promote behavior that mitigates the risk of infection. </p>
<p><div data-react-class="Tweet" data-react-props='{"tweetId":"1592676469305905152"}'></div></p>
<p>During disasters and emergencies, Twitter has been a great venue for <a href="https://doi.org/10.1016/j.ipm.2019.102107">crowdsourced eyewitness data</a>. During Hurricane Harvey, for example, researchers found that users responded and interacted the <a href="https://doi.org/10.1007/s11069-020-04016-6">most with tweets from verified Twitter accounts</a>, and especially from government organizations. Official Twitter accounts helped in the rapid dissemination of information during <a href="https://doi.org/10.1016/j.chb.2015.06.044">a water contamination crisis</a> in West Virginia. Twitter data has also helped in <a href="https://doi.org/10.1016/j.trc.2021.102976">hurricane evacuations</a>. </p>
<p>Twitter has also been an important way for people with disabilities to participate in public discourse.</p>
<p><div data-react-class="Tweet" data-react-props='{"tweetId":"1593334136969666560"}'></div></p>
<p>Twitter’s real value has been in enabling people to connect with each other in real time and as an archive of collective behavior. Recognizing this, <a href="https://qz.com/1143475/the-un-is-the-international-organization-with-the-most-followers-on-twitter">international organizations</a>, <a href="https://fcw.com/workforce/2012/09/the-50-most-followed-agencies-on-twitter/206197/">government agencies</a> and <a href="https://icma.org/articles/article/top-local-government-twitter-users">local governments</a> have invested significant resources in using Twitter and have come to rely on the platform. Sen. Edward Markey has described Twitter as “<a href="https://www.politico.com/amp/news/2022/11/17/ed-markey-deep-dive-00069221">essential” to American society</a>. If Twitter were to collapse, there’s no clear replacement in sight.</p><img src="https://counter.theconversation.com/content/194601/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Anjana Susarla does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>If Twitter were to go dark, with it would go a valuable source of data as well as a means of sharing information relied on by activists, journalists, public health officials and scientists.Anjana Susarla, Professor of Information Systems, Michigan State UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1768912022-03-07T19:06:45Z2022-03-07T19:06:45ZHow we communicate, what we value – even who we are: 8 surprising things data science has revealed about us over the past decade<figure><img src="https://images.theconversation.com/files/450290/original/file-20220307-84591-em3wfo.jpeg?ixlib=rb-1.1.0&rect=249%2C62%2C6674%2C4556&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><span class="source">Shutterstock</span></span></figcaption></figure><p>Big data analysis has long supported <a href="https://www.newscientist.com/article/mg24432613-200-new-scientist-ranks-the-top-10-discoveries-of-the-decade/">major feats</a> in physics and astronomy. But more recently we’ve seen it underpin breakthroughs in the social sciences and humanities.</p>
<p>Since the landmark paper <a href="https://www.science.org/doi/10.1126/science.1167742">Computational Social Science</a> was published in 2009, a new generation of data analytics tools has given researchers insight into fundamental questions about how we communicate, who we are and what we value. </p>
<p>For instance, by analysing the relative frequency of certain words in historical texts, researchers can identify important changes in our use of language over time.</p>
<p>In some cases these shifts will be obvious, such as the use of archaic words being replaced by more contemporary words. But in other cases, they may reflect more subtle but widespread social and cultural changes. Below are some of the most influential data-centric discoveries from the past 10 years.</p>
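<p>The idea of relative word frequency can be illustrated with a minimal sketch. The function name <code>relative_frequency</code> and the tiny corpus below are hypothetical and invented purely for illustration; they are not the tools or data used by the researchers cited in this article.</p>

```python
import re
from collections import Counter


def relative_frequency(texts_by_year, term):
    """Return {year: share of all words in that year's texts that equal `term`}."""
    result = {}
    for year, texts in sorted(texts_by_year.items()):
        words = []
        for text in texts:
            # Lower-case the text and split it into simple word tokens.
            words.extend(re.findall(r"[a-z']+", text.lower()))
        counts = Counter(words)
        result[year] = counts[term.lower()] / len(words) if words else 0.0
    return result


# Toy corpus, invented for illustration only.
corpus = {
    1900: ["thou art welcome here", "thou hast my thanks"],
    2000: ["you are welcome here", "you have my thanks"],
}

freq_thou = relative_frequency(corpus, "thou")
freq_you = relative_frequency(corpus, "you")
```

<p>In this toy example the archaic "thou" makes up a quarter of the words in the 1900 texts and none in the 2000 texts, while "you" shows the opposite pattern. Real analyses work on millions of books and need far more careful text cleaning.</p>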
<h2>How we communicate</h2>
<p>Over the past decade, a growing number of global open data sources have helped researchers reveal patterns in what we read, write and pay attention to. Google Books, <a href="https://www.worldcat.org/">Worldcat</a> and <a href="https://www.gutenberg.org/">Project Gutenberg</a> are just some examples.</p>
<p>The release of the Google Books <a href="https://books.google.com/ngrams">n-gram viewer</a> in the early 2010s was a game changer on this front. Using the entire Google Books database, this tool shows you the relative frequency of a specific term or phrase as it has been used over hundreds of years. <a href="https://www.scientificamerican.com/article/google-books-culture/">Researchers</a> have used this data to explore the systematic suppression of the mention of Jewish painters, such as Marc Chagall, in German books during World War II.</p>
<p>Data analysis can also reveal patterns in the expression of human emotions over time. CSIRO’s <a href="http://wefeel.csiro.au/">We Feel</a> tracks emotions in communities around the world. It does this by analysing the language people are using on social media in real time and mapping it out. </p>
<p>The tool can be used to determine the general mood over time (hour by hour, day by day) within particular cities and countries. Patterns in these data can then be explored in association with other information, such as weather, holidays and economic fluctuations. </p>
<p>Some research findings even claim to represent fundamental changes in humans’ social values, community sentiment and how we think (for example, the rise and fall of words associated with rationality such as “method”, “analysis” and “determine”).</p>
<p>Here are some key findings in this space:
</p><ul>
<li> <strong>Cultural turnover is accelerating</strong> <p></p>
<p>A Harvard University-led <a href="https://www.science.org/doi/10.1126/science.1199644">analysis</a> of more than a century of data from millions of books provides evidence that society’s attention span for historical events is declining, as appetite for new material grows. </p>
<p>In other words, we are forgetting the past faster. You can see this in the graph below, which tracks how often three specific years are mentioned across a vast range of literature through time. As time passes, the “half-life” of each year (the point at which it receives just half the attention it had at its peak) comes quicker.</p>
<figure class="align-center ">
<img alt="Counts of mentions of the years 1883, 1910 and 1950 in all books for the past 200 years." src="https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=395&fit=crop&dpr=1 600w, https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=395&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=395&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=496&fit=crop&dpr=1 754w, https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=496&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/449139/original/file-20220301-15-1kjxrze.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=496&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Our collective attention for historical events has shrunk over the past century.</span>
<span class="attribution"><a class="source" href="https://www.science.org/doi/10.1126/science.1199644">Michel et al., Science 2010</a></span>
</figcaption>
</figure>
<p></p></li><li> <strong>Human language diversity and biodiversity are correlated</strong><p></p>
<p>By mapping linguistic diversity and the diversity of animal species, researchers have <a href="https://doi.org/10.1038/s41598-020-76658-2">shown</a> these two worlds are correlated geographically – both increasing with temperature and proximity to the equator. So the closer to the equator you get, the more variation there is in spoken language and the greater the variety of species there is. </p>
<p>The authors propose this is due to heat near the equator producing greater productivity and variety in plant life, which in turn provides more complex and interactive environments for both animals and humans alike – feeding into a cycle whereby “diversity begets more diversity”.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Three figures showing diversity distributions of language and animals and their relation to geography." src="https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=503&fit=crop&dpr=1 600w, https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=503&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=503&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=633&fit=crop&dpr=1 754w, https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=633&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/449991/original/file-20220304-36214-pyfb2r.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=633&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Researchers have shown both linguistic diversity and species diversity increase exponentially with temperature and proximity to the equator.</span>
<span class="attribution"><a class="source" href="https://www.nature.com/articles/s41598-020-76658-2">Hamilton, Walker & Kempes, Scientific Reports 2020</a></span>
</figcaption>
</figure>
<p></p></li><li><strong>There have been society-wide shifts in language use over the past century</strong><p></p>
<p>In an article <a href="https://www.pnas.org/doi/10.1073/pnas.2107848118">published</a> in December, researchers used machine learning to show long-term, consistent changes in our use of language. Specifically, they reveal an inflection point in the 1980s where there is a shift towards more egocentric, emotional and supposedly less rational language.</p>
<p>The authors suggest (although not <a href="https://doi.org/10.1073/pnas.2121300119">without contest</a>) this could signal the beginning of a “post-truth era”.
</p></li></ul><p></p>
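<p>The "half-life" measure described in the first finding above can be sketched in a few lines. This is only an illustration under stated assumptions: the function <code>half_life</code> is hypothetical, and the mention counts are invented rather than taken from the Harvard-led study.</p>

```python
def half_life(series):
    """Years from the peak until the value first falls to half the peak or lower.

    `series` maps year -> relative frequency of mentions.
    Returns None if the value never drops that far.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    half = series[peak_year] / 2
    for year in years:
        if year > peak_year and series[year] <= half:
            return year - peak_year
    return None


# Invented numbers for illustration only (not data from the study).
mentions_early = {1883: 8.0, 1890: 7.0, 1900: 5.0, 1915: 4.0, 1930: 3.0}
mentions_late = {1950: 10.0, 1955: 7.0, 1960: 5.0, 1970: 3.0}

early_half_life = half_life(mentions_early)
late_half_life = half_life(mentions_late)
```

<p>With these made-up figures, attention to the earlier year takes 32 years to halve, while attention to the later year halves in 10. That is the kind of shortening half-life the study reports.</p>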
<h2>Who we are</h2>
<p>In the field of psychology, the same data analytics tools have shown that people’s personalities can be measured using the “Big 5” traits, which largely become <a href="https://doi.org/10.1016/j.econlet.2011.11.015">stable in adulthood</a>.</p>
<p>This was possible thanks to extensive data sets such as HILDA in Australia, the German Socio-Economic Panel in Germany and the British Household Panel Survey in the UK. </p>
<p>Robust studies have also demonstrated that personality traits can be reliably and accurately predicted from a variety of data sources including <a href="https://doi.org/10.1027/1614-0001/a000362">voice recordings</a>, <a href="https://doi.org/10.1073/pnas.1920484117">mobile phone usage patterns</a> and even <a href="https://www.nature.com/articles/s41598-020-79310-1">portrait photographs</a>. </p>
<p>In turn, there have been some remarkable associations found at scale between personality and:
</p><ul>
<li> <strong>Elevation</strong><p></p>
<p>A study published in 2020, and based on more than three million people’s data, <a href="https://www.nature.com/articles/s41562-020-0930-x">shows</a> mountain-dwelling people tend to have different personality traits than those who live at sea level. They are generally more open to new experiences and more emotionally stable.</p>
<p></p></li><li> <strong>Location</strong> <p></p>
<p>Another earlier study shows people who live in the United States can be divided into <a href="https://www.apa.org/pubs/journals/releases/psp-a0034434.pdf">three clear and measurable clusters</a> of personality types, linked with associated geographic footprints. New Yorkers and Texans (who are in the same cluster) are more likely to be temperamental and uninhibited.</p>
<p></p></li><li> <strong>Occupation</strong><p></p>
<p>In our own research published with colleagues in 2019, we analysed the personality features of people in more than 1,000 different occupations. <a href="https://www.pnas.org/doi/10.1073/pnas.1917942116">We found</a> people in the same role share similar traits. Scientists are more open to new ideas yet <a href="https://www.natureindex.com/news-blog/scientists-are-curious-and-idealistic-but-not-very-agreeable-compared-to-other-professions">ready to argue</a>, whereas tennis professionals tend to be friendly and outgoing. </p>
<p>The research used machine learning to infer the personality features of more than 100,000 people, based on language used on social media.</p>
<p></p></li></ul> <p></p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/robot-career-advisor-ai-may-soon-be-able-to-analyse-your-tweets-to-match-you-to-a-job-128777">Robot career advisor: AI may soon be able to analyse your tweets to match you to a job</a>
</strong>
</em>
</p>
<hr>
<h2>What we value</h2>
<p>In economics, we’re seeing major research frontiers being opened up thanks to data analysis, including in:
</p><ul>
<li> <strong>Network science</strong> <p></p>
<p>When it comes to success, we’ve learnt that performance matters most when it can be measured (like in sport). But in other fields where it can’t be measured easily (like in the art world), networks <a href="https://doi.org/10.1140/epjds/s13688-020-00227-w">matter</a> <a href="https://www.science.org/doi/10.1126/science.aau7224">most</a>. </p>
<p></p></li><li> <strong>Behavioural economics</strong> <p></p>
<p>We can now see how we behave as individuals <em>en masse</em>, unveiling valuable clues for effective policy interventions around employment, taxation and education. For instance, one <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128692">large-scale study</a> revealed those quickest to re-enter the workforce displayed certain key behaviours. These included being an early riser and being geographically mobile (perhaps meaning they’re more willing to travel further, or relocate, for work).
</p></li></ul> <p></p>
<h2>Post-theory science?</h2>
<p>Some have argued data science poses a fundamental challenge to the traditional sciences, with the emergence of “<a href="https://www.theguardian.com/technology/2022/jan/09/are-we-witnessing-the-dawn-of-post-theory-science">post-theory science</a>”. This is the concept that machines are better at understanding the relationship between data and reality than the traditional scientific method of <em>hypothesise, predict and test</em>. </p>
<p>However, reports of the <a href="https://www.wired.com/2008/06/pb-theory/">death of theory</a> are perhaps greatly exaggerated. Data are not perfect. And data science based on incomplete or biased data has the potential to miss, or mask, important patterns in human activity. This can only be addressed by critical thinking and theory. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/nobel-economics-prize-winners-showed-economists-how-to-turn-the-real-world-into-their-laboratory-169697">Nobel economics prize winners showed economists how to turn the real world into their laboratory</a>
</strong>
</em>
</p>
<hr>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Big data analysis has unveiled startling links between seemingly unrelated things, such as how a person’s physical elevation above sea level might influence their personality.Paul X. McCarthy, Adjunct Professor, UNSW SydneyColin Griffith, Strategy & Business Development, Data61, CSIROLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1715152021-12-02T14:32:26Z2021-12-02T14:32:26ZSouth Africa is failing to ride the digital revolution wave. What it needs to do<figure><img src="https://images.theconversation.com/files/435067/original/file-20211201-21-iepod4.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">South African fruit producers use digital technologies such as blockchain and radio frequency identification tags.</span> <span class="attribution"><span class="source">Getty Images</span></span></figcaption></figure><p>Workplaces are adopting new forms of advanced automation at a rate that suggests a digital revolution in the making. </p>
<p>Digital technologies such as sensorisation, networked data analytics, and artificial intelligence make it possible to collect data along the entire chain of production and consumption activities. They also enable the data to be used for a host of other purposes. These include shaping markets and industries, offering benefits like reducing production costs and time to market, and increasing product and service quality.</p>
<p>But it is a revolution that is playing out unevenly, across and within countries. This has consequences for the competitiveness, inclusiveness and sustainability of economies.</p>
<p>Countries have varying capacities to optimally harness and integrate these digitalised technologies. Preconditions for uptake include, firstly, reliable enabling infrastructures. This includes connectivity and energy. Secondly, there need to be foundational capabilities, such as digital skills.</p>
<p>Middle-income countries such as South Africa are finding these conditions tough to meet. This is because they have been affected by premature de-industrialisation – a lack of diversification and relative shrinking of their production structure.</p>
<p><a href="https://oxford.universitypressscholarship.com/view/10.1093/oso/9780192894311.001.0001/oso-9780192894311-chapter-12">Our research</a> shows that South Africa’s adoption and diffusion of digital technologies has been slow and uneven. The research, and an ongoing digital survey, sheds light on the patterns of adoption and the factors influencing them.</p>
<p>We surveyed 516 firms in three key sectors. These were the manufacturing, engineering and related services sector; the chemicals industry; and the fibre processing and manufacturing sector.</p>
<p>South Africa has structural constraints that have limited the development and diffusion of skills and capabilities beyond some scattered islands of innovation. The constraints include limited productive diversification, its energy system and a high rate of unemployment. </p>
<p>But the country has potential for more rapid progress. First, it needs to take some crucial steps. It needs better resourced training institutions and more digital upskilling. It needs a more enabling infrastructure such as high speed broadband. It also needs a coherent digital industrial policy framework that enables a new industrial ecosystem.</p>
<h2>What the research shows</h2>
<p><a href="https://oxford.universitypressscholarship.com/view/10.1093/oso/9780192894311.001.0001/oso-9780192894311-chapter-12">Our research</a> highlights the potential of advanced digital technologies to drive socially inclusive and environmentally sustainable industrial development. We also identified pockets of the South African economy where this is happening.</p>
<p>Some mining firms, for example, have made processes more efficient and reduced waste. Predictive maintenance uses data analysis to identify problems before they cause production failures.</p>
<p>In the metal-machinery value chains, some foundries are using artificial intelligence to predict sub-surface defects and reduce internal scrap and rework rates. This has had a hugely positive impact on three fronts: reduced energy usage, the environment and competitiveness.</p>
<p>In agriculture, some producers are using digital technologies such as blockchain and radio frequency identification tags. These give farms and their “cold chains” a competitive advantage in the export of high-value fresh fruits. This “industrialisation of freshness” is making it possible for intensive agriculture to expand. The result is more and better jobs.</p>
<p>But the effective use of digital technologies requires reliable digital infrastructure. And firms need skills at all levels. These include programming, web and application development, digital design, data management, visualisation and analytics. Analytics need a strong foundation in literacy, numeracy, and information and communication technologies.</p>
<p>Firms also need the ability to coordinate technological change along chains of activities. </p>
<p>To identify what would enable progress in digital industrialisation, we assessed digital readiness and adoption patterns at both sector and firm level. We found a mixed bag.</p>
<h2>A mixed bag</h2>
<p>Many of the firms we sampled still use manual and semi-automated technologies. A smaller group are fully automated and ICT enabled. The full adoption of advanced digital technologies remains very limited. </p>
<p>This was the case in four key business functions: supplier relationships, product development, production management and customer-client relationships.</p>
<p>The picture was mixed, too, across the three sectors that we sampled. Firms in the manufacturing, engineering and related services sector are more ready than others. They have adopted digital system-enabled technologies in the four business functions. Not so for firms in the fibre processing and manufacturing and the chemical industries.</p>
<p>The limited adoption of technologies – and the slow and uneven diffusion – suggest that the process of change will be challenging. But firms expressed the intention to start using more technologies in their production and other structures. They also intend to invest in advanced digital technologies.</p>
<h2>Policy framework</h2>
<p>A broader digital industrial policy framework would help South Africa accelerate digital industrialisation and get training institutions better resourced and aligned.</p>
<p>Our survey suggests it should have six related priorities:</p>
<ul>
<li><p>improved cost, speed, and reliability of ICT infrastructure (bandwidth)</p></li>
<li><p>digital skills policy</p></li>
<li><p>digital technology policy</p></li>
<li><p>financing and investment</p></li>
<li><p>linkages to development policy</p></li>
<li><p>economic regulation, competition policy and data.</p></li>
</ul>
<p>It should aim to shape a new industrial ecosystem that allows participants to seize opportunities. Key to this is enhancing government capacity to implement and enforce industrial policy, and ensuring more effective cooperation with the private sector. </p>
<p>Overall, digital industrialisation will raise potential trade-offs and new conflicts in the economy. These span issues such as employment and new skills requirements as well as the need for industrial restructuring. There is also a concern that digital technologies will increase the existing divide between large and small firms. In turn, this will reinforce existing concentration in the economy. That would be bad for re-industrialisation. </p>
<p>A digital industrial policy must therefore ensure that benefits are distributed across different types of firms, their employees, and broader society.</p>
<p>This challenge is certainly not unique to South Africa. Other middle-income economies are facing the same difficulties. They also face the need to work out how to incorporate digital disruption within their existing policy instruments.</p>
<p><em>The research on which this article is based was conducted through the <a href="https://www.competition.org.za/idtt">Industrial Development Think Tank</a> at the University of Johannesburg. The research was later incorporated into a book, <a href="https://global.oup.com/academic/product/structural-transformation-in-south-africa-9780192894311?cc=us&lang=en&">Structural Transformation in South Africa: The Challenges of Inclusive Industrial Development in a Middle-Income Country</a>.</em></p><img src="https://counter.theconversation.com/content/171515/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Antonio Andreoni received funding from the Department for International Development (DFID), African Climate Foundation and Gatsby Africa.</span></em></p><p class="fine-print"><em><span>Elvis Avenyo is affiliated with the Centre for Competition, Regulation and Economic Development (CCRED) and DST/NRF South African Research Chair in Industrial Development, University of Johannesburg, Johannesburg, South Africa. </span></em></p>
To take full advantage of digital technologies South Africa needs a coherent digital industrial policy.
Antonio Andreoni, Associate Professor of Industrial Economics, UCL Institute for Innovation and Public Purpose and Visiting Associate Professor, SARChI Industrial Development, University of Johannesburg
Elvis Avenyo, Senior Researcher, Centre for Competition, Regulation and Economic Development, University of Johannesburg
Licensed as Creative Commons – attribution, no derivatives.
tag:theconversation.com,2011:article/150087
2020-11-23T05:58:28Z
2020-11-23T05:58:28Z
Prosecuting within complex criminal networks is hard. Data analysis could save the courts precious time and money
<figure><img src="https://images.theconversation.com/files/370733/original/file-20201123-13-9zlawt.jpg?ixlib=rb-1.1.0&rect=60%2C69%2C5705%2C3769&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">
</span> <span class="attribution"><span class="source">Shutterstock</span></span></figcaption></figure><p>It’s no secret the trail of data we leave online can reveal intimate details about our lives. And there are myriad people whose job it is to collect and sift through this, often with a goal to <a href="https://theconversation.com/how-the-shady-world-of-the-data-industry-strips-away-our-freedoms-143823">engage in targeted advertising</a>. </p>
<p>Another use for the field of “social network analysis” could eventually be to help prosecutors in criminal courts make sense of huge amounts of digital evidence, collected both online and from devices offline. </p>
<p>This would be particularly useful in trials with several defendants, saving courts precious <a href="http://www.unicri.it/sites/default/files/2020-08/Artificial%20Intelligence%20Collection.pdf">time and money</a>. Criminal networks can use online spaces such as <a href="https://www.wired.com/2016/01/the-silk-roads-dark-web-dream-is-dead/">dark web marketplaces</a> to organise crime and <a href="https://theconversation.com/dark-web-not-dark-alley-why-drug-sellers-see-the-internet-as-a-lucrative-safe-haven-132579">reach more victims</a> and clients. </p>
<p>Transaction patterns, messages and page visits are all clues that can help unpack such a network.</p>
<h2>What is it and how does it work?</h2>
<p>Social network analysis involves using <a href="https://forensiclogic.com/coplinkx/">advanced computer software</a> to explore segments of patterns that recur in social interactions, online and offline. It offers scholars a broad <a href="https://mis.csit.sci.tsu.ac.th/siraya/wp-content/uploads/2015/09/1Social-Network-Analysis-An-Introduction-1.pdf">perspective on the world</a> of human relations. </p>
<p>This form of analysis doesn’t just look at who you’re friends with on Instagram – it looks at which decisions you make as an individual, which you make in a group and how these layers of choices influence your world.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/your-social-networks-and-the-secret-story-of-metadata-16119">Your social networks and the secret story of metadata</a>
</strong>
</em>
</p>
<hr>
<p>In their simplest form, these social networks can be presented as graphs. There are “nodes” (which represent people) connected by lines or “edges”. An edge could represent a phone call, message or meeting.</p>
<p>Look at the graph of the real network of the Al-Qaeda terrorists involved in the September 11 attack. Can you figure out who the most “connected” terrorist is?</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=581&fit=crop&dpr=1 600w, https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=581&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=581&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=731&fit=crop&dpr=1 754w, https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=731&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/369474/original/file-20201116-19-1nt0r2j.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=731&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">This graph represents the hijacker network responsible for the 2001 attack on the World Trade Centre.</span>
<span class="attribution"><span class="source">Valdis Krebs</span></span>
</figcaption>
</figure>
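<p>To make the node-and-edge idea concrete, here is a minimal Python sketch that counts how many edges touch each node, a measure usually called degree. The names and contact records are invented for illustration and are not taken from the network pictured above.</p>

```python
from collections import Counter

# Hypothetical contact records: each pair is one edge
# (a phone call, message or meeting between two people).
edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]


def degree_counts(edge_list):
    """Return how many edges touch each node (its degree)."""
    degree = Counter()
    for a, b in edge_list:
        degree[a] += 1
        degree[b] += 1
    return degree


degree = degree_counts(edges)
# The node with the highest degree is the most "connected" in this simple sense.
most_connected, links = degree.most_common(1)[0]
print(most_connected, links)
```

<p>Degree is only the simplest notion of being “connected”. It says nothing about whether a person links otherwise separate parts of a group.</p>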
<h2>The murky networks of crime syndicates</h2>
<p>This information is often expressed in mathematical form, too. These numbers offer information about the dynamics of a group and the specific role of each individual within. </p>
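<p>One example of such a number is betweenness centrality, which counts how many shortest paths between other pairs of people run through a given individual. The sketch below is an illustration only, not a method described in the research cited here. It applies Brandes' algorithm to a made-up seven-person network in which the hypothetical member <code>D</code> is the sole link between two tight clusters.</p>

```python
from collections import deque

# Hypothetical network: clusters {A, B, C} and {E, F, G}, joined only through D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]

adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search, counting shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


scores = betweenness(adj)
broker = max(scores, key=scores.get)
print(broker, scores[broker])
```

<p>In this toy network <code>D</code> has only two contacts, yet every path between the two clusters passes through it, which is the kind of role a simple contact count would miss.</p>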
<p>Social network analysis is particularly effective in helping investigators understand covert criminal networks, whether this is a <a href="https://www.tandfonline.com/doi/pdf/10.1080/01639625.2017.1421128?casa_token=iN3IoTzBALQAAAAA%3ALGj43P6qM_p5aidR4Lxn-bItgvxld5Z1fBKLxZON0C03caJcIN8MQy7tP-wYTNnJjTazheuip3_YGCc&">biker gang</a>, <a href="https://link.springer.com/article/10.1007/s12117-020-09397-5">group of cyber criminals</a> or <a href="https://iris.unime.it/retrieve/handle/11570/3110930/166254/Thesis%20Roberto%20Musotto%20final%20version.pdf">members of the Sicilian Mafia</a>. It can reveal details such as:</p>
<ul>
<li>who the key individuals are in the group</li>
<li>how the various members are connected with one another</li>
<li>how the members combine, or act alone, to carry out crime.</li>
</ul>
<p>A judge in a preliminary hearing could consult a graph like the one above to help decide whether there is a case to be made against each member. </p>
<p>Mathematical metrics could further single out individuals against whom there is enough evidence to prosecute. This would also help judges reach fairer decisions on jail terms or acquittals.</p>
<h2>Surfing through oceans of data</h2>
<p>Due to time, money and human resource restrictions, often not all evidence from investigations is used in criminal court proceedings.</p>
<p>Social network analysis would greatly benefit prosecutors in criminal trials involving an excess of digital evidence, which <a href="https://www.mordorintelligence.com/industry-reports/global-digital-forensics-market-industry">continues to grow</a> alongside general online data.</p>
<p>In Australia, any electronic device seized by authorities must be evaluated in court. Western Australia’s police force processes over <a href="https://securitybrief.com.au/story/wa-police-force-uses-tech-to-build-a-case-against-criminals">2.8 terabytes of data</a> (2,800 GB) for every case it investigates.</p>
<p>In the <a href="https://blogs.unimelb.edu.au/opinionsonhigh/2014/04/10/bell-group-case-page/">2008 trial</a> of Bell Group v Westpac Banking Corporation, digital evidence extended the final judgement enormously to about 2,500 pages.</p>
<p>Similarly, a 2016 civil case in Victoria, <a href="https://victorianreports.com.au/judgment/view/51-VR-421">McConnell Dowell Constructors v Santam and Others</a>, required counsel to go through 1.4 million documents in electronic format. This would have taken about 583 weeks. </p>
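<p>The article does not say how the 583-week figure was reached. As a rough check, the short calculation below shows one set of assumptions that reproduces it: a single reviewer spending one minute on each document and working a 40-hour week. Both rates are assumptions made for this sketch, not numbers from the case.</p>

```python
documents = 1_400_000          # from the article
minutes_per_document = 1       # assumption for this sketch
hours_per_week = 40            # assumption for this sketch

# Total review time converted from minutes to hours, then to working weeks.
review_hours = documents * minutes_per_document / 60
review_weeks = review_hours / hours_per_week

print(round(review_weeks))     # 583
```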
<p>The Supreme Court allowed (for the first time) a <a href="https://www.lawinorder.com.au/our-services/ediscovery-services/managed-document-review?utm_source=google&utm_medium=cpc&utm_campaign=MDR&utm_term=ediscovery&creative=329206319214&keyword=ediscovery&matchtype=b&network=g&device=c&creative=329206319214&keyword=ediscovery&matchtype=b&network=g&device=c&gclid=Cj0KCQiAwMP9BRCzARIsAPWTJ_HbBpaItGupGfiN97oCTDDwDv-TctATaCSLRH69GUSvv1bZfq7NyFAaAtm6EALw_wcB">technology-assisted review</a> to isolate the most “relevant” documents. </p>
<p>But this didn’t help the court understand how the various documents were linked, which would only be <a href="https://dl.acm.org/doi/10.1145/345966.345982">possible through social network analysis</a>. </p>
<h2>Removing potential for bias</h2>
<p>Moreover, large criminal investigations are often broken into multiple trials. While this is economical and maximises resources, it’s inherently risky because evidence can be evaluated differently depending on the court. </p>
<p>This is why the <a href="https://www.pressreader.com/uk/the-new-european/20200521/282132113640455">largest and most expensive Mafia trial</a> in history, the 1986 Maxiprocesso trial, was heard by only one court and jury. The initial trial <a href="https://www.bmiaa.com/extraordinary-visions-litalia-ci-guarda-exhibition-at-maxxi-rome/14_bonaventuraimbriaco_fascicolimaxiprocesso1986-198corleonepalermo2012/">involved</a> 349 hearings over almost two years. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="Photo from the famous Maxi trial." src="https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=412&fit=crop&dpr=1 600w, https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=412&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=412&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=517&fit=crop&dpr=1 754w, https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=517&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/370748/original/file-20201123-19-73nu4m.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=517&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">The Maxi, or Maxiprocesso, trial was conducted against the Sicilian mafia in Palermo, Sicily. The trial started in February, 1986 and ended in January, 1992.</span>
<span class="attribution"><span class="source">Wikimedia Commons</span></span>
</figcaption>
</figure>
<p>In hindsight, discussions surrounding evidence in the trial could have been shortened had social network analysis been available at the time.</p>
<p>In any criminal investigation, there’s also potential for bias from investigating officers. This bias can introduce errors into the evidence pool, which may not be picked up during a trial, and subsequently distort any analysis conducted. </p>
<h2>Technology: both a problem and a solution</h2>
<p>Of course, social network analysis isn’t perfect. While it can tell us how an individual interacts with a syndicate, it can’t guide us as to whether that person should be <a href="https://firstmonday.org/ojs/index.php/fm/article/view/941">considered separate to the main network</a> or not. This remains the judge’s decision.</p>
<p>There are also limitations to how online networks can be investigated. Often, important data is stored outside police jurisdiction, or requires law enforcement to obtain a search warrant before it can be accessed (<a href="https://onezero.medium.com/cops-are-increasingly-requesting-data-from-facebook-and-you-probably-wont-get-notified-if-they-5b7a2297df17#">such as with Facebook</a>).</p>
<p>Other times, data that’s crucial for an investigation may be hosted on an encrypted service <a href="https://www.whatsapp.com/security/">such as WhatsApp</a>, or may be hard to trace if it was uploaded anonymously or under a fake persona.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/facebooks-push-for-end-to-end-encryption-is-good-news-for-user-privacy-as-well-as-terrorists-and-paedophiles-128782">Facebook's push for end-to-end encryption is good news for user privacy, as well as terrorists and paedophiles</a>
</strong>
</em>
</p>
<hr>
<p>Still, social network analyses could prove to be an invaluable support tool to help judges and jurors assess the value of evidence. </p>
<p>If both have a detailed and holistic understanding of the case, this will help ensure the right people are convicted — as quickly as possible and with the sentencing deserved.</p><img src="https://counter.theconversation.com/content/150087/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Roberto Musotto is affiliated with the Cyber Security Cooperative Research Centre (CSCRC), whose activities are partially funded by the Australian Government’s Cooperative Research Centres Programme.</span></em></p>
With mountains of digital evidence, advanced computing techniques could help judges and jurors better understand how criminal syndicates operate, potentially allowing fairer sentencing.
Roberto Musotto, Cyber Security Cooperative Research Centre Postdoctoral Fellow, Edith Cowan University
Licensed as Creative Commons – attribution, no derivatives.
tag:theconversation.com,2011:article/144372
2020-08-12T14:01:21Z
2020-08-12T14:01:21Z
Pasha 76: Taking a look at an intensive care unit during the COVID-19 pandemic
<figure><img src="https://images.theconversation.com/files/352495/original/file-20200812-20-rvu9w5.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">shutterstock</span> </figcaption></figure><p>Nowadays, when one thinks of an intensive care unit or ICU, one might think of a ventilator. But the ICU is so much more than that. A lot of work goes into keeping patients stable in the ICU so they can recover better. And in this challenging time of the coronavirus, ICUs have come under strain. As part of a global study, Groote Schuur hospital in Cape Town has joined a global alliance sharing clinical insights and using cutting-edge technology to find more effective treatments for the most critically ill COVID-19 patients.</p>
<p>In today’s episode of Pasha, David Thomson, a critical care specialist and lecturer at the University of Cape Town, tells us more about why studies like this are important.</p>
<hr>
<p><strong>Photo:</strong>
“ICU Abbreviation or acronym of intensive care unit in hospital or clinic, special medical unit” By Shidlovski <a href="https://www.shutterstock.com/image-photo/icu-abbreviation-acronym-intensive-care-unit-1030795105">Shutterstock</a></p>
<p><strong>Music</strong>
“Happy African Village” by John Bartmann, found on <a href="http://freemusicarchive.org/music/John_Bartmann/Public_Domain_Soundtrack_Music_Album_One/happy-african-village">FreeMusicArchive.org</a> licensed under <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0</a>.</p>
<p>“African Moon” by John Bartmann, found on <a href="http://freemusicarchive.org/music/John_Bartmann/Public_Domain_Soundtrack_Music_Album_One/happy-african-village">FreeMusicArchive.org</a> licensed under <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0</a>.</p><img src="https://counter.theconversation.com/content/144372/count.gif" alt="The Conversation" width="1" height="1" />
Studies like this are important because they help gather data from an African perspective.
Ozayr Patel, Digital Editor
Licensed as Creative Commons – attribution, no derivatives.
tag:theconversation.com,2011:article/134894
2020-03-27T08:10:05Z
2020-03-27T08:10:05Z
Tracking your location and targeted texts: how sharing your data could help in New Zealand’s level 4 lockdown
<p>New Zealand and much of the world are now under an unprecedented lockdown. <a href="https://theconversation.com/overjoyed-a-leading-health-expert-on-new-zealands-coronavirus-shutdown-and-the-challenging-weeks-ahead-134395">Public health experts say</a> this is the best way to suppress the spread of the virus. But how long will such a lockdown be socially sustainable? </p>
<p>As someone who’s worked in the mobile device software industry and now lectures on business analytics at the University of Auckland, I’d argue technology could play a bigger role in ensuring more New Zealanders <a href="https://covid19.govt.nz/government-actions/covid-19-alert-level/">stay home</a> to save lives. </p>
<p>Data analytics, based on our <a href="https://theconversation.com/privacy-vs-pandemic-government-tracking-of-mobile-phones-could-be-a-potent-weapon-against-covid-19-134895">mobile phone usage</a>, would allow us to provide a mixture of incentives and gentle <a href="https://theconversation.com/why-richard-thaler-won-the-2017-economics-nobel-prize-85404">nudges</a> to do the right thing, while also supplying crucial information for health researchers. </p>
<p>But using mobile phone data can be a threat to personal privacy: critics rightly warn that once tracking systems are put in place, those in power have little incentive to remove them. While we need to act quickly to stop the virus spread, we also need to <a href="https://www.nytimes.com/2020/03/23/technology/coronavirus-surveillance-tracking-privacy.html">respect personal privacy</a>. </p>
<p>So what more could New Zealand be doing to use our phones and our love of the internet to fight COVID-19?</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/as-nz-goes-into-lockdown-authorities-have-new-powers-to-make-sure-people-obey-the-rules-134377">As NZ goes into lockdown, authorities have new powers to make sure people obey the rules</a>
</strong>
</em>
</p>
<hr>
<h2>Using big data for the greater good</h2>
<p>Different nations have chosen different models to fight coronavirus – and some of those approaches clash with our values in New Zealand. </p>
<p>While some point to the success of China’s lockdown of Wuhan as a model of how to stamp out transmission, the scenes of people literally <a href="https://www.washingtonpost.com/opinions/2020/02/06/warning-chinese-authoritarianism-is-hazardous-your-health/">welded inside</a> their apartment buildings shouldn’t be forgotten. Clearly, that is not what we want our society to look like.</p>
<p>But the social problem we face in New Zealand now is a classical liberal dilemma: pitting individual rights to free movement and privacy against those of the community. Given the scale and severity of COVID-19, it is currently the right choice to prioritise community health and safety over individual rights. </p>
<p>That means some of our normal concerns about digital privacy may have to be temporarily overridden in favour of a greater good. However, we must remain true to our liberal traditions and continue to try to balance individual and community rights. </p>
<h2>What New Zealand can learn from overseas</h2>
<p>Europe has strong privacy laws but has also <a href="https://apnews.com/711ec49215d39d1c420622ade1a18f93">endorsed the use of personal data</a> in a limited set of circumstances to fight the spread of the virus.</p>
<p>While the United States and Europe struggle with containment, <a href="https://theconversation.com/why-singapores-coronavirus-response-worked-and-what-we-can-all-learn-134024">Singapore</a> seems to have escaped some of the worst effects of the virus. Tracking information voluntarily provided by <a href="https://www.gov.sg/article/help-speed-up-contact-tracing-with-tracetogether">a contact tracing app on mobile phones</a> has made it possible to find people who have been in contact with infected people. </p>
<p>Other nations are beginning to <a href="https://www.top10vpn.com/news/surveillance/covid-19-digital-rights-tracker/">implement similar solutions</a> but valid concerns about privacy remain.</p>
<p>Tracking applications on phones or using the data mobile network operators collect could allow authorities to trace the prior movements of people found to be infected, and test those they came into contact with. Israel has implemented a system designed to <a href="https://www.haaretz.com/israel-news/israel-unveils-app-that-uses-tracking-to-tell-users-if-they-were-near-virus-cases-1.8702055">protect user privacy</a>.</p>
<p>Crucially, both <a href="https://www.cnbc.com/2020/03/25/coronavirus-singapore-to-make-contact-tracing-tech-open-source.html">Singapore</a> and <a href="https://medium.com/proferosec-osm/hamagen-application-fighiting-the-corona-virus-4ecf55eb4f7c">Israel</a> have committed to making their software freely available through copyright-free, open-source licences. This means software developers wouldn’t have to start from scratch in implementing similar solutions here in New Zealand.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/why-singapores-coronavirus-response-worked-and-what-we-can-all-learn-134024">Why Singapore's coronavirus response worked – and what we can all learn</a>
</strong>
</em>
</p>
<hr>
<h2>Safeguards and time limits on digital surveillance</h2>
<p>We can and should take advantage of this opportunity. Until recently, the adoption of such <a href="https://www.businessinsider.com.au/countries-tracking-citizens-phones-coronavirus-2020-3?r=US&IR=T">tools for surveillance would have been unprecedented</a> and concerning for many, myself included. Before the crisis, tech companies’ use of big data to monitor and track people’s everyday habits was increasingly coming under scrutiny by legislators across the globe. </p>
<p>To gain acceptance, the public needs to have confidence that more intrusive data collection is necessary for public health, that it will not have negative effects for them or enrich others at their expense, and that it will be shut down after the crisis. </p>
<p>Any system implemented in New Zealand needs to have a clear end date, with public reporting and independent oversight. For instance, that public reporting could be done via <a href="https://www.rnz.co.nz/news/political/412520/special-committee-set-up-as-parliament-is-adjourned">the new cross-party committee</a> led by opposition leader Simon Bridges, which is scrutinising the government’s response to COVID-19. Once the crisis is over, the program needs to be shut down.</p>
<p>What kind of tracking and targeted public health prompts might be possible in New Zealand? </p>
<p>Mobile phone companies can use standard GPS and triangulation between phone towers to track your location when you’re out. One possible idea would be for mobile phone network providers to use their real-time data to text message people who appear to be a long way from home – in breach of the <a href="https://covid19.govt.nz/government-actions/covid-19-alert-level/">level 4 lockdown rules</a>, unless they’re working for an <a href="https://covid19.govt.nz/government-actions/covid-19-alert-level/essential-businesses/">essential business</a>.</p>
<p>These automated messages would be sent by an algorithm if certain criteria were met, and could remind people of lockdown rules and let them know their choices have consequences for others. </p>
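<p>A minimal sketch of what such a rule could look like is shown below. Everything specific in it is an assumption made for illustration: the 5 km threshold, the <code>Subscriber</code> record, the <code>essential_worker</code> flag and the function names are hypothetical, and no real operator's system is being described. Distance is estimated with the standard haversine formula.</p>

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Subscriber:
    """Hypothetical record an operator might hold for one phone."""
    home: tuple                     # (latitude, longitude) of the home address
    current: tuple                  # (latitude, longitude) of the latest location fix
    essential_worker: bool = False  # exempt from the reminder if True


def should_send_reminder(sub, max_km=5.0):
    """Return True if the phone appears far from home and no exemption applies."""
    if sub.essential_worker:
        return False
    return haversine_km(*sub.home, *sub.current) > max_km


# Example: a home in central Auckland, phone currently located in Hamilton.
auckland = (-36.8485, 174.7633)
hamilton = (-37.7870, 175.2793)
print(should_send_reminder(Subscriber(home=auckland, current=hamilton)))
```

<p>Keeping the criteria in one small, readable function is a deliberate choice here: rules like these are exactly what independent oversight would need to inspect.</p>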
<p>It appears that New Zealand is already exploring how it can use software in these kinds of ways. As <a href="https://www.stuff.co.nz/national/health/coronavirus/120518745/could-nz-use-mobile-phones-to-trace-the-contacts-of-covid19-cases">Stuff has reported</a>, the director-general of health has been holding early talks with the private sector – including software developers and mobile network operators – about using technology in the fight against COVID-19.</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/privacy-vs-pandemic-government-tracking-of-mobile-phones-could-be-a-potent-weapon-against-covid-19-134895">Privacy vs pandemic: government tracking of mobile phones could be a potent weapon against COVID-19</a>
</strong>
</em>
</p>
<hr>
<h2>Free data, discounted internet: ideas to keep people home</h2>
<p>Incentives could also encourage New Zealanders to follow social distancing rules. </p>
<p>Modern analytics allow us to target incentives at specific individuals or groups deemed to be at higher risk of flouting the level 4 rules. One idea worth considering would be paying internet and mobile service providers to offer discounts or other incentives for people staying home: such as free mobile data at home for those who don’t have wifi, subsidised internet for those working or studying from home, or game subscriptions or access to online classes.</p>
<p>Such incentives would likely be paid for out of the public purse. But targeted analytics could minimise costs while maximising the health benefits for us all – potentially ending New Zealand’s lockdown sooner.</p>
<p>These types of policies could also have positive economic effects. For instance, at a time when some of those households might have difficulty paying internet or phone bills, such incentives could enable some lower-income people to stay employed by having more opportunities to work from home, or provide children without current internet access at home with the ability to keep learning while schools are closed.</p>
<p>These are just a few ideas that could be effective. The difference between ideas such as these and those employed by surveillance states is that they use analytics to nudge people to make better choices, rather than relying solely on policing people in a heavy-handed manner.</p><img src="https://counter.theconversation.com/content/134894/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Jon MacKay does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>
Automated text messages if your phone detects you’re a long way from home, or discounted home internet, are just a few possible technology solutions to make New Zealanders “stay home to save lives”.
Jon MacKay, Lecturer, Business Analytics, University of Auckland, Waipapa Taumata Rau
Licensed as Creative Commons – attribution, no derivatives.
tag:theconversation.com,2011:article/125227
2019-10-29T19:24:00Z
2019-10-29T19:24:00Z
Sydney lockout laws review highlights vital role of transparent data analysis
<p>The New South Wales Bureau of Crime Statistics and Research (BOCSAR) <a href="https://www.bocsar.nsw.gov.au/Pages/bocsar_media_releases/2019/mr-Impact-lockouts-on-the-CBD.aspx">recently claimed</a> Sydney’s alcohol licensing regulations, commonly known as <a href="https://theconversation.com/au/topics/lockout-laws-26282">lockout laws</a>, reduced non-domestic assaults by 13% in the CBD. Its calculation relied on a decision to allocate 1,837 of these offences to both Kings Cross and the CBD – that is, double-counting the data. <a href="https://www.parliament.nsw.gov.au/ladocs/submissions/63631/Submission%20734%20-%20Centre%20for%20Translational%20Data%20Science,%20University%20of%20Sydney.pdf">Our analysis</a> found this decision was critical to the conclusion that assaults decreased in the CBD. For every other choice about the areas to which offence data were allocated, and for every other type of analysis, we found no decrease. </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=471&fit=crop&dpr=1 600w, https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=471&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=471&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=592&fit=crop&dpr=1 754w, https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=592&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/298452/original/file-20191024-119433-7qnb4k.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=592&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Map of Sydney and the entertainment precincts as used by BOCSAR in its analysis: blue – CBD entertainment precinct; red – Kings Cross entertainment precinct; green – nearby displacement areas; yellow – outer displacement areas.</span>
</figcaption>
</figure>
<p><a href="https://www.parliament.nsw.gov.au/ladocs/submissions/63631/Submission%20734%20-%20Centre%20for%20Translational%20Data%20Science,%20University%20of%20Sydney.pdf%5D">Our findings</a> highlight an important question: how do the choices of data collection, pre-processing and analysis affect policy decisions?</p>
<p>The allocation of crimes to areas is just one of several choices made when using data to assess policy impacts. Other choices include how to measure violent crime, what time period to consider and the geographical extent of the areas to include. The question is: if other choices were made, would the results affect a <a href="https://www.abc.net.au/news/2019-09-08/sydney-lockout-laws-rolled-back/11489806">decision to repeal or continue the laws</a>? </p>
<p>Our findings point to the need to follow a couple of principles when using data to inform policymaking. First, the institution that collects data and the institution that analyses the data should be independent of each other. Second,
we need as much transparency about the data and its analysis as possible.</p>
<h2>So what exactly did the analyses show?</h2>
<p>BOCSAR chose to use monthly non-domestic assaults from 2009 onwards. There is nothing wrong with these choices, but others could have been made.</p>
<p>For instance, why from 2009 onwards, not from 2005? Why monthly, not daily? Why reported non-domestic assaults, not reported assaults causing grievous bodily harm? Why divide the area into the CBD and Kings Cross only? </p>
<p>One way of assessing the impact of such choices is to use different subsets of data, different types of data pre-processing and different statistical and/or machine-learning techniques. If the conclusion still remains the same, then our decision is robust to this source of variability. If not, we need to understand why.</p>
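<p>The sketch below illustrates this kind of robustness check on a small scale. It is not the analysis carried out by either institution. The monthly counts are entirely made up (a steady decline with no policy effect built in), and the two estimators, the two start years and the one-offence threshold are assumptions chosen for the example.</p>

```python
from statistics import mean

POLICY_YEAR = 2014  # year the lockout laws were introduced
END_YEAR = 2018     # arbitrary end of the made-up series


def synthetic_series(start_year):
    """Made-up monthly counts: a steady decline with NO policy effect built in."""
    first_month = (start_year - 2005) * 12
    n_months = (END_YEAR - start_year + 1) * 12
    return [200.0 - 0.5 * (first_month + k) for k in range(n_months)]


def naive_change(series, split):
    """Average after the policy minus average before it."""
    return mean(series[split:]) - mean(series[:split])


def trend_adjusted_change(series, split):
    """Average after the policy minus what the pre-policy trend line projected."""
    pre = series[:split]
    t_mean = (split - 1) / 2
    y_mean = mean(pre)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(pre))
             / sum((t - t_mean) ** 2 for t in range(split)))
    intercept = y_mean - slope * t_mean
    projected = [intercept + slope * t for t in range(split, len(series))]
    return mean(series[split:]) - mean(projected)


# Run every combination of analyst choices and record the estimate.
results = {}
for start_year in (2005, 2009):
    series = synthetic_series(start_year)
    split = (POLICY_YEAR - start_year) * 12
    for name, estimator in (("naive", naive_change),
                            ("trend-adjusted", trend_adjusted_change)):
        results[(start_year, name)] = estimator(series, split)

for choice, estimate in results.items():
    print(choice, round(estimate, 2))

# The conclusion is robust only if every combination gives the same verdict.
verdicts = {estimate < -1.0 for estimate in results.values()}
robust = len(verdicts) == 1
print("robust:", robust)
```

<p>On this invented series the naive comparison reports a fall for both start years, while the trend-adjusted comparison reports essentially no change, so the conclusion is not robust to the choice of method.</p>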
<p>For the Kings Cross precinct, the analysis by the Centre for Translational Data Science at the University of Sydney showed the conclusion remained unchanged irrespective of the frequency and period over which data were collected and the analysis performed. Non-domestic assaults had declined following the introduction of the lockout laws in 2014.</p>
<p>For the CBD the reverse was true. Only if we make exactly the same choices as BOCSAR, in particular allocating 1,837 crimes to both the CBD and Kings Cross, could we conclude non-domestic assaults had decreased very slightly. </p>
<p>Under all other variations of the analyses, including data, methodology and spatial allocation of that data, we found no decrease. Non-domestic assaults in the CBD had been decreasing since 2008 and, if anything, more slowly after the lockout laws took effect. </p>
<p>So why was the inclusion of 1,837 crimes so critical to the conclusions about the CBD? </p>
<p>Using data provided by BOCSAR, we plotted the most likely location of those 1,837 crimes. Figure 1 shows these crimes occurred mainly in Kings Cross, an area in which the crime rate had fallen since 2014. We say “most likely location” because we have yet to receive the additional data we requested from BOCSAR to help us locate exactly where these crimes occurred.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=588&fit=crop&dpr=1 600w, https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=588&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=588&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=739&fit=crop&dpr=1 754w, https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=739&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/298427/original/file-20191023-119429-qgf5xh.png?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=739&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Counts of crimes (per SA1 region) that were assigned to both the CBD and Kings Cross.</span>
<span class="attribution"><span class="source">Centre for Translational Data Science</span>, <span class="license">Author provided</span></span>
</figcaption>
</figure>
<p>With the removal of those 1,837 crimes from the CBD, we detected no decrease in non-domestic assaults. But BOCSAR apparently did. After removing those crimes from the CBD, BOCSAR released an <a href="https://www.bocsar.nsw.gov.au/Documents/BB/2019-Report-Effect-of-lockout-and-last-drinks-laws-on-assaults-BB142.pdf">updated report</a> to a <a href="https://www.parliament.nsw.gov.au/committees/listofcommittees/Pages/committee-details.aspx?pk=260">parliamentary inquiry into Sydney’s night-time economy</a>. This report claimed assaults in the CBD decreased by 4% (much less than the original 13%). </p>
<p>The committee then asked for our <a href="https://www.parliament.nsw.gov.au/ladocs/other/12591/Centre%20for%20Translational%20Data%20Science.pdf">comments</a>. We found the report did not provide a confidence interval for this decrease. Yet the report made a virtue of reporting uncertainty estimates for other quantities and elsewhere it claimed “statistically significant” results. </p>
<p>We replicated BOCSAR’s analysis and found the change in crime could have been as low as a 12% decrease and as high as a 6% increase. In other words, the result is “statistically insignificant”. </p>
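<p>As a rough sketch of the reasoning, an estimated change is usually called statistically insignificant when its uncertainty interval includes zero, that is, when both a decrease and an increase remain plausible. The function name <code>interval_includes_zero</code> is hypothetical; the two bounds are the figures quoted above.</p>

```python
def interval_includes_zero(lower_pct, upper_pct):
    """Return True when an interval for a percentage change spans zero,
    meaning the data are consistent with no change at all."""
    return lower_pct <= 0.0 <= upper_pct


# Bounds quoted in the text: from a 12% decrease to a 6% increase.
lower, upper = -12.0, 6.0
no_clear_change = interval_includes_zero(lower, upper)
print("interval includes zero:", no_clear_change)
```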
<h2>What are the implications for making policy?</h2>
<p>Why does this matter? There are two reasons. </p>
<p>First, the danger in not explaining, quantifying and reporting uncertainty is that the public loses trust in data-driven policymaking. Only if conclusions acknowledge and explain the uncertainty inherent in inferring complex quantities from data can we make robust and explainable policy decisions that build trust with the public. </p>
<p>Second, if we don’t accept and report uncertainty we could stop looking for other explanations. We might then fail to achieve an outcome that everyone wants: a reduction in violence and a healthy night-time economy.</p>
<p>How do we proceed from here? We’d make two recommendations: </p>
<ol>
<li><p>The institution that collects and curates the data should be distinct, informed but independent from the institution/s that analyse the data. </p></li>
<li><p>There should be as much data transparency as possible, which would enable different groups to perform different types of analyses, using different sources of data. </p></li>
</ol>
<p>We are almost certain these different groups would produce different findings, but the subsequent discussion could provide insights that move us closer to more robust and acceptable policy decisions. </p>
<p>To <a href="https://www.brainpickings.org/2012/08/27/richard-feynman-on-the-role-of-scientific-culture-in-modern-society/">quote</a> Nobel Prize-winning physicist Richard Feynman:</p>
<blockquote>
<p>If we will only allow that, as we progress, we remain unsure, we will leave opportunities for alternatives … to make progress, one must leave the door to the unknown ajar.</p>
</blockquote>
<p>The <a href="https://www.parliament.nsw.gov.au/ladocs/inquiries/2519/Report%20-%20Sydneys%20night%20time%20economy.pdf">parliamentary committee’s recommendation</a> that BOCSAR and the Centre for Translational Data Science work together more closely appears to do just that. We look forward to an ongoing collaboration to further our understanding of the drivers of violent crime.</p>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and have disclosed no relevant affiliations beyond their academic appointment.</span></em></p>The collection and analysis of data used for making policy should be independent and open to ensure public trust in decision-making. The debate over alcohol licensing shows why this matters.Sally Cripps, Professor of Statistics, Director of Centre for Translational Data Science, University of SydneyRoman Marchant, Senior research fellow and lecturer, University of SydneyLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1168322019-06-03T20:07:35Z2019-06-03T20:07:35ZHow big data can help residents find transport, jobs and homes that work for them<figure><img src="https://images.theconversation.com/files/276653/original/file-20190527-193549-1aaaaov.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Analysing big data can tell us how a big city ticks, including where suitable housing and jobs are, and how best to get to them.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/shanghai-city-scenery-big-data-521213980?src=bHzLNM2IJA15qy75yfgghw-1-7&studio=1">LIPING/Shutterstock</a></span></figcaption></figure><p>Thanks to the media, more people now know that you have to protect your personal data from being misused for commercial gains. Many of you are probably more conscious of what to share on Facebook or Instagram than you were two to three years ago. But, when used appropriately, data can be a great resource that informs urban management and planning. </p>
<p>For example, the <a href="https://railsmart.patrec.org/">RailSmart Platform</a>, a <a href="https://smart-cities.com.au/awards/2019-winners/best-integration-of-an-individual-technology/">Smart Cities 2019 award winner</a> last Thursday, integrates numerous sets of data from the Australian Bureau of Statistics and other data sets such as the public transport ticketing system to work out how the city of Perth functions and how people move around. </p>
<p>Typically, people want to know what areas they can afford that best suit their work and travel requirements. You can use this platform to find out about house prices by location, travel times, locations of strategic jobs and how to get to them.</p>
<p>When you look up the locations of businesses you can see which train stations or major bus stops provide easy access to jobs. If you know the types of jobs, then you will also know whether those jobs are <a href="https://www.tandfonline.com/doi/full/10.1080/08111146.2017.1294539">strategic jobs</a> – jobs that create and attract other jobs. If you also look at real estate data, you can then find out the property values and rental prices of nearby properties.</p>
<p>To create this platform, we analysed the big data for Perth and visually represented what we found. To make this information accessible, we created a user-friendly digital mapping interface to display the modelled data. </p>
<p>So what sort of data are we talking about?</p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/explainer-what-is-big-data-13780">Explainer: what is big data?</a>
</strong>
</em>
</p>
<hr>
<h2>Property values</h2>
<p>House prices are one of the key economic indicators that people often pay attention to. Average house prices in Australian capital cities are easy to find, but what about more location-specific prices? You may be renting at the moment and thinking of moving elsewhere, or you may be a prospective property buyer. </p>
<p>Using real estate data, we have mapped the values of properties in different locations. For example, we can show you the number of different types of properties sold (e.g. house, unit, land and other types) and the average sale price of those properties. We can also show the rental values of different locations. </p>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/we-need-to-talk-about-the-data-we-give-freely-of-ourselves-online-and-why-its-useful-93734">We need to talk about the data we give freely of ourselves online and why it's useful</a>
</strong>
</em>
</p>
<hr>
<h2>Access to where people live</h2>
<p>Ever wondered how good (or bad) your local road network is? How about your local public transport? The app can help you with this too.</p>
<p>Using the road network, the public transport network and the timetable data, we have mapped how accessible train stations and major bus stops are to houses, units and apartments. Based on prior <a href="https://www.atrf.info/papers/2017/files/ATRF2017_Abridged_Paper_91.pdf">research</a>, the tool models and maps accessibility to people, houses or jobs in real time.</p>
<p>For example, we can show the locations you can get to from a specific train station using your own car or public transport on a map. Our data show that 66% of all dwellings in Perth can be accessed within 60 minutes using a private vehicle from Perth station. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=390&fit=crop&dpr=1 600w, https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=390&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=390&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=491&fit=crop&dpr=1 754w, https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=491&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/274044/original/file-20190513-183083-pgw7ur.PNG?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=491&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Accessibility of Perth station to dwellings – dark red areas show dwellings that can be reached within 30 minutes.</span>
<span class="attribution"><span class="source">RailSmart</span></span>
</figcaption>
</figure>
<hr>
<p>
<em>
<strong>
Read more:
<a href="https://theconversation.com/heres-what-smart-cities-do-to-stay-ahead-72193">Here's what smart cities do to stay ahead</a>
</strong>
</em>
</p>
<hr>
<h2>Locations of ‘strategic’ jobs</h2>
<p><a href="https://www.tandfonline.com/doi/full/10.1080/08111146.2017.1294539">Strategic jobs</a> include jobs in IT and in academia. For planning purposes, you want to have more strategic jobs that will attract and create more employment.</p>
<p>We can show where strategic jobs are located on a map. In other words, we can show you the locations where you can expect to see more jobs concentrated and created. </p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=387&fit=crop&dpr=1 600w, https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=387&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=387&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=486&fit=crop&dpr=1 754w, https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=486&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/274046/original/file-20190513-183103-1raeobd.PNG?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=486&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Strategic job locations – dark green areas show where more strategic jobs are located.</span>
<span class="attribution"><span class="source">RailSmart</span></span>
</figcaption>
</figure>
<p>Looking at the two maps, the access to jobs and the strategic job locations, we can see that only a limited number of strategic jobs can be accessed from Joondalup station. </p>
<h2>The power of data</h2>
<p>Some of you may be wondering how and where we got all these data. Is the dystopian world created by George Orwell in his fictional work Nineteen Eighty-Four coming true, with “Big Brother” watching your every move? </p>
<p>Fear not. The RailSmart analysis does not use any personalised data and all the data sets we used can be freely accessed by anyone. </p>
<p>The platform relies on aggregated data. This means it uses groupings of data or user types – for example, students, or geographic areas such as suburbs. It is impossible to tell what an individual is doing or even the sale price of individual properties; the platform represents trends as patterns of users and areas.</p>
<p>What makes the platform so powerful is that a set of data that seems unimportant on its own can, when analysed along with another set of data, suddenly indicate something of significance.</p>
<p>Visit <a href="https://railsmart.patrec.org/login">RailSmart</a> and see the power of big data (login required, free to sign up).</p>
<p class="fine-print"><em><span>Dr Sae Chi works for Planning and Transport Research Centre (PATREC) at the University of Western Australia. This work was undertaken as part of the RailSmart Wanneroo Project, which has received grant funding from the Australian government under the Smart Cities and Suburbs Program.</span></em></p><p class="fine-print"><em><span>Dr Linda Robson works for Planning and Transport Research Centre (PATREC) at the University of Western Australia. This work was undertaken as part of the RailSmart Wanneroo Project, which has received grant funding from the Australian government under the Smart Cities and Suburbs Program.</span></em></p>We have learnt to be wary of big data, but it can also be your friend: one platform combines and analyses data about housing, jobs and transport to reveal very useful information about living in Perth.Sae Chi, Research Associate at Planning and Transport Research Centre (PATREC), The University of Western AustraliaLinda Robson, Research Fellow at Planning and Transport Research Centre, The University of Western AustraliaLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1050072018-11-06T11:42:02Z2018-11-06T11:42:02ZA game plan for technology companies to actually help save the world<figure><img src="https://images.theconversation.com/files/242837/original/file-20181029-76411-1xzi1i9.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Working together, people and technology companies can make a lot of progress.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/saving-world-10199140">Pedro Tavares/Shutterstock.com</a></span></figcaption></figure><p>Smartphones, computers and social media platforms have become indispensable parts of modern life, but the technology companies that make them and write their software are under siege. 
In any given week, <a href="https://www.recode.net/2018/9/28/17915864/facebook-data-breach-mark-zuckerberg-hack-personal-data">Facebook</a> or <a href="https://slate.com/technology/2018/10/google-is-losing-users-trust.html">Google</a> or <a href="https://gizmodo.com/new-documents-show-amazons-face-scanning-tech-for-cops-1830032358">Amazon</a> does something to erode public trust in them. Now could be a moment for the industry to make good on Bill Gates’s promise of technology to do good, by “<a href="https://www.wired.com/2013/11/bill-gates-wired-essay/">unlocking the innate compassion</a> we have for our fellow human beings” and improving the world – or Mark Zuckerberg’s dream of building a “<a href="https://www.facebook.com/notes/mark-zuckerberg/building-global-community/10154544292806634">new social infrastructure</a> to create the world we want for generations to come.”</p>
<p>Around the globe, countries and societies are <a href="https://unstats.un.org/sdgs/files/report/2018/TheSustainableDevelopmentGoalsReport2018.pdf">falling behind</a> on reducing social inequalities and meeting goals for economic development and environmental sustainability. The <a href="http://www.ipcc.ch/">Intergovernmental Panel on Climate Change</a> is issuing increasingly dire warnings about the effects climate change will have on human life on Earth – the beginnings of which are already unfolding. </p>
<p>I lead a major research initiative called <a href="https://sites.tufts.edu/digitalplanet/">The Digital Planet</a> at the Fletcher School at Tufts where we study how technology is changing lives and livelihoods around the world. Here is an outline of how technology giants or nimble startups could help make Gates’s and Zuckerberg’s promises a reality.</p>
<h2>Identify a big hairy problem</h2>
<p>There is a long list of global problems to combat, including hunger, drought, poverty, bad health, polluted water and poor sanitation. One that’s connected to all the others is the recent <a href="http://www.ipcc.ch/report/sr15/">bombshell news</a> that climate change is accelerating: Over the next 20 years, Earth’s atmosphere will reach average temperatures as much as 2.7 degrees Fahrenheit above preindustrial levels. Consequently, extreme weather and natural disasters, food shortages, inundated coastlines and the near-elimination of coral reefs will likely happen even sooner than previously anticipated. </p>
<p>The scope of climate change gives <a href="https://theconversation.com/big-tech-isnt-one-big-monopoly-its-5-companies-all-in-different-businesses-92791">companies like Google, Facebook and Amazon</a> excellent opportunities to find specific approaches that would have meaningful effects.</p>
<h2>Trace the root causes</h2>
<p>There are, of course, many elements driving climate change. Consider the agriculture sector, which <a href="https://www.nature.com/news/one-third-of-our-greenhouse-gas-emissions-come-from-agriculture-1.11708">produces one-third</a> of all greenhouse gas emissions. Farms emit the <a href="https://www.nature.com/news/one-third-of-our-greenhouse-gas-emissions-come-from-agriculture-1.11708">largest share</a> and could benefit from a range of technologies, such as data analytics and artificial intelligence. As a bonus, innovating in agriculture could help <a href="https://unstats.un.org/sdgs/files/report/2018/TheSustainableDevelopmentGoalsReport2018.pdf">feed more people</a>. </p>
<h2>Identify how technology can make a big difference</h2>
<p>Technological tools could help farmers collect and use data to <a href="https://www.wri.org/blog/2013/10/farmer-innovation-improving-africa%E2%80%99s-food-security-through-land-and-water-management">manage their crops more precisely</a> in ways that would reduce greenhouse gas emissions – such as using less fertilizer and plowing and planting fields more efficiently. Specifically, better data on soil and plant health could help farmers know where they need to increase or decrease irrigation or pesticide and fertilizer use. These practices save farmers money and increase farms’ productivity, generating more food with less waste. </p>
<h2>Recognize how you can make money from it</h2>
<p>If companies are to get involved, there needs to be an opportunity to earn money – and the more, the better. </p>
<p>One estimate suggests that making changes in farming and food practices that enhance productivity, promote sustainable methods and reduce waste could produce <a href="http://report.businesscommission.org/uploads/BetterBiz-BetterWorld_170215_012417.pdf">commercial opportunities and new savings worth US$2.3 trillion</a> overall worldwide annually.</p>
<p><a href="https://sites.tufts.edu/digitalplanet/">Our research team</a>, in work that is ongoing, has estimated that of that $2.3 trillion a year, $250 billion could come from the application of artificial intelligence and other analytics for precision farming alone – $195 billion of which would be in the developing world, with $45.6 billion in South Asia and $13.4 billion in East Africa. Other estimates for the effects of AI and analytics are less specific, but still within the same range – <a href="https://www.mckinsey.com/featured-insights/artificial-intelligence/visualizing-the-uses-and-potential-impact-of-ai-and-other-analytics">between $164 billion and $486 billion</a> annually. There is indeed money to be made by technology companies interested in developing climate-friendly, productivity-improving interventions in agriculture.</p>
<h2>Innovate to overcome the many barriers to change</h2>
<p>Before the commercial value can be unlocked, however, there are many barriers to consider. Many rural areas, even in the developed world, <a href="http://www.pewresearch.org/fact-tank/2017/05/19/digital-gap-between-rural-and-nonrural-america-persists/">don’t have affordable high-speed internet connections</a> and, particularly in the developing world, the farming community is not as technology savvy as other professions. Further, farming practices have been handed down through generations and the idea of using data to make modifications to such long-held beliefs and methods can be countercultural. </p>
<p>In addition, there are many practical realities: <a href="http://www.fao.org/docrep/005/Y3918E/y3918e10.htm">83 percent of the world’s cultivated land</a> is fed only by rain, with no irrigation systems to make use of better data. Beyond that, in most parts of the world, <a href="https://www.theguardian.com/sustainable-business/2015/feb/02/pioneer-firms-feed-world-agriculture-india-mozambique-profit">seeds and fertilizer are not high-quality</a>, lowering crop efficiency. Further, a lot of <a href="http://www.postharvest.org/home0.aspx">farms’ output is wasted</a> because of lack of refrigeration and slow transportation from fields to consumers.</p>
<p>With all those obstacles, it is understandable that investments in data-driven agriculture <a href="https://www.wsj.com/articles/why-big-data-hasnt-yet-made-a-dent-on-farms-1494813720">dropped 39 percent</a> from 2015 to 2016.</p>
<p>There are groups still working, though. <a href="https://www.microsoft.com/en-us/research/project/farmbeats-iot-agriculture/">FarmBeats</a> is a Microsoft project that combines low-cost sensors in the ground with drones that both create aerial maps and act as wireless data relay points. Nigeria’s <a href="http://zenvus.com/">Zenvus</a> and India’s <a href="http://www.aibono.com/">Aibono</a> analyze soil data. Kenya’s <a href="https://farmdrive.co.ke/">FarmDrive</a> develops credit scores for people without formal bank accounts or standard borrowing histories by using alternative data, like mobile phone and social media activity, together with local agricultural and economic information. Ghana’s <a href="https://farmerline.co/">Farmerline</a> tells farmers about weather forecasts, market information and financial tips. </p>
<p>These are creative efforts to solve deep and complex problems, but clearly there is room for large, well-resourced technology companies to step in and make a difference with big ideas, deep pockets and global support.</p>
<h2>Invest in partnerships</h2>
<p>Technology entrepreneurs will need to develop business models and organizational structures that are better at collaborating with local agricultural communities and businesses, to navigate personal and political relationships as well as regulations and government programs. Technology will not, on its own, be some sort of silver bullet that will unlock prosperity. </p>
<p>Changing technology companies into agents for widespread global good will not be easy – and it can be done in areas beyond agricultural innovation, too. </p>
<p>There has been no shortage of talk about these ideas: <a href="https://techcrunch.com/2018/05/23/50-tech-ceos-come-to-paris-to-talk-about-tech-for-good/">50 CEOs</a> met with French President Emmanuel Macron to discuss socially positive technologies; World Economic Forum events around the world discuss societal benefits of a <a href="https://www.weforum.org/about/the-fourth-industrial-revolution-by-klaus-schwab">Fourth Industrial Revolution</a>; and some companies, such as <a href="https://www.ericsson.com/en/about-us/sustainability-and-corporate-responsibility/sustainable-development-goals">Ericsson</a> and <a href="https://www.sap.com/dmc/exp/2018-01-unglobalgoals/">SAP</a>, are already committed to fulfilling <a href="https://sustainabledevelopment.un.org/?menu=1300">United Nations goals for global sustainability</a>. </p>
<p>We still have a long way to go. There is still a chance for technology companies to move fast and fix things by truly helping save the world – but sea levels are rising, so the time is now.</p>
<p class="fine-print"><em><span>Bhaskar Chakravorti has founded and directs the Institute for Business in the Global Context at Fletcher/Tufts that has received funding from Mastercard, Microsoft, the Gates Foundation and the Onassis Foundation. He is a Non-Resident Senior Fellow at Brookings India and a Senior Advisor on Digital Inclusion at the Mastercard Center for Inclusive Growth.
</span></em></p>Amazon, Facebook and Google have lofty goals for their effects on global society. But people around the world are still waiting for the positive results. Here’s what the tech giants could do.Bhaskar Chakravorti, Dean of Global Business, The Fletcher School, Tufts UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1046692018-10-10T16:05:57Z2018-10-10T16:05:57ZWhy teach ethnography to managers (in the big data era)?<figure><img src="https://images.theconversation.com/files/239932/original/file-20181009-72117-ztggvi.jpg?ixlib=rb-1.1.0&rect=0%2C36%2C2048%2C1324&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Is a cassette player an "ordinary object" or a "mystery"? It depends on whom you ask, and ethnography can help you ask the right questions.</span> <span class="attribution"><a class="source" href="https://www.flickr.com/photos/yoshikazut/31011124386/in/photolist-7tHAXc-4GpFy2-4ZHLSA-8VBPDQ-9MKk3M-8N7VV7-9MN5C1-7tHw8K-7twG8K-7tHAgr-8N9n5A-bmsrxM-NL6Me-Pfm7So-emtBDS-4GtRju-7vbtwz-5KpZBj-7tHx9e-39WN9c-7tHwG6-pqKGmL-ecZS3R-d15Quf-w9Dby">Yoshikazu Takada</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span></figcaption></figure><p>In management circles and beyond, companies are rushing to integrate, adapt and exploit big data in their organisations. Moreover, they are willing to recruit nearly anyone with a mention of “big data” or “artificial intelligence” in their resumes. There are nonstop consultant talks and crowded workshops on big data, and academic journals are rushing out special issues with the magical keywords. Nearly absent before 2011, big data is well on its way to being the most talked-about topic in the management press, including the <em>Economist</em>, <em>Financial Times</em>, <em>Wall Street Journal</em> and <em>Forbes</em>.</p>
<p>Business schools, too, are rushing to restructure their offering around big data and analytics – it seems as if nothing more is needed. Yet little is said about the kind of understanding and <em>reflexivity</em> that is needed when working with such voluminous data. We believe that important lessons can be learned from ethnographic research, which should be taught to managers obsessed with big data.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=335&fit=crop&dpr=1 600w, https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=335&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=335&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=421&fit=crop&dpr=1 754w, https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=421&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/239933/original/file-20181009-72124-1vfdvvm.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=421&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption"></span>
<span class="attribution"><span class="license">Author provided</span></span>
</figcaption>
</figure>
<h2>Big-data obsession in management</h2>
<p>Companies are wasting no time in leveraging big-data solutions to predict behaviour, profile their customers and enhance their marketing effectiveness. For example, many use big data to target customers and develop recommendation algorithms. Amazon suggests relevant products you are likely to be interested in buying, Netflix lists movies you are likely to want to watch, Spotify and Pandora propose songs you might enjoy listening to, and Zappos optimises its entire product selection accordingly. Through these and countless related developments, big data has already become a reality in our daily lives.</p>
<p>Big data offers great advantages, and detecting patterns in customer behaviour is beneficial for companies. It offers a powerful way to refine customer profiling and to develop subtle, automated targeting strategies. Notably, it allows companies to discover correlations that would otherwise remain <a href="https://www.wsj.com/articles/big-data-helps-companies-find-some-surprising-correlations-1395168255">unnoticed</a>. For example, <a href="https://www.digitalforallnow.com/walmart-big-data-retail/">Walmart</a> uses an ocean of data – the retailer analyses more than 2.5 petabytes per hour – to see if it can identify unknown correlations in consumption habits. And it has indeed found many: berries are likely to be consumed more when the wind is low and the temperature is below 27°C, whereas cloudy, windy and warm weather prompts steak purchases, and hot, dry weather and light winds trigger hamburger sales.</p>
<p>Such correlations are not new – they simply were difficult (or impossible) to detect before. This is why companies seek to combine and explore as many types and dimensions of data as they can. Health insurance companies, for instance, ask their clients to give access to their <a href="https://www.washingtonpost.com/business/2018/09/25/an-insurance-company-wants-you-hand-over-your-fitbit-data-so-they-can-make-more-money-should-you/?noredirect=on&utm_term=.6b283c04520b">Fitbit activity data</a>, with the promise of a better insurance deal. Many retailers have <a href="https://theconversation.com/weather-sensitive-products-adjusting-price-and-promotions-to-increase-sales-83595">in-store traffic monitoring systems</a> that track the location of customers as they move within the store in order to optimise marketing effectiveness. In fact, big data is being proposed as a solution for just about everything. <a href="https://mashable.com/article/burger-king-ai-ads-beautiful-disaster/?europe=true">Burger King</a> recently used it to design its advertising – with somewhat disastrous results.</p>
<p>We argue that the obsession with big data collection and analysis risks becoming an end in itself. This would significantly narrow the types of understanding that are produced, valorised and acted upon. Managers also need to foster a sensitivity to sociocultural, contextual knowledge which, unfortunately, is largely erased by big data storage mechanisms. Pure correlations between weather and purchase behaviour, for example, hide deeper-level cultural processes at work – not least how sunny weather and a blue sky “call” for a barbecue precisely because of the inherent connection between lifestyles and consumption.</p>
<h2>Big data limitations</h2>
<p>Surprisingly few talk about the potential limitations. First, due to the thirst for ever more data, there seems to be no answer to the question of <em>how much data is enough</em>. At the same time, collecting, storing, updating and curating big data is – of course – extremely costly. For the record, many have also claimed that much of this data is <a href="https://www.goodreads.com/book/show/29525490-business-bullshit">hardly useful at all</a>. But since they do not know which data might prove interesting, managers keep on collecting it. In many cases, unfortunately, companies do not have the resources to properly distil meaningful insight from it. These companies would thus arguably be better off not venturing into big data collection at all.</p>
<p>Second, big data relies on petabytes of what we call “decontextualised” data – in other words, data points extracted from the actual situation in which they were produced in the first place. The numbers of “clicks” or “views”, for example, are often closely measured and recorded, but they do not inform managers about the immediate contexts, moods or situations in which users were clicking and viewing the website. Despite technological progress, a significant part of this context will always remain impossible to measure because of its inherent complexity. Yet it is a crucial factor in understanding and explaining the behaviour being studied: doings and sayings become meaningful only in their immediate sociocultural context.</p>
<p>Third, we argue big data is unable to address embodied, sensory and affective experiences. When seeking to measure an emotion, for example, big data may only hope to measure the physiological reactions of people captured via sensors (muscle tension, sweat, heart rate, brain activity, etc.), but not the acute, meaningful emotional states that people live through. When <a href="https://www.cbs.nl/en-gb/corporate/2017/16/using-twitter-data-to-measure-emotions">analysing tweets to determine people’s emotions</a>, data analysts acknowledged they could address not the emotions themselves but only traces of their narration. This is a crucial caveat, as sensory dimensions are essential to understanding the experiences that people actually live through, and human lives are essentially about experiences.</p>
<p>Finally, it is safe to say that big data alone is not helpful for developing a “deep” understanding. What big data scientists can find out are correlations between variables (what is or happens), not causation (why and how it happens). Hence, big data is an interesting and useful tool but should not be the only focus of attention. This is why we turn next to examining ethnographic thinking and research as a potential antidote for big data obsession.</p>
<h2>Benefits of ethnography for managers</h2>
<p>While big data analytics are quickly entering the curriculum of most business schools, ethnographic methods often remain reserved for the social sciences departments of universities. However, some institutions have decided to make them a much more visible and foundational part of management learning – in consumer behaviour, marketing research, branding, service experience and strategy – where ethnographic reflexivity and methods are actively in use.</p>
<p>First, ethnography is all about gathering in-depth data about lived experiences and situations. Anthropologist <a href="https://en.wikipedia.org/wiki/Clifford_Geertz">Clifford Geertz</a> famously described this kind of data as <a href="https://www.goodreads.com/book/show/330006.The_Interpretation_of_Cultures?from_search=true">“thick descriptions”</a>: long-term and deep reflections about the experiences that people live. An expert on Balinese culture and rituals, Geertz built his insights on first-hand participant observation, with the idea that the ethnographer needs to live through the same experiences as the people studied. Thus, she/he is committed to discovering and sharing a common phenomenological sensibility and understanding – in a way, attempting to get inside the “skins of others”. The method has been a staple in anthropology and sociology for over a hundred years, but it is gaining relevance in understanding today’s fast-paced <a href="https://academic.oup.com/jcr/article/31/4/868/1812998">society and markets</a>. Producing in-depth data is useful not least for managers wishing to grasp, for example, customers’ or employees’ experiences – from their point of view.</p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=403&fit=crop&dpr=1 600w, https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=403&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=403&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=506&fit=crop&dpr=1 754w, https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=506&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/240039/original/file-20181010-72103-ofqvvw.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=506&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Clifford Geertz, an expert on Balinese culture and rituals, asserted that ethnographers need to live through the same experiences as the studied people to understand them. Above, a village ceremony in 2018.</span>
<span class="attribution"><a class="source" href="https://www.flickr.com/photos/artembali/45132822702/in/photolist-2bLevMy-UTZZMT-mdzuqx-MLJKKz-8Fqxow-abNPN7-aQ64Dt-bRPJ7c-9nT7Qb-aPobiR-6JzdJe-a2J1LH-nsaT4p-6JDmAW-fjzsB9-ccVCmA-aPnWNV-a2HCCv-aPnFTK-aPocxt-5A4okC-aPnYBD-6JzcKX-9nSmhq-2awVKoW-9nPkfP-9nPvy6-aPnE3B-coHEUu-coHdEw-aPofF6-a2LSWm-aPo9K2-6Jz53a-aPnTRr-aPo1Hc-a2LQ5o-aPnNSt-5GwWHS-6JzqXa-9LXf6p-aQ6coV-aPnQDi-aW8ffa-a2LV9q-a2LVC7-nK7aB7-a2J4uZ-a2LVJh-cNrWew">Artem Bali/Flickr</a>, <a class="license" href="http://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC</a></span>
</figcaption>
</figure>
<p>Second, ethnography insists on reflexivity. This means that the ethnographer seeks to question her/his own preconceptions about the studied phenomena – a sort of unlearning about “what we think we know” is thus required. It also means that the ethnographer is mindful of the way she/he participates in shaping the studied realities: the kinds of questions being presented and the <a href="https://academic.oup.com/jcr/article-abstract/44/4/939/4104525?redirectedFrom=fulltext">power exerted over those studied</a>. In practice, this means taking care to ensure that people indeed share their unique views, experiences and narratives. Ethnographers are taught to mistrust what they consider “natural”, “normal” behaviour and “objective” evidence. For example, people who were born before the Millennial generation consider a cassette player to be a <a href="https://youtu.be/Uk_vV-JRZ6E">basic</a>, <a href="https://youtu.be/s9XTQkd_ulI">ordinary</a> object. But for those who grew up interacting with smartphones and tablets, such an object can be a mystery. Ethnography can thus help managers foster reflexivity about the “limits” of their own experience and be attentive to difference and to the multiplicity of understandings and truths.</p>
<p>Third, instead of gathering mountains of data on as many (decontextualised) variables as possible, ethnography seeks a profound understanding of the situational context. Within it, the objective is to uncover the social processes that help explain why people are bound or likely to act the way they do. In 2013, Netflix worked with anthropologist Grant McCracken to understand the emerging <a href="https://media.netflix.com/en/press-releases/netflix-declares-binge-watching-is-the-new-normal-migration-1">online video-streaming phenomenon</a>. The company had no shortage of data about customers’ video viewing statistics but wanted to dig deeper into the social dynamic at play. McCracken’s ethnographic work revealed the meaning and importance of “binge watching” for contemporary consumers. For him, our “digital lifestyle, where storytelling is often reduced to bite-sized, 140 character conversations or instagrammable images, leaves us craving the kind of long narrative of storytelling”. McCracken found that 73% of consumers feel good about “binging”, i.e., watching multiple series or movies in one viewing. This kind of analysis indeed helped Netflix to better serve its customers.</p>
<p>Fourth, in radical contrast to big data approaches, ethnography is concerned with the building up of “embodied data and knowledge”. In other words, it builds analytical accounts of life produced by our very own bodies (by way of seeing, sensing, touching, hearing, tasting). Ethnography is particularly sensitive to the <a href="https://www.goodreads.com/book/show/6761484-doing-sensory-ethnography?from_search=true">multisensory aspects of people’s experiences</a>. Going to a live concert, a sporting event or a political demonstration cannot be reduced to the spectacle or “show” itself. Something happens during these events that can only be felt in our bodies: for example, an atmosphere of thrill emerges from social interactions, sights, sounds and other impressions which can sometimes touch, even change us. There is something in the aliveness and vivid flow of experiences that can only be addressed via our senses and which cannot be captured by “dead” big data descriptions, cut out of their contexts and summarised in static charts or representations.</p>
<h2>Toward teaching a reflexive mind-set</h2>
<p>The above points emphasise a crucial fact: to produce knowledge and insight about human behaviour, we may need more than big data. Ethnography calls for a curious and reflexive mind that is open to exploring novel understandings and perspectives, challenging taken-for-granted assumptions and norms. It also insists on an economic principle: we need to gather new data only until a “saturation point” is reached – when gathering any new data produces no further insight.</p>
<p>We argue that teaching ethnographic thinking to managers is now more urgent than ever. The world is changing with stunning speed, a flood of data is being produced by computing systems, and there is little time to make decisions. An ethnographic mind-set encourages managers to:</p>
<ul>
<li><p>continuously reflect on the “right” questions and perspectives they may adopt,</p></li>
<li><p>exercise participant-observation which can be a “lifelong” asset,</p></li>
<li><p>critically analyse the kinds of seemingly “objective” empirical evidence offered to them (no matter how voluminous),</p></li>
<li><p>take a few healthy steps away from the ocean of data they may easily drown in.</p></li>
</ul>
<p class="fine-print"><em><span>The authors do not work for, consult, own shares in or receive funding from any organisation that would benefit from this article, and have disclosed no affiliations other than their research institution.</span></em></p>Big data is all the rage in management circles and beyond, yet little is said about the understanding needed with such voluminous data. An important lesson can be learned from ethnographic research.Joonas Rokka, Associate Professor of Marketing, EM Lyon Business SchoolLionel Sitz, Professor of Marketing, EM Lyon Business SchoolLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/1013212018-08-13T06:30:45Z2018-08-13T06:30:45ZWhy the NAPLAN results delay is a storm in a teacup<figure><img src="https://images.theconversation.com/files/231599/original/file-20180813-2915-1lsu7i4.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">This year's preliminary release of NAPLAN data was due out August 8.</span> <span class="attribution"><span class="source">www.shutterstock.com</span></span></figcaption></figure><p>NAPLAN has caused much controversy this year, as has become customary. With the test in its tenth year, the New South Wales government <a href="https://www.theguardian.com/australia-news/2018/may/04/nsw-governments-call-to-scrap-naplan-rejected-by-simon-birmingham">called</a> for it to be scrapped and there were calls for a review after a report <a href="http://www.abc.net.au/news/2018-03-07/naplan-call-review-after-report-reveals-no-change-in-decade/9519840">found no change</a> in results in a decade. In June <a href="https://www.theaustralian.com.au/national-affairs/education/review-ordered-into-use-of-naplan-information/news-story/6142368234a8c9b41ed673270cfa1099">a review was finally ordered</a> into the use of NAPLAN information.</p>
<p>The most recent controversy – the <a href="http://www.abc.net.au/news/2018-08-08/naplan-results-delayed-over-concerns-results-invalid/10082734">delay in releasing NAPLAN results</a> – is in part about whether scores from paper and online tests can be statistically compared. </p>
<p>The data collected this year will be comparable, both between paper and online tests, and with tests from previous years because they will be compared on a <a href="https://arc.nesa.nsw.edu.au/go/k-6/common-grade-scale/">common scale</a>. </p>
<h2>What’s the issue?</h2>
<p>This year’s preliminary release of NAPLAN data, which was due out August 8, has been delayed. We don’t yet know when it will be released.</p>
<p>State education department heads questioned whether the paper and online tests were too different to be statistically comparable. The Victorian minister for education, James Merlino, criticised the <a href="https://www.acara.edu.au/">Australian Curriculum, Assessment and Reporting Authority</a> (ACARA) over its management of the online test.</p>
<h2>What is comparability?</h2>
<p>It’s important when considering comparability that we understand it has different meanings. In a measurement sense (the way it’s used with NAPLAN), it means we compare the achievement of the students on a “common mathematics scale”. This does not mean they are “the same”. That is, doing the test online is different from doing the test in a paper-and-pencil mode. But both forms provide evidence of what the students know and can do in numeracy and literacy.</p>
<p>This type of comparability happens regularly. For example, if you want to compare Australian dollars and Chinese yuan, you make them comparable by putting them onto a common scale (one AUD = five yuan). You can then compare them in terms of “an amount of money”, but they are not the same. </p>
<p>Similarly, when we construct an <a href="http://www.vtac.edu.au/results-offers/atar-explained.html">Australian Tertiary Admission Rank</a> (ATAR), we convert scores in different subjects to make them comparable on a common scale, add the scores up, then calculate an ATAR. This makes it possible to compare them in terms of the general ability that characterises the combined <a href="http://educationstandards.nsw.edu.au/wps/portal/nesa/11-12/hsc/about-HSC">Higher School Certificate</a> (HSC) score. The subjects are not the <em>same</em>, but they are <em>comparable</em> on a common scale. </p>
<p>In the same way, scores can be compared when the NAPLAN tests have been done in paper-and-pencil format and online. Comparing the results across years when we move from paper-and-pencil NAPLAN tests to NAPLAN online is much the same. </p>
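<p>As a rough illustration of what placing two test forms on a common scale can mean, the sketch below applies mean-sigma linear equating, a textbook technique, to made-up scores. It is not ACARA's method, and the function name <code>equate_to_reference</code> and all the numbers are hypothetical.</p>

```python
from statistics import mean, pstdev

def equate_to_reference(scores, reference_scores):
    """Linearly map `scores` onto the scale of `reference_scores`.

    Mean-sigma linear equating: a score keeps its distance from the mean,
    measured in standard deviations, when moved to the reference scale.
    """
    mu_x, sd_x = mean(scores), pstdev(scores)
    mu_ref, sd_ref = mean(reference_scores), pstdev(reference_scores)
    slope = sd_ref / sd_x
    return [mu_ref + slope * (x - mu_x) for x in scores]

# Hypothetical raw scores from two modes of the same assessment.
paper_raw = [12, 15, 18, 21, 24]
online_raw = [30, 36, 42, 48, 54]

# Paper scores re-expressed on the online scale (illustrative data only).
paper_on_online_scale = equate_to_reference(paper_raw, online_raw)
```

<p>The two sets of raw scores are not the same, but once re-expressed on one scale they can be compared, which is the sense of comparability described above.</p>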
<p>In this case ACARA has carried out significant <a href="https://www.acara.edu.au/assessment/online-assessment-research">research</a> to examine the impact of how the test is administered on the results. This has shown there is little – if any – major impact in terms of the purpose of NAPLAN. </p>
<h2>Storm in a teacup</h2>
<p>NAPLAN isn’t the only test ever to move from paper-and-pencil to online. The <a href="http://www.oecd.org/pisa/">Program for International Student Assessment</a> (PISA) and numerous other high-stakes international assessments have recently moved online. The OECD website explicitly <a href="http://www.oecd.org/pisa/pisafaq/">states</a>: </p>
<blockquote>
<p>Student performance is comparable between the computer-based and paper-based tests within PISA 2015 and also between PISA 2015 and previous paper-based cycles. </p>
</blockquote>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=503&fit=crop&dpr=1 754w, https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=503&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/231604/original/file-20180813-2897-1kd6qsc.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=503&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">NAPLAN was never intended to give fine-grained comparison of student achievement.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/download/confirm/608311472?src=-AtjHnq6lapFbDcA5rqt8g-1-18&size=huge_jpg">www.shutterstock.com</a></span>
</figcaption>
</figure>
<p>There is no doubt the controversy over NAPLAN comparability is a storm in a teacup. Students would have all attempted good-quality NAPLAN tests and done their best. The results will give them an indication of how they’re going on this occasion. </p>
<p>When the results do come out, my educated guess is teachers will find their students will have done pretty much as expected, based on all the other information teachers have about student achievement through their classroom-based assessments. NAPLAN provides one more bit of evidence, from a different perspective, that contributes to the overall image of the student. </p>
<h2>The real issue is misuse of data</h2>
<p>The real issue underpinning the controversy is the misuse of NAPLAN data. It was <a href="https://www.nap.edu.au/docs/default-source/default-document-library/naplan-assessment-framework.pdf?sfvrsn=2">never intended</a> that NAPLAN data would be used for fine-grained comparison of students. </p>
<p>The <a href="https://www.myschool.edu.au/">MySchool website</a> has contributed to the misuse of NAPLAN data. For example, the scores from the site are being used to make comparisons irrespective of the “error bands” that need to be taken into account when making comparisons. People are ascribing a level of precision to the results that was never intended when the tests were developed. The test was never designed to be high-stakes and the results should not be used as such. </p>
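<p>To see why error bands matter, consider a minimal sketch with invented numbers: two school averages that differ by ten points but whose bands overlap, so the gap cannot safely be read as a real difference. The scores, the band widths and the function name <code>bands_overlap</code> are assumptions for illustration, not MySchool data, and an overlap check is only a rough heuristic rather than a formal significance test.</p>

```python
def bands_overlap(score_a, margin_a, score_b, margin_b):
    """Return True when the intervals score ± margin share any point."""
    low_a, high_a = score_a - margin_a, score_a + margin_a
    low_b, high_b = score_b - margin_b, score_b + margin_b
    return min(high_a, high_b) >= max(low_a, low_b)

# Hypothetical school averages with hypothetical error margins.
school_a = (500, 15)
school_b = (510, 15)

if bands_overlap(*school_a, *school_b):
    verdict = "difference is within the error bands"
else:
    verdict = "difference exceeds the error bands"
```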
<p>When people challenge the “<a href="http://www.abc.net.au/radio/programs/am/intl-expert-questions-validity-of-this-years-naplan-data/10109310">validity</a>” of the NAPLAN test, they should be challenging the validity of the use of the results. NAPLAN has a high degree of validity, but we need to understand it better and use the results in a more judicious and defensible manner. The correct use of NAPLAN data is a major issue and it needs to be addressed as a matter of priority. </p>
<hr>
<p><em>This article has been updated since publication to clarify the author’s relevant affiliations.</em></p>
<hr>
<p class="fine-print"><em><span>Jim Tognolini was the Senior Vice President of Assessment and Reporting at Pearson until 2016 and owns JT Education Consulting.</span></em></p>Much of the controversy over the delay in this year’s NAPLAN data comes down to its misuse and a misunderstanding of statistical comparability.Jim Tognolini, Director, Educational Measurement and Assessment Hub, University of SydneyLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/952032018-06-12T10:38:16Z2018-06-12T10:38:16ZCan Facebook use AI to fight online abuse?<figure><img src="https://images.theconversation.com/files/222239/original/file-20180607-137298-1yrsks6.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">It can be complicated to teach a computer to detect harassment and threats.</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-illustration/robot-defends-desktop-computer-red-shield-499840540">Palto/Shutterstock.com</a></span></figcaption></figure><p>Facebook has <a href="https://newsroom.fb.com/news/2018/05/enforcement-numbers/">released statistics on abusive behavior</a> on its social media network, deleting more than 22 million posts for violating its rules against pornography and hate speech – and deleting or adding warnings about violence to <a href="https://www.bbc.com/news/technology-44122967">another 3.5 million posts</a>. Many of those were detected by automated systems monitoring users’ activity, in line with CEO Mark Zuckerberg’s statement to Congress that his company would <a href="https://www.washingtonpost.com/news/the-switch/wp/2018/04/10/transcript-of-mark-zuckerbergs-senate-hearing/">use artificial intelligence to identify social media posts</a> that might violate the company’s policies. 
As an <a href="https://scholar.google.com/citations?user=IrcFO1AAAAAJ&hl=en">academic researching AI and adversarial machine learning</a>, I can say he was right to acknowledge the significant challenges: “<a href="https://www.washingtonpost.com/news/the-switch/wp/2018/04/10/transcript-of-mark-zuckerbergs-senate-hearing/">Determining if something is hate speech is very linguistically nuanced</a>.”</p>
<p>The task of detecting abusive posts and comments on social media is not entirely technological. Even <a href="https://www.propublica.org/article/facebook-hate-speech-censorship-internal-documents-algorithms">Facebook’s human moderators have trouble</a> defining hate speech, <a href="https://www.theguardian.com/technology/2017/dec/05/facebook-bans-women-posting-men-are-scum-harassment-scandals-comedian-marcia-belsky-abuse">inconsistently applying the company’s guidelines</a> and even <a href="https://www.usatoday.com/story/tech/2017/08/03/facebook-ijeoma-oluo-hate-speech/537682001/">reversing</a> <a href="https://motherboard.vice.com/en_us/article/ywb5py/twitter-re-activated-an-account-it-told-congress-was-connected-to-a-russian-troll-farm">their</a> <a href="https://www.bbc.com/news/world-us-canada-41854482">decisions</a> (especially when they <a href="https://motherboard.vice.com/en_us/article/kzkpea/facebook-poop-emoji-hatespeech-documents">make headlines</a>). Also, abusers adapt to avoid detection – as email spammers sought to evade detection by replacing “Viagra” with “Vi@gra” in their messages. </p>
<p>Even more complications can arise if attackers try to use the machine learning system against itself – <a href="https://www.theverge.com/2018/6/7/17437454/mit-ai-psychopathic-reddit-data-algorithmic-bias">tainting the data the algorithm learns from</a> to influence its results. For instance, there is a phenomenon called “<a href="https://www.wired.com/story/google-bombs-are-our-new-normal/">Google bombing</a>,” in which people create websites and construct sequences of web links in an effort to affect the results of Google’s search algorithms. A similar “<a href="http://pralab.diee.unica.it/en/node/729">data poisoning</a>” attack could limit Facebook’s efforts to identify hate speech.</p>
<h2>Tricking machine learning</h2>
<p><a href="https://news.codecademy.com/what-is-machine-learning/">Machine learning</a>, <a href="https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12">a form of artificial intelligence</a>, has proven very useful in detecting many kinds of fraud and abuse, <a href="https://www.ijcai.org/Proceedings/11/Papers/414.pdf">including</a> <a href="https://www.wired.com/2015/07/google-says-ai-catches-99-9-percent-gmail-spam/">email spam</a>, <a href="https://www.microsoft.com/itshowcase/Article/Content/956/Microsoft-thwarts-phishing-attempts-with-Office-365">phishing scams</a>, <a href="http://www.fico.com/en/newsroom/fico-machine-learning-algorithms-improve-by-30-percent">credit card fraud</a> and <a href="https://www.inc.com/jessica-stillman/heres-how-to-spot-fake-online-reviews-with-90-perc.html">fake product reviews</a>. It works best when there are large amounts of data in which to identify patterns that can reliably separate normal, benign behavior from malicious activity. For example, if people use their email systems to report as spam large numbers of messages that contain the words “urgent,” “investment” and “payment,” then a machine learning algorithm will be more likely to label as spam future messages including those words.</p>
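<p>The spam example can be sketched in a few lines. This is a deliberately simple word-count heuristic under stated assumptions, not the algorithm any email provider or Facebook actually uses; the function names, the labels and the sample messages are all hypothetical.</p>

```python
from collections import Counter

def train(messages):
    """Count how many spam and non-spam ("ham") messages contain each word."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in messages:
        counts[label].update(set(text.lower().split()))
    return counts

def spam_score(text, counts):
    """Fraction of the message's known words seen more often in spam."""
    words = set(text.lower().split())
    known = [w for w in words if counts["spam"][w] + counts["ham"][w] > 0]
    if not known:
        return 0.0  # nothing learned about these words yet
    spammy = [w for w in known if counts["spam"][w] > counts["ham"][w]]
    return len(spammy) / len(known)

# Hypothetical user reports: (message text, label)
reports = [
    ("urgent investment payment required", "spam"),
    ("urgent payment for investment opportunity", "spam"),
    ("lunch on friday", "ham"),
    ("payment received thanks", "ham"),
]
model = train(reports)
```

<p>With these made-up reports, a new message containing “urgent,” “investment” and “payment” scores higher than one about lunch, because users flagged those words more often in spam. Real systems use far more data and more careful statistics.</p>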
<p>Detecting abusive posts and comments on social media is a similar problem: An algorithm would look for text patterns that are correlated with abusive or nonabusive behavior. This is faster than reading every comment, more flexible than simply performing keyword searches for slurs and more proactive than waiting for complaints. In addition to the text itself, there are often <a href="https://doi.org/10.1007/978-3-642-31284-7_27">clues from context</a>, including the user who posted the content and their other actions. A verified Twitter account with a million followers would likely be treated differently than a newly created account with no followers.</p>
<p>Yet as those algorithms are developed, abusers adapt, changing their patterns of behavior to avoid detection. Since the dawn of letter substitution in email spam, every new medium has spawned its own version: People buy <a href="https://www.cjr.org/innovations/twitter-fake-follower-accounts.php">Twitter followers</a>, favorable <a href="https://techcrunch.com/2016/10/27/amazon-sues-more-sellers-for-buying-fake-reviews/">Amazon reviews</a> and <a href="https://www.facebook.com/business/a/page/fake-likes">Facebook likes</a>, all to fool algorithms and other humans into thinking they’re more reputable. </p>
<p>As a result, a big piece of detecting abuse involves creating a stable definition of what is a problem, even as the actual text expressing the abuse changes. This presents an opportunity for artificial intelligence to, effectively, enter an arms race against itself. If an AI system can predict what an attacker might do, it could be adapted to simulate performing that behavior. Another AI system could analyze those actions, learning to detect abusers’ efforts to sneak hate speech past the automated filters. Once both the attacker and defender can be simulated, <a href="http://www.jmlr.org/papers/volume13/brueckner12a/brueckner12a.pdf">game theory</a> can identify their best strategies in this competition.</p>
<h2>Data poisoning</h2>
<p>Abusers don’t just have to change their own behavior – by substituting different characters for letters or using words or <a href="https://www.adl.org/education/references/hate-symbols/echo">symbols in coded ways</a>. They can also change the machine learning system itself.</p>
<p>Because algorithms are trained on data generated by humans, if enough people change their behavior in particular ways, the system will learn a different lesson than its creators intended. In 2016, for instance, Microsoft unveiled “Tay,” a Twitter bot that was supposed to engage in meaningful conversations with other Twitter users. Instead, trolls <a href="https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist">flooded the bot with hateful and abusive messages</a>. As the bot analyzed that text, it began to reply in kind – and was quickly shut down.</p>
<p>It can be difficult to determine when human-generated data are causing an AI to perform poorly. When possible, the best defense is for <a href="https://towardsdatascience.com/a-gentle-introduction-to-the-discussion-on-algorithmic-fairness-740bbb469b6">humans to add constraints</a> to the system, such as <a href="https://www.technologyreview.com/s/602025/how-vector-space-mathematics-reveals-the-hidden-sexism-in-language/">removing language patterns that are considered sexist</a>. Data poisoning can also be detected by <a href="https://elie.net/blog/ai/attacks-against-machine-learning-an-overview#toc-10">measuring accuracy on a separate, curated data set</a>: If a new model performs poorly on trusted data, then that could mean the new training data are bad. Finally, poisoning can be made less effective by <a href="http://papers.nips.cc/paper/6943-certified-defenses-for-data-poisoning-attacks">removing outliers</a>, data points that are very different from the rest of the training data.</p>
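<p>The trusted-data check described above can be sketched in a few lines. This is a minimal illustration, not any platform's actual pipeline: the toy keyword "models," the four hand-labeled examples and the <code>max_drop</code> threshold are all hypothetical stand-ins chosen to keep the example short.</p>

```python
from typing import Callable, List, Tuple

Example = Tuple[str, bool]       # (text, is_abusive), labeled by trusted humans
Model = Callable[[str], bool]    # returns True when the text is judged abusive


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of trusted examples the model labels correctly."""
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


def looks_poisoned(old_model: Model, new_model: Model,
                   trusted: List[Example], max_drop: float = 0.05) -> bool:
    """Flag the retrained model if it is clearly worse on trusted data.

    max_drop is an assumed tolerance; a real system would tune it.
    """
    return accuracy(old_model, trusted) - accuracy(new_model, trusted) > max_drop


# Hypothetical curated set that attackers cannot influence.
trusted = [
    ("you are an idiot", True),
    ("have a great day", False),
    ("nobody wants you here, idiot", True),
    ("thanks for sharing this", False),
]


def old_model(text: str) -> bool:
    # Stand-in for the model trained before the suspicious data arrived.
    return "idiot" in text


def new_model(text: str) -> bool:
    # Stand-in for a model retrained on manipulated data.
    return "thanks" in text


print(accuracy(old_model, trusted))
print(accuracy(new_model, trusted))
print(looks_poisoned(old_model, new_model, trusted))
```

<p>A drop in accuracy on the trusted set does not prove poisoning, but it is a cheap signal that the newest training data deserve a closer look before the retrained model is deployed.</p>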
<p>Of course, no machine learning system will ever be perfect. Like humans, computers should be used as part of a larger effort to fight abuse. Even email spam, a major success for machine learning, relies on more than just good algorithms: <a href="https://securityintelligence.com/understanding-the-spf-and-dkim-spam-filtering-mechanisms/">New internet communications standards</a> make it harder for spammers to hide their identities when sending messages. In addition, federal law, such as the <a href="https://www.ftc.gov/tips-advice/business-center/guidance/can-spam-act-compliance-guide-business">2003 CAN-SPAM Act</a>, sets standards for commercial email, including penalties for violations. Similarly, addressing online abuse may require new standards and policies, not just smarter artificial intelligence.</p><img src="https://counter.theconversation.com/content/95203/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Daniel Lowd receives funding from NSF, ARO, DARPA, and AFRL.</span></em></p>It could seem attractive to try to teach computers to detect harassment, threats and abusive language. But it’s much more difficult than it might appear.Daniel Lowd, Associate Professor of Computer and Information Science, University of OregonLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/940782018-03-30T11:03:22Z2018-03-30T11:03:22ZHow Cambridge Analytica’s Facebook targeting model really worked – according to the person who built it<figure><img src="https://images.theconversation.com/files/212677/original/file-20180329-189824-1lbooac.jpg?ixlib=rb-1.1.0&rect=7%2C1197%2C4971%2C3002&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">How accurately can you be profiled online?</span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-vector/laptop-shooting-target-arrows-on-screen-795280663">Andrew Krasovitckii/Shutterstock.com</a></span></figcaption></figure><p>The researcher whose work is at the center of the <a href="https://www.nytimes.com/2018/03/17/us/politics/cambridge-analytica-trump-campaign.html">Facebook-Cambridge Analytica data analysis and political advertising uproar</a> has revealed that his method worked much like the one <a href="https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429">Netflix uses to recommend movies</a>. </p>
<p>In an email to me, Cambridge University scholar Aleksandr Kogan explained how his statistical model processed Facebook data for Cambridge Analytica. The accuracy he claims suggests it works about as well as <a href="https://www.cambridge.org/core/books/hacking-the-electorate/C0D269F47449B042767A51EC512DD82E">established voter-targeting methods</a> based on demographics like race, age and gender.</p>
<p>If confirmed, Kogan’s account would mean the digital modeling Cambridge Analytica used was <a href="https://www.youtube.com/watch?v=APqU_EJ5d3U">hardly the virtual crystal ball</a> <a href="https://techcrunch.com/2018/03/23/facebook-knows-literally-everything-about-you/">a few have claimed</a>. Yet the numbers Kogan provides <a href="https://civichall.org/civicist/will-the-real-psychometric-targeters-please-stand-up/">also show</a> what is – and isn’t – <a href="https://www.washingtonpost.com/news/monkey-cage/wp/2018/03/23/four-and-a-half-reasons-not-to-worry-that-cambridge-analytica-skewed-the-2016-election/">actually possible</a> by <a href="https://www.wired.com/story/the-noisy-fallacies-of-psychographic-targeting/">combining personal data</a> <a href="https://www.nbcnews.com/politics/politics-news/cambridge-analytica-s-effectiveness-called-question-despite-alleged-facebook-data-n858256">with machine learning</a> for political ends.</p>
<p>Regarding one key public concern, though, Kogan’s numbers suggest that information on users’ personalities or “<a href="https://www.vox.com/science-and-health/2018/3/23/17152564/cambridge-analytica-psychographic-microtargeting-what">psychographics</a>” was just a modest part of how the model targeted citizens. It was not a personality model strictly speaking, but rather one that boiled down demographics, social influences, personality and everything else into a big correlated lump. This soak-up-all-the-correlation-and-call-it-personality approach seems to have created a valuable campaign tool, even if the product being sold wasn’t quite as it was billed.</p>
<h2>The promise of personality targeting</h2>
<p>In the wake of the revelations that Trump campaign consultants Cambridge Analytica used <a href="https://www.nytimes.com/2018/03/17/us/politics/cambridge-analytica-trump-campaign.html">data from 50 million Facebook users</a> to target digital political advertising during the 2016 U.S. presidential election, Facebook has <a href="https://www.nasdaq.com/symbol/fb/stock-report">lost billions in stock market value</a>, governments on <a href="https://www.theverge.com/2018/3/19/17141138/facebook-cambridge-analytica-uk-authorities-warrant-data-breach">both sides of the Atlantic</a> have <a href="https://www.pbs.org/newshour/politics/federal-trade-commission-to-investigate-facebook-as-companys-stock-value-sinks">opened investigations</a>, and a nascent <a href="https://theconversation.com/facebook-is-killing-democracy-with-its-personality-profiling-data-93611">social movement</a> is calling on users to <a href="https://twitter.com/search?q=%23deletefacebook">#DeleteFacebook</a>.</p>
<p>But a key question has remained unanswered: Was Cambridge Analytica really able to effectively target campaign messages to citizens based on their personality characteristics – or even their “<a href="https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election">inner demons</a>,” as a company whistleblower alleged? </p>
<p>If anyone would know what Cambridge Analytica did with its massive trove of Facebook data, it would be Aleksandr Kogan and Joseph Chancellor. It was <a href="https://www.reuters.com/article/us-facebook-cambridge-analytica/trump-consultants-harvested-data-from-50-million-facebook-users-reports-idUSKCN1GT02Y">their startup Global Science Research</a> that collected profile information from <a href="https://www.wired.com/story/cambridge-analytica-50m-facebook-users-data/">270,000 Facebook users and tens of millions of their friends</a> using a personality test app called “thisisyourdigitallife.”</p>
<p>Part of <a href="https://scholar.google.com/citations?user=igL-0AsAAAAJ&hl=en">my own research</a> focuses on understanding <a href="https://doi.org/10.1177/0002716215570279">machine learning</a> methods, and <a href="https://www.amazon.com/Internet-Trap-Monopolies-Undermines-Democracy/dp/0691159262/">my forthcoming book</a> discusses how digital firms use recommendation models to build audiences. I had a hunch about how Kogan and Chancellor’s model worked.</p>
<p>So I emailed Kogan to ask. Kogan is still a <a href="https://www.bloomberg.com/news/articles/2018-03-20/meet-the-psychologist-at-the-center-of-facebook-s-data-scandal">researcher at Cambridge University</a>; his collaborator <a href="https://www.theguardian.com/news/2018/mar/18/facebook-cambridge-analytica-joseph-chancellor-gsr">Chancellor now works at Facebook</a>. In a remarkable display of academic courtesy, Kogan answered. </p>
<p>His response requires some unpacking, and some background.</p>
<h2>From the Netflix Prize to “psychometrics”</h2>
<p>Back in 2006, when it was still a DVD-by-mail company, Netflix offered a <a href="https://www.netflixprize.com/">reward of $1 million</a> to anyone who developed a better way to make predictions about users’ movie ratings than the company already had. A surprise top competitor was an <a href="https://www.kdnuggets.com/news/2007/n08/3i.html">independent software developer using the pseudonym Simon Funk</a>, whose basic approach was ultimately incorporated into all the top teams’ entries. Funk adapted a technique called “<a href="http://www.aclweb.org/anthology/E06-1013">singular value decomposition</a>,” condensing users’ ratings of movies into a <a href="https://www.youtube.com/watch?v=P5mlg91as1c">series of factors or components</a> – essentially a set of inferred categories, ranked by importance. As Funk <a href="http://sifter.org/simon/journal/20061027.2.html">explained in a blog post</a>:</p>
<blockquote>
<p>“So, for instance, a category might represent action movies, with movies with a lot of action at the top, and slow movies at the bottom, and correspondingly users who like action movies at the top, and those who prefer slow movies at the bottom.”</p>
</blockquote>
<p>Factors are artificial categories, which are not always like the kind of categories humans would come up with. The <a href="http://sifter.org/simon/journal/20061027.2.html">most important factor in Funk’s early Netflix model</a> was defined by users who loved films like “Pearl Harbor” and “The Wedding Planner” while also hating movies like “Lost in Translation” or “Eternal Sunshine of the Spotless Mind.” His model showed how machine learning can find correlations among groups of people, and groups of movies, that humans themselves would never spot.</p>
<p>Funk’s general approach used the 50 or 100 most important factors for both users and movies to make a decent guess at how every user would rate every movie. This method, often called <a href="https://en.wikipedia.org/wiki/Dimensionality_reduction">dimensionality reduction</a> or matrix factorization, was not new. Political science researchers had shown that <a href="https://en.wikipedia.org/wiki/NOMINATE_(scaling_method)">similar techniques using roll-call vote data</a> could predict the votes of members of Congress with 90 percent accuracy. In psychology the “<a href="https://doi.org/10.1037/0003-066X.48.1.26">Big Five</a>” model had also been used to predict behavior by clustering together personality questions that tended to be answered similarly.</p>
<p>Still, Funk’s model was a big advance: It allowed the technique to work well with huge data sets, even those with lots of missing data – like the Netflix data set, where a typical user rated only a few dozen films out of the thousands in the company’s library. More than a decade after the Netflix Prize contest began, <a href="https://doi.org/10.1145/1401890.1401944">SVD-based methods</a>, or <a href="https://doi.org/10.1109/ICDM.2008.22">related models for implicit data</a>, are still the tool of choice for many websites to predict what users will read, watch, or buy. </p>
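<p>To make the idea concrete, here is a minimal sketch of matrix factorization trained only on the ratings that exist, in the spirit of Funk’s approach. It is a simplified illustration rather than his actual code: it updates all factors at once, and the tiny data set, the function names and the settings (<code>n_factors</code>, <code>lr</code>, <code>reg</code>, <code>epochs</code>) are assumptions chosen for readability.</p>

```python
import random


def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn user and item factors by gradient descent on observed ratings only.

    ratings is a list of (user, item, rating) triples; missing pairs are
    simply absent, so they never contribute to the error.
    """
    rng = random.Random(seed)
    users = [[rng.uniform(0.05, 0.15) for _ in range(n_factors)]
             for _ in range(n_users)]
    items = [[rng.uniform(0.05, 0.15) for _ in range(n_factors)]
             for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users, items, u, i)
            for k in range(n_factors):
                uk, ik = users[u][k], items[i][k]
                # Nudge both factors to shrink the error; reg keeps them small.
                users[u][k] += lr * (err * ik - reg * uk)
                items[i][k] += lr * (err * uk - reg * ik)
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


def rmse(users, items, ratings):
    """Root-mean-square error over the observed ratings."""
    total = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical data: 4 users, 4 movies, only 9 of the 16 ratings observed.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 4),
]

users, items = factorize(ratings, n_users=4, n_items=4)
print("training error:", round(rmse(users, items, ratings), 2))
# User 1 never rated movie 1; the learned factors still yield a guess.
print("guess for user 1, movie 1:", round(predict(users, items, 1, 1), 1))
```

<p>The key design choice is that the loop visits only observed ratings, which is why the approach copes with a matrix that is mostly empty.</p>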
<p>These models can predict other things, too.</p>
<h2>Facebook knows if you are a Republican</h2>
<p>In 2013, Cambridge University researchers Michal Kosinski, David Stillwell and Thore Graepel published an article on the <a href="https://doi.org/10.1073/pnas.1218772110">predictive power of Facebook data</a>, using information gathered through an online personality test. Their initial analysis was nearly identical to that used on the Netflix Prize, using SVD to categorize both users and things they “liked” into the top 100 factors. </p>
<p>The paper showed that a factor model made with users’ Facebook “likes” alone was <a href="https://doi.org/10.1073/pnas.1218772110">95 percent accurate</a> at distinguishing between black and white respondents, 93 percent accurate at distinguishing men from women, and 88 percent accurate at distinguishing people who identified as gay men from men who identified as straight. It could even correctly distinguish Republicans from Democrats 85 percent of the time. It was also useful, though not as accurate, for <a href="https://doi.org/10.1073/pnas.1218772110">predicting users’ scores</a> on the “Big Five” personality test. </p>
<p>There was <a href="https://psmag.com/economics/big-data-big-brother-and-the-like-button-53894">public outcry</a> <a href="https://www.theatlantic.com/technology/archive/2013/03/armed-with-facebook-likes-alone-researchers-can-tell-your-race-gender-and-sexual-orientation/273963/">in response</a>; within weeks Facebook had <a href="https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win">made users’ likes private</a> by default.</p>
<p>Kogan and Chancellor, also Cambridge University researchers at the time, were starting to use Facebook data for election targeting as part of a collaboration with Cambridge Analytica’s parent firm SCL. Kogan invited Kosinski and Stillwell to join his project, but it <a href="https://www.theguardian.com/education/2018/mar/24/cambridge-analytica-academics-work-upset-university-colleagues">didn’t work out</a>. Kosinski reportedly suspected Kogan and Chancellor might have <a href="https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win">reverse-engineered the Facebook “likes” model</a> for Cambridge Analytica. Kogan denied this, saying his project “<a href="https://www.theguardian.com/education/2018/mar/24/cambridge-analytica-academics-work-upset-university-colleagues">built all our models</a> using our own data, collected using our own software.” </p>
<h2>What did Kogan and Chancellor actually do?</h2>
<p>As I followed the developments in the story, it became clear Kogan and Chancellor had indeed collected plenty of their own data through the thisisyourdigitallife app. They certainly could have built a predictive SVD model like that featured in Kosinski and Stillwell’s published research.</p>
<p>So I emailed Kogan to ask if that was what he had done. Somewhat to my surprise, he wrote back. </p>
<p>“We didn’t exactly use SVD,” he wrote, noting that SVD can struggle when some users have many more “likes” than others. Instead, Kogan explained, “The technique was something we actually developed ourselves … It’s not something that is in the public domain.” Without going into details, Kogan described their method as “a multi-step <a href="https://www.quora.com/What-is-a-co-occurrence-matrix">co-occurrence</a> approach.” </p>
<p>However, his message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like those used in the Netflix Prize competition and in the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model. </p>
<h2>How accurate was it?</h2>
<p>Kogan suggested the exact model used doesn’t matter much, though – what matters is the accuracy of its predictions. According to Kogan, the “correlation between predicted and actual scores … was around [30 percent] for all the personality dimensions.” By comparison, a person’s previous Big Five scores are about <a href="https://doi.org/10.1016/j.jrp.2014.06.003">70 to 80 percent accurate</a> in predicting their scores when they retake the test. </p>
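<p>A correlation is not the same thing as a percentage of correct guesses, so it can help to see how one is computed. The sketch below calculates a Pearson correlation by hand and squares it, which gives the share of variance in the actual scores that a linear fit to the predictions would account for. The scores are invented for illustration and are not Kogan’s data; the helper names are hypothetical.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def variance_explained(r):
    """Squared correlation: share of variance a linear fit accounts for."""
    return r ** 2


# Invented example: questionnaire scores and a model's predictions.
actual = [2.0, 3.5, 4.0, 1.5, 3.0]
predicted = [2.8, 3.0, 3.1, 2.9, 2.7]
r = pearson(actual, predicted)
print("correlation on the toy data:", round(r, 2))

# The two figures quoted in the text, read as correlations.
print(round(variance_explained(0.30), 2))  # 0.09
print(round(variance_explained(0.75), 2))  # 0.56
```

<p>Read this way, a correlation of about 0.30 would correspond to roughly 9 percent of the variance in personality scores, compared with a bit more than half for a retest correlation in the 0.70 to 0.80 range.</p>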
<p>Kogan’s accuracy claims cannot be independently verified, of course. And anyone in the midst of such a high-profile scandal might have an incentive to understate his or her contribution. In his <a href="https://www.youtube.com/watch?v=APqU_EJ5d3U">appearance on CNN</a>, Kogan explained to an increasingly incredulous Anderson Cooper that the models had, in fact, not worked very well. </p>
<figure>
<iframe width="440" height="260" src="https://www.youtube.com/embed/APqU_EJ5d3U?wmode=transparent&start=0" frameborder="0" allowfullscreen=""></iframe>
<figcaption><span class="caption">Aleksandr Kogan answers questions on CNN.</span></figcaption>
</figure>
<p>In fact, the accuracy Kogan claims seems a bit low, but plausible. Kosinski, Stillwell and Graepel reported comparable or slightly better results, as have several <a href="https://doi.org/10.1016/j.paid.2017.12.018">other academic studies</a> using digital footprints to predict personality (though some of those studies had more data than just Facebook “likes”). It is surprising that Kogan and Chancellor would go to the trouble of designing their own proprietary model if off-the-shelf solutions would seem to be just as accurate.</p>
<p>Importantly, though, the model’s accuracy on personality scores allows comparisons of Kogan’s results with other research. Published models with equivalent accuracy in predicting personality are all much more accurate at guessing demographics and political variables.</p>
<p>For instance, the similar Kosinski-Stillwell-Graepel SVD model was 85 percent accurate in guessing party affiliation, even without using any profile information other than likes. Kogan’s model had similar or better accuracy. Adding even a small amount of information about friends or users’ demographics would likely boost this accuracy above 90 percent. Guesses about gender, race, sexual orientation and other characteristics would probably be more than 90 percent accurate too.</p>
<p>Critically, these guesses would be especially good for the most active Facebook users – the people the model was primarily used to target. Users with less activity to analyze are likely not on Facebook much anyway. </p>
<h2>When psychographics is mostly demographics</h2>
<p>Knowing how the model is built helps explain Cambridge Analytica’s apparently contradictory statements about <a href="https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win">the role</a> – or <a href="https://www.c-span.org/video/?420077-1/google-hosts-post-election-review&start=6905">lack thereof</a> – that personality profiling and psychographics played in its modeling. They’re all technically consistent with what Kogan describes.</p>
<p>A model like Kogan’s would give estimates for every variable available on any group of users. That means it would automatically <a href="https://www.bloomberg.com/news/features/2015-11-12/is-the-republican-party-s-killer-data-app-for-real-">estimate the Big Five personality scores</a> for every voter. But these personality scores are the output of the model, not the input. All the model knows is that certain Facebook likes, and certain users, tend to be grouped together. </p>
<p>With this model, Cambridge Analytica could say that it was identifying people with low openness to experience and high neuroticism. But the same model, with the exact same predictions for every user, could just as accurately claim to be identifying less educated older Republican men. </p>
<p>Kogan’s information also helps clarify the confusion about whether Cambridge Analytica <a href="https://www.youtube.com/watch?v=MepM_YXZdYg">actually deleted its trove</a> of Facebook data, when models built from the data <a href="https://www.channel4.com/news/revealed-cambridge-analytica-data-on-thousands-of-facebook-users-still-not-deleted">seem to still be circulating</a>, and even <a href="https://gizmodo.com/aggregateiq-created-cambridge-analyticas-election-softw-1824026565">being developed further</a>. </p>
<p>The whole point of a dimension reduction model is to mathematically represent the data in simpler form. It’s as if Cambridge Analytica took a very high-resolution photograph, resized it to be smaller, and then deleted the original. The photo still exists – and as long as Cambridge Analytica’s models exist, the data effectively does too.</p><img src="https://counter.theconversation.com/content/94078/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Matthew Hindman does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>An email from Aleksandr Kogan sheds light on exactly how much your Facebook data reveals about you, and what data scientists can actually do with that information.Matthew Hindman, Associate Professor of Media and Public Affairs, George Washington UniversityLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/938732018-03-23T12:48:09Z2018-03-23T12:48:09ZCambridge Analytica: the data analytics industry is already in full swing<figure><img src="https://images.theconversation.com/files/211703/original/file-20180323-54863-gb98ni.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption"></span> <span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/african-businesswoman-analyzing-statistics-on-laptop-1022439592?src=6Dcwg1AUq65LI__PxaiOZQ-1-1">Shutterstock</a></span></figcaption></figure><p>Revelations about <a href="https://www.ft.com/video/517a016d-642e-4a67-94f5-214dffd96a14">Cambridge Analytica</a> have laid bare the seeming lack of control that we have over our own data. Suddenly, with all the talk of “<a href="https://theconversation.com/psychographics-the-behavioural-analysis-that-helped-cambridge-analytica-know-voters-minds-93675">psychographics</a>” and voter manipulation, the power of data analytics has become the source of some concern. </p>
<p>But the risk is that if we look at the case of Cambridge Analytica in isolation, we might prevent a much wider debate about the use and control of our data. By focusing on the reports of <a href="https://www.channel4.com/news/exposed-undercover-secrets-of-donald-trump-data-firm-cambridge-analytica">extreme practices</a>, we might miss the many everyday ways that data analytics are now shaping our lives.</p>
<p>The <a href="https://www.tandfonline.com/doi/abs/10.1080/1369118X.2017.1289232">data analytics industry</a> is much more diverse and far-reaching than the current news coverage might lead us to believe. During a recent project, I found something quite different to the reports that we are now seeing about Cambridge Analytica.</p>
<p>Despite having its origins in the 1970s, when computer scientists and processing experts were beginning to try to imagine what a data-informed organisation might look like, it wasn’t until the 1990s that the data analytics industry began to really develop. Some of the most famous early examples of the organisational and individual application of data analytics were in sport, and particularly <a href="http://journals.sagepub.com/doi/abs/10.1177/2053951715578951">in football</a>, where data was gathered to try to enhance performance levels, to find hidden patterns within games or to spot potential talent. </p>
<figure class="align-center ">
<img alt="" src="https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=400&fit=crop&dpr=1 600w, https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=400&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=400&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=502&fit=crop&dpr=1 754w, https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=502&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/211705/original/file-20180323-54893-7rggj0.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=502&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px">
<figcaption>
<span class="caption">Improving performance levels.</span>
<span class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/pathumthanithailand-sep14-details-ball-during-thai-154249301?src=IQfoE4EcZH2Ke29OBlSLSg-1-33">Shutterstock</a></span>
</figcaption>
</figure>
<p>Beyond this, the use of data in different sectors has spread drastically in the last 20 years, most markedly in the fields of performance management, advertising and marketing, with some notable developments in <a href="http://www.antoniocasella.eu/nume/Amoore_2006.pdf">security and risk</a>. This has included things like workplace <a href="http://www.talentmetrics.io/">talent metrics</a> and <a href="https://us.sagepub.com/en-us/nam/the-predictive-postcode/book254638">postcode-level classifications</a>, through to the use of data about lifestyles <a href="https://www.marketwatch.com/story/could-your-fitbit-data-raise-the-cost-of-your-health-insurance-2017-02-23">to set insurance premiums</a> or in <a href="https://www.clearscore.com/">credit scoring</a>. The new infrastructures of GPS, <a href="https://electronics.howstuffworks.com/gadgets/high-tech-gadgets/rfid.htm">RFID sensors</a>, internet shopping, smartphones and social media have created a range of new opportunities for data harvesting. As the data began to pile up, a burgeoning industry emerged.</p>
<h2>You as a data analyst</h2>
<p>As might be expected, some of these data analytics providers offer consultancy and analytics services, helping to track customers, brands, public opinion and the like. What seemed much more important to me, though, was that a good portion of this industry was instead focused on providing the software and tools to, as they put it, turn us into our own data analysts. </p>
<p>Many of these tools utilise and adapt The Apache Software Foundation’s open-source <a href="http://hadoop.apache.org/">Hadoop project</a> – which allows for large clusters of computers to be used to process data. These tools are usually presented as accessible dashboard-style technologies that require little technical skill or know-how of the user. The result is that data can be harvested and used in many different contexts and by a much wider range of people than we might imagine. </p>
<p>Measuring a wide range of things, from sentiment, buzz, amplification and influence in social media, to taste profiling, social network formations and so on, these tools come with some glossy promises. The point here is that the practice of data analysis is not restricted to qualified technical experts. The data analytics industry is actually aiming to turn anyone into a data analyst and to make all organisations data savvy.</p>
<p>Of course, data analytics come with some powerful promises designed to make us ever more data-focused. We are told <a href="https://www.tandfonline.com/doi/abs/10.1080/1369118X.2017.1289232">repeatedly by this industry</a> that data can speed us up, make us smarter, allow us to see into the hidden depths of organisations, allow us to act in real time or enable us to predict the future. This means that we also need to be cautious about accepting claims made about the capabilities of data analytics, which are invested with an agenda aimed at <a href="https://discoversociety.org/2015/07/30/the-growing-power-of-the-data-analytics-industry/">expanding data use</a>.</p>
<p>If we want a full and comprehensive debate about the role of data in our lives, we need to first appreciate that the analysis and use of our data is not restricted to the types of figures that we have been reading about in these recent stories – it is deeply embedded in the structures in which we live. </p>
<p>This calls for us to reflect on how our data is used and how data analysis, afforded by these new tools, is coming to shape our lives in lots of ways. Data analysis is now an embedded presence that branches out into everything from <a href="https://www.palgrave.com/gb/book/9781137353979">local government</a>, large corporations and SMEs, to political parties, <a href="https://codeactsineducation.wordpress.com/about/">school governance</a>, PR and management consultancy.</p>
<p>The Cambridge Analytica case is crucial in understanding the way that our data are being used, but the opportunity these revelations offer for reflection shouldn’t be restricted solely to this type of reported misuse. We should look beyond that to try to understand how data-led approaches are influencing our lives on lots of different fronts, especially as the tools of data analysis are taken up in numerous different sectors. Just because the rest of the industry may not be as extreme as Cambridge Analytica, it does not mean that we should neglect to ask questions about the many ways that our data are being used to judge, rank and order our lives.</p><img src="https://counter.theconversation.com/content/93873/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>David Beer does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Noise around extreme practices drowns out how data analytics is being used in everyday ways. To really consider control of our data we must look beyond Cambridge Analytica.David Beer, Reader in Sociology, University of YorkLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/885812017-12-06T13:35:21Z2017-12-06T13:35:21ZEngineering research in Africa is growing but it’s still a patchy picture<figure><img src="https://images.theconversation.com/files/197773/original/file-20171205-22967-swwkhw.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Engineering can greatly bolster any country's development and growth.</span> <span class="attribution"><span class="source">Gorodenkoff/Shutterstock</span></span></figcaption></figure><p>Africa’s vast land mass and rich natural and mineral resources make it strategically important and an increasingly significant <a href="https://na.unep.net/atlas/africa/downloads/chapters/Africa_Atlas_English_Intro.pdf">global player</a>. It is also a dynamic young continent: about 60% of its residents are aged <a href="http://www.bbc.com/news/world-africa-34188248">below 25</a>.</p>
<p>The African Union is trying to harness this enormous potential through its <a href="https://au.int/en/agenda2063">Agenda 2063</a>, which includes elevating Africa through improved education and application of science and technology in development. </p>
<p>Engineering is an important branch of science and technology. It has a significant impact on the overall development of any nation, region or continent. It is, as Professor Calestous Juma <a href="http://www.nation.co.ke/oped/opinion/Engineering-is-the-engine-that-will-power-Africa-s-growth-/440808-2309528-pq151w/index.html">has written</a>, an engine to power growth – especially in Africa.</p>
<p>The World Bank estimates that Africa needs to spend about <a href="http://www.nation.co.ke/oped/opinion/Engineering-is-the-engine-that-will-power-Africa-s-growth-/440808-2309528-pq151w/index.html">US$93 billion per year</a> in the coming years to improve its infrastructure. Part of this investment must be in world-class engineering education and research.</p>
<p>Given the discipline’s importance, I wanted to understand how Africa is performing in terms of engineering research. How much are the continent’s researchers contributing to new ideas and thinking around engineering? To find out, I <a href="http://www.tandfonline.com/doi/full/10.1080/20421338.2017.1341732">searched, downloaded and analysed</a> scholarly publication data from academic publisher Elsevier’s citation and abstracting service, <a href="https://www.scopus.com/">Scopus</a>®. It’s a huge index of articles, covering 22,800 journals belonging to more than 5,000 international publishers across disciplines. </p>
<p>I also examined how many times articles from Africa were being cited, which is crucial to map the relevance and impact of any research. For instance, one of the criteria for winning a <a href="http://www.lindau-nobel.org/blog-on-fundamental-science/">Nobel Prize</a> in science is how frequently a researcher’s work has been cited.</p>
<p>The data I analysed shows that scholarly research output in terms of journal articles, conference papers and so on in engineering fields from Africa has increased over the past two decades. The number remains small in comparison to other, more developed continents and countries. But the continent’s contribution to global thinking and understanding about engineering is growing, and this should be celebrated.</p>
<h2>Analysing data</h2>
<p>My analysis reveals that Africa has recorded a tremendous growth in its output of academic engineering research over the past 20 years. In total, 75,157 scholarly articles about engineering subjects emerged from Africa between 1996 and 2016. About 1,500 of these were published in the first seven years under review. In the past three years, about 9,000 engineering articles from Africa were published annually. That’s a significant percentage increase.</p>
<figure class="align-left zoomable">
<a href="https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=397&fit=crop&dpr=1 600w, https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=397&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=397&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=498&fit=crop&dpr=1 754w, https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=498&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/197855/original/file-20171205-23009-1tqh4ia.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=498&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Africa’s engineering research output over 20 years.</span>
<span class="attribution"><a class="source" href="http://www.scimagojr.com/countryrank.php">Scimago</a></span>
</figcaption>
</figure>
<p>The problem is that African countries’ outputs are not uniform. South Africa leads the pack, with 22,156 articles over 20 years. This puts it at 41st in the world for output in engineering research. It is followed by Algeria (16,617 articles) and Tunisia (14,805 articles). Some countries have barely contributed to engineering research: Cape Verde produced only nine articles in 20 years; the Central African Republic just seven and Somalia only six.</p>
<p>The continent is also not producing nearly as much engineering research as other continents and regions.</p>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=480&fit=crop&dpr=1 600w, https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=480&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=480&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=603&fit=crop&dpr=1 754w, https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=603&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/197935/original/file-20171206-910-pqurdu.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=603&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Africa’s engineering research output is still lower than other continents and regions, but its growth over 20 years has been encouraging.</span>
<span class="attribution"><span class="source">Scimago</span></span>
</figcaption>
</figure>
<p>I also wanted to know how often African researchers’ work was being cited by others. This is a good way to understand the impact a piece of research has, and is called citation analysis. It counts the number of times an author’s article is cited in other scholarly works. And <a href="https://medium.com/@write4research/why-are-citations-important-in-research-writing-97fb6d854b47">citations are important</a> because they reveal that a piece of research is being used by others.</p>
<p><a href="http://www.tandfonline.com/doi/full/10.1080/20421338.2017.1341732">The results</a> are encouraging. The average number of citations for academic engineering papers from Africa is 5.48 per paper. This is almost equal to the performance of papers from Asia, and is well above the average citations received by papers from Eastern Europe. This suggests that African engineering research is influencing others’ thinking and contributing to global knowledge about the discipline.</p>
<p>So how can Africa improve its engineering research output, especially with an eye to meeting the goals of Agenda 2063? Collaboration will be crucial.</p>
<h2>Collaborative work</h2>
<p>South Africa does well with collaboration. Articles from the country tend to involve more than one research organisation or institution. Co-authored articles are common. Its researchers work with others on the continent and with global partners. Countries in North Africa, however, are less active when it comes to collaboration. </p>
<p>Africa-Africa collaboration, involving institutions and individuals across the continent, needs to be strengthened. This is because only African countries can truly understand the continent’s pressing needs, and develop appropriate solutions. Countries like South Africa that perform well collaboratively can offer support and advice to others. </p>
<p>It may also be time to set up an exclusively African citation database. Even Scopus®, the world’s largest indexing and abstracting database, offers very limited coverage of African science. By developing a resource that focuses only on African engineering research, the continent will be able to get a more complete, clear picture of its output and respond accordingly. The Council for the Development of Social Science Research in Africa is creating an <a href="http://africancitationindex.org/">African Citation database</a>, but it will be some time before this is a fully fledged searchable database.</p>
<p class="fine-print"><em><span>Swapan Kumar Patra receives funding from National Research Foundation, Republic of South Africa, Post doctoral research fellowship, through Tshwane University of Technology</span></em></p>Africa has recorded a tremendous growth in its output of academic engineering research over the past 20 years. Greater collaboration can increase this growth even more.Swapan Kumar Patra, Tshwane University of TechnologyLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/875352017-11-21T14:30:08Z2017-11-21T14:30:08ZAfrica must keep its rich, valuable data safe from exploitation<figure><img src="https://images.theconversation.com/files/195178/original/file-20171117-7588-k3w20r.jpg?ixlib=rb-1.1.0&rect=70%2C52%2C824%2C802&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Data should be open, shareable - but not at the expense of African researchers and communities.</span> <span class="attribution"><span class="source">Shutterstock</span></span></figcaption></figure><p>There’s a data revolution underway in Africa. It’s being driven by major international research collaborations like the <a href="https://skatelescope.org/">Square Kilometre Array</a> (SKA) project. This and similar initiatives are producing volumes of data the continent has never witnessed before.</p>
<p>All of that data then needs to be carefully managed throughout every stage of the research project. That’s why <a href="https://www.ncbi.nlm.nih.gov/books/NBK215270/">data stewardship</a> – a job that didn’t exist in academia ten years ago – has today become the key to the integrity of any academic research enterprise. </p>
<p>Data stewardship is the work of the person or people in an organisation responsible for describing data accurately, then arranging it so it’s easily found, understood in context and, ultimately, used appropriately.</p>
<p>High-profile projects like the SKA are supported by important national infrastructure initiatives, such as South Africa’s <a href="https://www.csir.co.za/national-integrated-cyber-infrastructure-system">National Integrated Cyber Infrastructure System</a>. These help to boost the country’s capacity for high levels of research data management.</p>
<p>The continent’s universities are also scrambling to provide necessary data services to researchers. This is important to make sure that academics comply with international funding agencies’ complex data management requirements. But there are more than technical or operational considerations to managing and sharing the huge volumes of data being sourced in Africa. </p>
<p>Political, ideological, cultural and historical factors also matter. Data is emerging as a powerful force in the digital economy. Will other nations and regions try to control the flow of information from Africa? That’s <a href="http://www.sahistory.org.za/article/term-3-scramble-africa-late-19th-century">what happened</a> the last time Africa had something valuable to offer in the form of oil reserves and minerals. </p>
<p>Africa must develop its capacity for data stewardship. This is a critical resource to refine the data according to the “<a href="https://www.force11.org/group/fairgroup/fairprinciples">FAIR</a>” data principles outlined by global bodies. These call for data to be findable, accessible, interoperable and reusable – an important way to prevent the exploitation of Africa’s research data.</p>
<h2>Opening up access</h2>
<p>The <a href="http://www.africa.undp.org/content/rba/en/home/library/reports/the_africa_data_revolution_report_2016.html">Africa Data Revolution Report 2016</a>, backed by the United Nations Development Programme, argues that in the African context open data means not only sharing and reuse: it also requires inclusion. This means that the benefits of gathering and sharing data should accrue to all, from institutions to individual researchers and entire communities.</p>
<p>That principle differs sharply from the historical paradigms of data production, dissemination and usage in Africa. Census planning was an early example of data gathering on the continent. Census-taking was far from a neutral act: this <a href="https://www.ncbi.nlm.nih.gov/pubmed/12321499">data was used</a> to construct ideologies of race. It became a tool for exclusion and segregation, especially under colonial and apartheid rule.</p>
<p>This history explains why many researchers in Africa are now committing themselves to this principle of openness. South Africa’s research community is particularly sensitive to the benefits of sharing data openly to promote social, economic and political inclusion and the integration of marginalised communities. </p>
<p>Those who <a href="https://icsu.org/cms/2017/04/open-data-in-big-data-world_long.pdf">support open data</a> see that it drives greater scientific integrity and global participation. They understand that it enables a strategic response to Africa’s societal challenges. The continent’s public health researchers and epidemiologists are <a href="https://www.nia.nih.gov/sites/default/files/2017-06/sharing-research-data-to-improve-public-health-in-africa_0.pdf">leading the way</a> here.</p>
<p>But of course, researchers have their reservations too. South Africa’s academics insist on the right to make the decision whether to share their data openly – and where to share it. Few universities have developed policies on research data management. These are necessary to guide research communities in collecting good, standardised data that can be shared at the end of a research project. </p>
<p>Another concern among African scholars is the problem of “<a href="https://theconversation.com/global-academic-collaboration-a-new-form-of-colonisation-61382">helicopter science</a>”. The risk in international research collaborations is that non-African partners tend to drive the research agenda. They gather uniquely African data and then export it for analysis and publishing elsewhere. The African partners then lose out on research incentives like peer recognition and reward. They also can’t, for instance, patent products based on that relinquished data in future.</p>
<p>These concerns must be taken seriously as Africa continues its data drive. A focus on collaboration among African universities and research institutions is crucial in developing national policies that both meet the FAIR principles of Open Data and ensure equity and fairness in research contracts. All of this work will ultimately offer greater protection against the risk of “helicopter science”.</p>
<h2>A collaborative ‘cloud’</h2>
<p>One example of this sort of crucial collaboration is work that’s been undertaken by Data Intensive Research Initiatives of South Africa (<a href="https://www.dirisa.ac.za/">DIRISA</a>). The organisation plans to develop a shared data service from core funding awarded to a consortium of universities in the Western Cape province. </p>
<p>This consortium, established in late 2016, is known as <a href="http://www.researchsupport.uct.ac.za/ilifu">ILIFU</a>, a word which means cloud in isiXhosa. Part of the ILIFU project is the deployment of the <a href="https://figshare.com/">cloud-based Figshare platform</a>. This offers an institutional repository for research data. It serves researchers who need a place to store and disseminate their data selectively. </p>
<p>The project is South Africa’s first national data infrastructure grant. It will give more access to research infrastructure, software and data to all the country’s researchers. That includes those from under-resourced communities, where access to this kind of infrastructure should “leave no one behind”.</p>
<p>The opportunity to work collaboratively in providing shared data infrastructure heralds another conscious mind shift for African research. We are beginning to see open data not as a commodity but as a source of renewable energy. It generates new value every time it’s reused – and, ultimately, it can power the world.</p>
<p class="fine-print"><em><span>Dale Peters does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>A focus on collaboration among African universities and research institutions is crucial in developing national policies that meet the principles of open data while keeping it safe from exploitation.Dale Peters, Director: UCT eResearch, University of Cape TownLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/809972017-07-19T17:01:16Z2017-07-19T17:01:16ZHere’s the three-pronged approach we’re using in our own research to tackle the reproducibility issue<figure><img src="https://images.theconversation.com/files/178674/original/file-20170718-31872-1uv1xdv.JPG?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Step one is not being afraid to reexamine a site that's been previously excavated.</span> <span class="attribution"><span class="source">Dominic O'Brien. Gundjeihmi Aboriginal Corporation</span>, <a class="license" href="http://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a></span></figcaption></figure><p>If you keep up with health or science news, you’ve probably been whipsawed between conflicting reports. Just days apart you may hear that “science says” coffee’s good for you, no actually it’s bad for you, actually red wine holds the secret to long life. As <a href="https://www.youtube.com/watch?v=0Rnq1NpHdmw">comedian John Oliver put it</a>:</p>
<blockquote>
<p>“After a certain point, all that ridiculous information can make you wonder: is science bullshit? To which the answer is clearly no. But there is a lot of bullshit currently masquerading as science.”</p>
</blockquote>
<p>A big part of this problem has to do with what’s been called a “<a href="https://theconversation.com/us/topics/reproducibility-5484">reproducibility crisis</a>” in science – many studies, if run a second time, don’t come up with the same results. <a href="https://doi.org/10.1038/533452a">Scientists are worried</a> about this situation, and <a href="https://www.nature.com/collections/byblhcfwhw">high-profile</a> international <a href="https://doi.org/10.1126/science.aab2374">research journals</a> have raised the alarm, too, calling on researchers to put more effort into ensuring their results can be reproduced, rather than only striving for splashy, one-off outcomes.</p>
<p><a href="https://www.nytimes.com/2016/05/29/opinion/sunday/why-do-so-many-studies-fail-to-replicate.html">Concerns about</a> <a href="https://www.theatlantic.com/science/archive/2016/03/psychologys-replication-crisis-cant-be-wished-away/472272/">irreproducible results</a> <a href="http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html">in science resonate</a> <a href="https://fivethirtyeight.com/features/science-isnt-broken/">outside the ivory tower</a>, as well, because a lot of this research translates into information that affects our everyday lives. </p>
<p>For example, it informs what we know about how to stay healthy, how doctors should look after us when we’re sick, how best to educate our children and how to organize our communities. If study results are not reproducible, then we can’t trust them to give good advice on solving our everyday problems – and society-wide challenges. Reproducibility is not just a minor technicality for specialists; it’s a pressing issue that affects the role of modern science in society.</p>
<p>Once we’ve identified that reproducibility is a big problem, the question becomes: How do we tackle it? Part of the answer has to do with changing incentives for researchers. But there are plenty of things we in the research community can do right now in the course of our scientific work.</p>
<p>It might come as a surprise that <a href="https://doi.org/10.1007/s10816-015-9272-9">archaeologists are at the forefront</a> of finding ways to improve the situation. Our <a href="https://doi.org/10.1038/nature22968">recent paper in Nature</a> demonstrates a concrete three-pronged approach to improving the reproducibility of scientific findings.</p>
<h2>Going back to where it all started</h2>
<p>In our new publication we describe recent work at an archaeological site in northern Australia. The results of our excavations and laboratory analyses show that <a href="http://theconversation.com/buried-tools-and-pigments-tell-a-new-history-of-humans-in-australia-for-65-000-years-81021">people arrived in Australia 65,000 years ago</a>, substantially earlier than the previous consensus estimate of 47,000 years ago. <a href="http://theconversation.com/buried-tools-and-pigments-tell-a-new-history-of-humans-in-australia-for-65-000-years-81021">This date has exciting implications</a> for our understandings of human evolution.</p>
<p>A less obvious detail about this study is the care we’ve taken to make our results reproducible. Our reproducibility strategy had three parts: fieldwork, labwork and data analyses.</p>
<figure class="align-right zoomable">
<a href="https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=906&fit=crop&dpr=1 600w, https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=906&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=906&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=1138&fit=crop&dpr=1 754w, https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=1138&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/178680/original/file-20170718-10320-1sapmfd.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=1138&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Ben Marwick and colleagues excavating at Madjedbebe.</span>
<span class="attribution"><span class="source">Dominic O'Brien. Gundjeihmi Aboriginal Corporation</span>, <a class="license" href="http://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a></span>
</figcaption>
</figure>
<p>Our first step toward reproducibility was our choice of what to investigate. Rather than striking out to someplace new, we reexcavated an archaeological site <a href="https://doi.org/10.1016/j.jhevol.2015.03.014">previously known to have very old artifacts</a>.</p>
<p>The rockshelter site Madjedbebe in Australia’s Northern Territory had been excavated twice before. Famously, excavations there in 1989 indicated that people had <a href="https://doi.org/10.1038/345153a0">arrived in Australia by about 50,000 years ago</a>. But many archaeologists rejected this age, refusing to accept anything older than 47,000 years.</p>
<p>This age was controversial from its first publication, and our goal in revisiting the site was to check if it was reliable or not. Could that controversial 50,000-year age be reproduced, or was it just a chance result that didn’t indicate the true time period for human habitation in Australia?</p>
<p>Like many scientists, archaeologists are generally less interested in returning to old discoveries, instead preferring to forge new paths in search of novel results. The problem with this is that it can lead to many unresolved questions, making it difficult to build a solid foundation of knowledge. </p>
<h2>Double-check the lab tests</h2>
<p>The second part of our reproducibility strategy was to verify that our laboratory analyses were reliable.</p>
<p>Our team used <a href="https://www.thoughtco.com/luminescence-dating-cosmic-method-171538">optically stimulated luminescence</a> methods to date the sand grains near the ancient artifacts. This method is complex, and there are only a few places in the world that have the instruments and skills to date these kinds of samples.</p>
<figure class="align-left zoomable">
<a href="https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=237&fit=clip" srcset="https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=766&fit=crop&dpr=1 600w, https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=766&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=766&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=963&fit=crop&dpr=1 754w, https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=963&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/178820/original/file-20170719-27696-r2h9i8.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=963&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Zenobia Jacobs produced the new ages for the Madjedbebe site based on her work in the Luminescence Dating Laboratory at the University of Wollongong, Australia.</span>
<span class="attribution"><span class="source">University of Wollongong</span>, <a class="license" href="http://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a></span>
</figcaption>
</figure>
<p>We first analyzed our samples in our laboratory at the <a href="http://smah.uow.edu.au/sees/facilities/UOW002889.html">University of Wollongong</a> to find their ages. Then we sent blind duplicate samples to another laboratory at the <a href="https://www.adelaide.edu.au/ipas/facilities/luminescence/">University of Adelaide</a> to analyze, without telling that lab our results. With both sets of analyses in hand, we compared them; it turned out in this case that they got the same ages as we did for the same samples.</p>
<p>This kind of verification is not a common practice in archaeology, but because this site was already controversial, we wanted to make sure the ages we obtained were reproducible.</p>
<p>While this extra work involved some additional cost and time, it’s vital to proving that our dates give the true ages of the sediments surrounding the artifacts. This verification shows that our lab results are not due to chance, or the unique conditions of our laboratory. Other archaeologists, and the public, can be more confident in our findings because we’ve taken these extra steps. This external checking should be standard practice in any science where controversial findings are at stake. </p>
<h2>Don’t let the computer be a black box</h2>
<p>After we completed the excavation and lab analyses, we analyzed the data on our computers. This stage of our research was very similar to what scientists in many other fields do. We loaded the raw data into our computers to visualize it with plots and test hypotheses with statistical methods.</p>
<p>However, while many researchers do this work by pointing and clicking using off-the-shelf software, we tried as much as possible to write scripts in the <a href="https://doi.org/10.1038/517109a">R programming language</a>.</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=402&fit=crop&dpr=1 600w, https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=402&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=402&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=505&fit=crop&dpr=1 754w, https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=505&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/178686/original/file-20170718-10283-q6g5bg.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=505&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Could be the enemy of reproducibility if it helps obscure the steps in data analysis.</span>
<span class="attribution"><a class="source" href="https://www.flickr.com/photos/erinkohlenbergphoto/5353222369">Erin Kohlenberg</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<p>Pointing and clicking generally leaves no traces of important decisions made during data analysis. Mouse-driven analyses leave the researcher with a final result, but none of the steps to get that result is saved. This makes it <a href="https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938">difficult to retrace the steps</a> of an analysis, and check the assumptions made by the researcher.</p>
<p>On the other hand, our scripts contain a record of all our data analysis steps and decisions. They’re like an exact recipe to generate our results. Other researchers not using scripts for their data analysis don’t have these recipes, so their results are much harder to reproduce. </p>
<p>Another advantage of our choice to use scripts is that we can share them with the scientific community and the public. We follow <a href="https://doi.org/10.1038/nn.4550">standard practices</a> by making our script files and main data files <a href="https://osf.io/qwfcz/">freely available online</a> so anyone can inspect the details of our analysis, or explore new ideas using our data.</p>
<p>It’s easy to understand why many researchers prefer point-and-click over writing scripts for their data analysis. Often that’s what they were taught as students. It’s hard work and time-consuming to learn new analysis tools among the pressures of teaching, applying for grants, doing fieldwork and writing publications. Despite these challenges, there is an accelerating shift away from point-and-click toward scripted analyses in many areas of science.</p>
<h2>Combating irreproducibility one step at a time</h2>
<p>Our recent paper is part of a new movement emerging in many disciplines to improve the reproducibility of science. Examples of recent papers that have made a commitment to reproducibility similar to ours have come from <a href="https://doi.org/10.1038/nature22975">epidemiology</a>, <a href="https://doi.org/10.1038/s41559-017-0160">oceanography</a> and <a href="https://doi.org/10.7554/eLife.20470">neuroscience</a>.</p>
<p>We hope our example will inspire other scientists to be strategic about improving the reproducibility of their research. Some of these steps can be difficult for researchers: they mean learning how to use unfamiliar software and publicly sharing more data and methods than researchers are accustomed to. But they’re important for generating reliable results – and for maintaining public confidence in scientific knowledge.</p><img src="https://counter.theconversation.com/content/80997/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Ben Marwick receives funding from the Australian Research Council, the University of Wollongong, and the University of Washington. This work was supported in part by the University of Washington eScience Institute.</span></em></p><p class="fine-print"><em><span>Zenobia Jacobs receives funding from the Australian Research Council. </span></em></p>A team of archaeologists strived to improve the reproducibility of their results, influencing their choices in the field, in the lab and during data analysis.Ben Marwick, Associate Professor of Archaeology, University of WashingtonZenobia Jacobs, Professor, University of WollongongLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/806042017-07-11T04:47:24Z2017-07-11T04:47:24ZWith better data access, urban planners could help ease our weight problems<figure><img src="https://images.theconversation.com/files/177282/original/file-20170707-3035-a0ub73.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Increasing access to health data and more readily available analytical tools offer some opportunities to tackle the ever-growing rates of obesity.</span> <span class="attribution"><span class="source">AAP/Dave Hunt</span></span></figcaption></figure><p>A recent episode of <a href="http://www.abc.net.au/tv/programs/ask-the-doctor/">ABC TV’s Ask the Doctor</a> pointed to poor urban planning as a major culprit in worsening obesity rates and associated lifestyle diseases such as diabetes. The show highlighted suburbs without footpaths, fresh-food outlets or exercise opportunities.</p>
<p>Built environments are <a href="https://theconversation.com/you-are-where-you-live-health-wealth-and-the-built-environment-23141">important contributors</a> to our health and wealth. Urban planners strive to create the best environments, but many describe the results <a href="https://www.ncbi.nlm.nih.gov/pubmed/17152319">as “obesogenic”</a> – that is, places where fast-food outlets abound and there are few opportunities to be sufficiently physically active. </p>
<p>Increasing access to health data, along with the powerful analytical tools needed to interpret these data, provides an opportunity to develop a real fix for this worsening situation.</p>
<h2>How far planners have come</h2>
<p>Urban planners have come a long way in supporting healthy and active living. Internationally, this goes back to the late 1940s, when the World Health Organisation (WHO) <a href="http://www.who.int/topics/urban_health/en/">defined health</a> as much more than the absence of disease.</p>
<p>The subsequent and ongoing development of the <a href="http://www.euro.who.int/en/health-topics/environment-and-health/urban-health/activities/healthy-cities">WHO Healthy Cities movement</a>, the declaration of the <a href="http://www.who.int/healthpromotion/conferences/previous/ottawa/en/">Ottawa Charter</a>, and the publication of the <a href="https://theconversation.com/what-are-social-determinants-of-health-10864">social determinants of health</a> and the related <a href="http://eprints.uwe.ac.uk/16409/2/BE%20-%20Health%20map%20article%20April%2005.pdf">settlement map</a>, further reinforced the importance of urban planning and design in creating places that support health and wellbeing. </p>
<p>Recently, the UN <a href="http://www.un.org/sustainabledevelopment/sustainable-development-goals/">Sustainable Development Goals</a> cemented this focus on healthy built environments.</p>
<p>In Australia, this global recognition has brought built environment and health professionals into a closer working relationship. For example:</p>
<ul>
<li><p>The Victorian government <a href="https://www2.health.vic.gov.au/public-health/population-health-systems/municipal-public-health-and-wellbeing-planning">requires local councils</a> to prepare municipal health plans. </p></li>
<li><p>The New South Wales government policy acknowledges the built environment as a <a href="http://www.health.nsw.gov.au/heal/Pages/nsw-healthy-eating-strategy.aspx">key determinant of health</a>. </p></li>
<li><p>In Western Australia, the longitudinal <a href="http://www.see.uwa.edu.au/research/cbeh/projects/reside">RESIDE project</a> traces the health impacts on residents of the <a href="https://www.planning.wa.gov.au/Liveable-neighbourhoods.aspx">Liveable Neighbourhoods Community Design Guidelines</a>. </p></li>
<li><p>In Queensland, a <a href="http://statements.qld.gov.au/Statement/2017/5/21/new-commission-to-help-children-and-families-avoid-obesity-and-chronic-disease">Healthy Futures Commission</a> has been legislated to tackle chronic disease by encouraging healthy lifestyles. </p></li>
<li><p>The National Heart Foundation has made an <a href="https://www.heartfoundation.org.au/for-professionals/built-environment">impressive set of contributions</a> to healthy built environments.</p></li>
<li><p>The Planning Institute of Australia has <a href="http://www.healthyplaces.org.au/site/">spearheaded interdisciplinary collaborations</a>. Most recently, it <a href="https://www.planning.org.au/policy/planning-for-healthy-communities">issued a statement</a> supporting healthy built environments.</p></li>
</ul>
<h2>How health data can help</h2>
<p>Despite this progress, some health indicators continue to deteriorate. The number of children who are overweight or obese has become a <a href="http://www.mdpi.com/1660-4601/14/4/369/htm">global public health epidemic</a>. In Australia, <a href="https://www.nhmrc.gov.au/health-topics/obesity-and-overweight">around 25%</a> of children are overweight or obese. </p>
<p>This trend is worrying, as obesity in childhood and obesity in adulthood are strongly linked. </p>
<p>Being overweight or obese is a significant risk factor for developing type 2 diabetes. Globally, more than 420 million people have type 2 diabetes, and these numbers have quadrupled since 1980. In Western Sydney alone, 60% of adults are <a href="https://www.westernsydneydiabetes.com.au/themes/default/basemedia/content/files/WSLHD_Diabetes_Hotspot.pdf">overweight or obese</a>. </p>
<p>But increasing access to health data and more readily available analytical tools offer new opportunities to tackle ever-growing rates of obesity.</p>
<p>The Heart Foundation has published <a href="https://www.heartfoundation.org.au/for-professionals/built-environment">comprehensive design guidelines</a> and <a href="http://www.healthyactivebydesign.com.au/">a website</a> linking research evidence to good practice.</p>
<p>In NSW, the <a href="http://www.health.nsw.gov.au/urbanhealth/Pages/healthy-urban-dev-check.aspx">Healthy Urban Development Checklist</a> assists health professionals to comment on the extent to which urban planning proposals will support health. In Victoria, <a href="http://www.communityindicators.net.au/">Community Indicators</a> link practitioners with communities to create healthy places.</p>
<p>Other tools include <a href="https://cityfutures.be.unsw.edu.au/research/city-wellbeing/city-wellbeing-resources/healthy-built-environment-indicators/">Healthy Built Environment Indicators</a> and the NSW <a href="https://www.nswpcalipr.com.au/">Integrated Planning and Reporting Framework</a> to get physical activity and healthy eating into local council community strategic plans. </p>
<p>Beyond these approaches, <a href="https://aurin.org.au/projects/lens-sub-projects/urban-health-geovisualisation-eresearch-tools/">geographical information systems</a> (GIS) and other analytical tools can help <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3755945/">tackle obesity</a>. South Australia’s Department for Health and Ageing <a href="https://esriaustralia.com.au/news/fight-against-child-obesity-gets-atechnological-boost-nar-146">used GIS</a> to plot gaps in built environment facilities and resources that impact childhood obesity. One study <a href="https://theconversation.com/resettled-refugees-adopt-australias-bad-food-habits-23928">established a link</a> between obesity and access to fast-food outlets. </p>
<p>Another initiative provided insights into the spatial patterning of health issues by developing a <a href="http://www.diabetesmap.com.au/">diabetes map</a>.</p>
<p>Using these tools is a step in the right direction. But challenges remain, particularly in terms of access to health data. The release of datasets such as the <a href="https://aurin.org.au/blog/2016/06/20/are-we-planning-adequate-healthcare-for-the-coming-ageing-boom/">National Health Services Directory</a> and the <a href="https://aurin.org.au/blog/2017/03/17/new-data-alert-mortality-health-crime-and-schools/">National Deaths and Mortality database</a> is encouraging. </p>
<h2>Greater ease of use improves implementation</h2>
<p>The increasing user-friendliness of analytical tools such as GIS, online portals like the <a href="https://link.springer.com/chapter/10.1007%2F978-3-319-18368-8_13">AURIN workbench</a>, and <a href="https://link.springer.com/chapter/10.1007%2F978-3-319-57819-4_9">walkability planning support systems</a> offers more powerful means of understanding the relationship between health and the built environment. </p>
<p>However, to seize these opportunities, we need enhanced data analysis, interpretation and presentation skills for planners and policymakers. </p>
<p>It takes skill to communicate the stories in the data and clearly identify the implications and required policy responses. Practical and policy-relevant research <a href="http://blogs.unsw.edu.au/cityfutures/blog/2017/05/planning-healthy-cities-a-workshop-linking-research-questions-to-policy-and-practice/">is critical</a>.</p>
<p>Enshrining the need for planning healthy built environments in legislation will help planners in their fundamental role of promoting healthy lifestyles. Planners can be taught the theory. But putting it into practice requires a strong policy framework to support principles, maintain standards and withstand cost-cutting pressures.</p>
<p>With the increasing democratisation of health data and better access to analytical tools such as the Australian National Data Service, AURIN and others, spatial thinking, data-driven approaches and collaborative action can fast-track plans for new and renewed environments that enable healthy living.</p><img src="https://counter.theconversation.com/content/80604/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Alison Taylor receives funding from the Australian National Data Service (ANDS). </span></em></p><p class="fine-print"><em><span>Christopher Pettit receives funding from the Australian Research Council (ARC), National Health and Medical Research Council (NHMRC), Australian National Data Service (ANDS), Collaborative Research Centre - Spatial Information, Collaborative Research Centre - Low Carbon Living, and various State and Local Government Authorities.</span></em></p><p class="fine-print"><em><span>Ori Gudes receives funding from the Collaborative Research Centre - Spatial Information, and the Collaborative Research Centre - Low Carbon Living. </span></em></p><p class="fine-print"><em><span>Susan Thompson does not work for, consult, own shares in or receive funding from any company or organisation that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Enshrining the need for planning healthy built environments in legislation will help ensure the fundamental role planners have to play in facilitating healthy lifestyles.Alison Taylor, Lecturer, Faculty of Built Environment, UNSW SydneyChristopher Pettit, Professor of Urban Science, UNSW SydneyOri Gudes, Research Fellow, City Futures Research Centre, UNSW SydneySusan Thompson, Professor of Planning and Head, City Wellbeing Program, City Futures Research Centre, UNSW SydneyLicensed as Creative Commons – attribution, no derivatives.tag:theconversation.com,2011:article/757422017-05-02T02:35:30Z2017-05-02T02:35:30ZHow to boil down a pile of diverse research papers into one cohesive picture<figure><img src="https://images.theconversation.com/files/167402/original/file-20170501-17304-nalnmm.jpg?ixlib=rb-1.1.0&rect=0%2C58%2C2114%2C1411&q=45&auto=format&w=496&fit=clip" /><figcaption><span class="caption">Can an algorithmic method for analyzing published research help zero in on reality?</span> <span 
class="attribution"><a class="source" href="https://www.shutterstock.com/image-photo/shelves-old-scientific-journals-202908463">Sergei25/Shutterstock.com</a></span></figcaption></figure><p>From social to natural and applied sciences, overall scientific output has been growing worldwide – it <a href="http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html">doubles every nine years</a>.</p>
<p>Traditionally, researchers solve a problem by conducting new experiments. With the ever-growing body of scientific literature, though, it is becoming more common to make a discovery based on the vast number of already-published journal articles. Researchers synthesize the findings from previous studies to develop a more complete understanding of a phenomenon. Making sense of this explosion of studies is critical for scientists not only to build on previous work but also to push research fields forward.</p>
<p>My colleagues <a href="http://mitsloan.mit.edu/faculty-and-research/faculty-directory/detail/?id=3547">Hazhir Rahmandad</a> and <a href="https://pwp.gatech.edu/kamran-paynabar/">Kamran Paynabar</a> and I have developed a new, more robust way to pull together all the prior research on a particular topic. In a five-year joint <a href="http://jalali.mit.edu/gma">project</a> between MIT and Georgia Tech, we worked to create a new technique for research aggregation. Our recently published paper in PLOS ONE introduces a flexible method that <a href="http://dx.doi.org/10.1371/journal.pone.0175111">helps synthesize findings from prior studies</a>, even potentially those with diverse methods and diverging results. We call it <a href="https://en.wikipedia.org/wiki/Generalized_model_aggregation">generalized model aggregation</a>, or GMA.</p>
<h2>Pulling it all together</h2>
<p><a href="http://researchguides.ebling.library.wisc.edu/c.php?g=293229&p=1953452">Narrative reviews</a> of the literature have long been a key component of scientific publications. The need for more comprehensive approaches has led to the emergence of two other very useful methods: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3024725/">systematic review and meta-analysis</a>. </p>
<p>In a systematic review, an author finds and critiques all prior studies around a similar research question. The idea is to bring a reader up to speed on the current state of affairs around a particular research topic.</p>
<p>In a meta-analysis, researchers go one step further and synthesize the findings quantitatively. Essentially, a meta-analysis takes a weighted average of the findings of several studies on one topic. Pooling results from multiple studies is meant to generate a more reliable finding than that of any single study. This is especially helpful when prior studies reported diverging findings and conclusions. The number of published meta-analyses has shot up over the last decade, underscoring their importance across research communities.</p>
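<p>To make the weighted-average idea concrete, here is a minimal sketch in Python. It assumes inverse-variance weighting, a common choice in fixed-effect meta-analysis, so that more precise studies count for more. The function name and the three effect estimates are hypothetical and serve only as illustration.</p>

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool study estimates with inverse-variance weights.

    Assumption: each study reports an effect estimate and its standard
    error, and all studies measure the same underlying effect.
    """
    # A study's weight is the inverse of its variance (standard error squared).
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    # The pooled estimate is the weighted average of the study estimates.
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    # The pooled standard error shrinks as total weight grows.
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from three studies.
pooled, pooled_se = fixed_effect_meta([0.20, 0.50, 0.35], [0.10, 0.20, 0.10])
print(f"pooled estimate = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

<p>The pooled standard error comes out smaller than that of any single study, which is the sense in which pooling is meant to be more reliable.</p>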
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=354&fit=crop&dpr=1 600w, https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=354&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=354&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=444&fit=crop&dpr=1 754w, https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=444&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/163958/original/image-20170404-5725-g8zkku.jpg?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=444&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">Publications of meta-analyses are on the rise, based on Web of Science search results for articles that included the term ‘meta-analysis’ in their title.</span>
<span class="attribution"><span class="source">Mohammad S. Jalali</span>, <a class="license" href="http://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a></span>
</figcaption>
</figure>
<p>Meta-analysis has been helpful in increasing our understanding of many scientific problems. But it has some challenges. <a href="https://us.sagepub.com/en-us/nam/methods-of-meta-analysis/book240589">A typical meta-analysis</a> combines just one explanatory variable (that is, a treatment controlled by the experimenter) and one response variable (for instance, a health outcome). Also, a researcher has to be very careful not to lump apples and oranges together in the meta-analysis. She must be selective and make sure to include only previous work that shared a very similar study design.</p>
<p>Here is where our simple and flexible generalized model aggregation method comes in. Using GMA, the prior studies do not necessarily need to have the same study design or method. They can also have different explanatory variables. As long as they are all answering a similar research question, GMA can synthesize them.</p>
<h2>Pooling findings from across a field</h2>
<p>Consider an example from the health literature. Obesity and nutrition researchers need reliable equations that estimate basal metabolic rate (BMR) – the amount of energy the human body spends at complete rest. Understanding BMR has big implications for real-world questions of weight management.</p>
<p>Researchers often estimate BMR as a function of different attributes: age, height, weight, fat mass and fat-free mass. The challenge is that current publications in research journals <a href="https://doi.org/10.1038/ijo.2012.218">provide over 200 such equations</a> estimated for different samples and age groups. These equations also include different subsets of those attributes.</p>
<p>For example, one of these equations included weight and age, but another included only fat-free mass. Another equation considered the impact of all these attributes, but the sample size was too small to make it reliable. More interestingly, and confusingly, there have been several studies with similar samples and variables but they have reported very different equations to explain the relationships.</p>
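<p>As a rough illustration of how equations built on different attributes can disagree, the Python sketch below compares two widely cited formulas: the Mifflin-St Jeor equation for men (weight, height and age) and the Katch-McArdle equation (fat-free mass only). They are used here only as familiar examples and are not necessarily among the equations discussed in this article. The person described is hypothetical.</p>

```python
def bmr_mifflin_st_jeor_male(weight_kg, height_cm, age_years):
    """Mifflin-St Jeor estimate for men, in kcal per day."""
    return 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_years + 5.0


def bmr_katch_mcardle(fat_free_mass_kg):
    """Katch-McArdle estimate from fat-free mass alone, in kcal per day."""
    return 370.0 + 21.6 * fat_free_mass_kg


# Hypothetical person: 80 kg, 180 cm, 30 years old, 64 kg fat-free mass.
estimate_a = bmr_mifflin_st_jeor_male(80.0, 180.0, 30.0)
estimate_b = bmr_katch_mcardle(64.0)
print(f"Mifflin-St Jeor: {estimate_a:.1f} kcal/day")
print(f"Katch-McArdle:   {estimate_b:.1f} kcal/day")
print(f"Difference:      {estimate_a - estimate_b:.1f} kcal/day")
```

<p>Even for the same person, the two formulas return different numbers because each draws on a different subset of attributes.</p>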
<p>So which equations are you going to choose to accurately estimate BMR? How do you ensure that your selected equation is more reliable than the rest? </p>
<p>To address these questions, <a href="http://journals.plos.org/plosone/article/file?type=supplementary&id=info:doi/10.1371/journal.pone.0175111.s001">we identified 27 BMR equations</a> for white males from published studies. Then we used GMA to aggregate them into a single equation, which we called a meta-model.</p>
<p>Through validation tests, we showed that our meta-model is more precise than any of the prior equations for estimating BMR. It also can deal with a logarithmic relationship between two variables – something not captured by any of the original 27 linear equations.</p>
<p>We tested our method by putting it up against more complex situations. What if all the equations we aggregate using GMA are actually off the mark? Would GMA still get close to what is really going on?</p>
<figure class="align-center zoomable">
<a href="https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=1000&fit=clip"><img alt="" src="https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=754&fit=clip" srcset="https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=600&h=324&fit=crop&dpr=1 600w, https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=30&auto=format&w=600&h=324&fit=crop&dpr=2 1200w, https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=15&auto=format&w=600&h=324&fit=crop&dpr=3 1800w, https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=45&auto=format&w=754&h=407&fit=crop&dpr=1 754w, https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=30&auto=format&w=754&h=407&fit=crop&dpr=2 1508w, https://images.theconversation.com/files/164555/original/image-20170408-29386-lwhnmr.PNG?ixlib=rb-1.1.0&q=15&auto=format&w=754&h=407&fit=crop&dpr=3 2262w" sizes="(min-width: 1466px) 754px, (max-width: 599px) 100vw, (min-width: 600px) 600px, 237px"></a>
<figcaption>
<span class="caption">The meta-model (on the right) relies only on reported information from the two incorrect models in the middle – not their observed data or the true data. And it is much closer to reality (on the left) than either incorrect model.</span>
<span class="attribution"><a class="source" href="https://doi.org/10.1371/journal.pone.0175111">Rahmandad et al, DOI: 10.1371/journal.pone.0175111</a>, <a class="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a></span>
</figcaption>
</figure>
<p>To investigate, we imagined two researchers coming up with two different linear equations to describe what they did not realize is actually a nonlinear phenomenon. The findings of the two researchers are far from reality. But again, our meta-model provided an extremely close estimate of reality – even when aggregating these two incorrect and biased models.</p>
<h2>How GMA gets at the truth</h2>
<p>So how does it all work? There is no magic here. In fact, the <a href="https://en.wikipedia.org/wiki/Generalized_model_aggregation">intuition behind GMA is simple</a>, which lets researchers with no extensive statistical background use it. </p>
<p>Broadly, each previous empirical study is an attempt to estimate an underlying reality. Let’s call this the “true model.” And it is unknown to us; whatever is actually driving the phenomenon under investigation is nature’s secret. The empirical studies report relevant information about the true model, even if they are biased or incomplete. </p>
<p>Generalized model aggregation uses computer simulations to replicate prior studies. This time, though, the simulated studies attempt to estimate a meta-model instead of the true model (that is, reality). </p>
<p>We feed the empirical studies’ reported estimates into the simulation. The flexibility of the GMA allows us to also use any other additional information about the underlying true model, too – such as the relationships among the variables or the quality of empirical studies’ estimates. This extra information helps increase the reliability of GMA estimates.</p>
<p>For each previous study, the GMA algorithm builds a simulated counterpart with the same sample characteristics and replicates the method that study used. Then it compares the outcomes of the simulated studies with the actual results of the empirical studies, trying to find the closest match. Through this matching process, GMA estimates the meta-model.</p>
<p>If the simulated and actual outputs match, the meta-model may be a good representation of the true model – that is, by running a bunch of studies through the GMA algorithm, we are able to tease out a closer approximation of how the phenomenon in question actually works. </p>
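<p>The matching process can be sketched with a toy example in Python. This is not the published GMA code: it assumes a hypothetical logarithmic true model, two noise-free "studies" that each fitted a straight line over a different range, and a simple grid search standing in for a proper optimizer. All names and numbers are illustrative.</p>

```python
import numpy as np


def replicate_linear_study(params, x):
    """Simulate one study under a candidate meta-model y = a + b*log(x),
    then repeat the study's own method: fit a straight line to the data."""
    a, b = params
    y = a + b * np.log(x)
    slope, intercept = np.polyfit(x, y, 1)
    return np.array([slope, intercept])


def gma_fit(studies, grid_a, grid_b):
    """Search for the meta-model parameters whose simulated studies best
    reproduce the coefficients each empirical study reported."""
    best_params, best_loss = None, np.inf
    for a in grid_a:
        for b in grid_b:
            loss = 0.0
            for x, reported in studies:
                simulated = replicate_linear_study((a, b), x)
                loss += float(np.sum((simulated - reported) ** 2))
            if loss < best_loss:
                best_params, best_loss = (float(a), float(b)), loss
    return best_params, best_loss


# Hypothetical true model, unknown to the two researchers: y = 2 + 3*log(x).
true_params = (2.0, 3.0)
x_study1 = np.linspace(1.0, 5.0, 50)   # study 1 sampled small x values
x_study2 = np.linspace(5.0, 20.0, 50)  # study 2 sampled larger x values

# Each study reports only its fitted line, which is a biased description.
studies = [
    (x_study1, replicate_linear_study(true_params, x_study1)),
    (x_study2, replicate_linear_study(true_params, x_study2)),
]

best_params, best_loss = gma_fit(
    studies, np.linspace(0.0, 4.0, 41), np.linspace(1.0, 5.0, 41)
)
print("estimated meta-model parameters:", best_params)
```

<p>The search sees only the two reported straight lines and the ranges each study sampled, never the underlying data, yet the parameters it selects describe the logarithmic curve that produced both lines.</p>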
<h2>Wide range of applications for GMA</h2>
<p>In our paper, we <a href="http://dx.doi.org/10.1371/journal.pone.0175111">discussed a wide range of examples</a>, from health to climate change and environmental sciences, that can benefit from generalized model aggregation. Using GMA to synthesize prior findings into a coherent meta-model can increase the accuracy of aggregation. </p>
<p>In the current replicability crisis, GMA can help not only identify studies that are reproducible, but also distinguish reliable findings from less robust ones. </p>
<p>We reported <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0175111#pone.0175111.s001">all the steps of our analysis</a> for further replication. A recipe for using GMA, together with its code and instructions, is also <a href="http://jalali.mit.edu/gma">publicly available</a>.</p>
<p>We hope that GMA can extend the reach of current research synthesis efforts to many new problems. GMA can help us understand the bigger picture of phenomena by aggregating their parts. Consider a puzzle with its pieces scattered about; the overall picture is revealed only when the pieces have been put together.</p><img src="https://counter.theconversation.com/content/75742/count.gif" alt="The Conversation" width="1" height="1" />
<p class="fine-print"><em><span>Mohammad S. Jalali does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.</span></em></p>Researchers need to be able to draw conclusions based on previously published studies in their field. A new aggregation method synthesizes prior findings and may help reveal more of the big picture.Mohammad S. Jalali, Research Faculty, MIT Sloan School of ManagementLicensed as Creative Commons – attribution, no derivatives.