Events where groups of people come together to create or improve software using large data sets are usually called hackathons. As health data researchers who want to build and maintain public trust, we recommend the use of alternative terms, such as datathon and code fest.
Hackathon is a portmanteau that combines the words “hack” and “marathon.” The “hack” in hackathon is meant to refer to a clever and improvised way of doing something rather than unauthorized computer or data access. From a computer scientist’s perspective, “hackathon” probably sounds innovative, intensive and maybe a little disruptive, but in a helpful rather than criminal way.
The issue is that members of the public do not interpret “hack” the way that computer scientists do.
Our team, and many others, have performed research studies to understand the public’s interests and concerns when health data are used for research and innovation. We are not aware of any of these studies reporting positive references to “hack” or related terms. By contrast, studies from Canada, the United Kingdom and Australia have all found that members of the public consistently raise hacking as a major concern for health data.
Fear of hacking
It is not hard to figure out where negative associations with the word “hack” come from. There is a regular stream of news headlines, like: “As Hackers Take Down Newfoundland’s Health-Care System, Silence Descends”; “T-Mobile Says Hackers Accessed Personal Data of an Additional 5.3 Million Customers”; and “They Told Their Therapists Everything. Hackers Leaked It All.”
Taking the research studies and news headlines together, there are strong reasons to think that members of the public will perceive the term hackathon negatively. Given the common use and understanding of hacking, the term could even be perceived as threatening if it is misinterpreted as referring to an event where computer scientists do unauthorized things with data.
Language is important when talking about health data: it helps to create transparency and build trust around how people’s information and privacy are managed. As such, words must be chosen carefully and should be guided by the preferences and concerns of the people whose data are being used for research and innovation.
Alternatives to hackathon
There are alternatives to the term hackathon, but they are used much less frequently. For example, a Google search conducted in July 2022 returned 58.7 million results for “hackathon” compared to 617,000 results for “datathon” and 54,700 results for “code fest.” That is more than 90 “hackathon” results for every “datathon” result.
Alternative terms appear relatively more often in the research literature, but hackathon still dominates. For example, a July 2022 Google Scholar search identified 30 times more scholarly publications using “hackathon” than using “datathon.”
Widespread use of the term hackathon may be reinforced by software libraries and dictionaries that perpetuate outdated and harmful terminology. For example, in the current version of Microsoft Word, “hackathon” is a recognized word but “datathon” is flagged as a spelling mistake.
We are not saying that hackathons are bad, just that the label most commonly used for them is problematic. And it’s not as though we lack alternatives to the term hackathon. Another way of looking at the Google search results is that the term datathon has been used hundreds of thousands of times, including by well-known initiatives such as the EU Datathon.
Given public concerns about hacking and data, we recommend that datathon and other alternatives to hackathon be used more often. Words matter, and using language like datathon can help organizations that hold or provide access to data show that they are attentive to the concerns of the people and communities the data are about.