Don’t blame it on algorithms: what they really are and how they can fuel progress in the life sciences

Image of a five-knot torus generated algorithmically. Tanaka Juuyoh/Flickr

This article was written by Mislav Acman, Aamir Abbasi, Eleonore Bellot, Charles Bernard, Thibault Corneloup, Christopher Coste, Ke Fang, Louis Halconruy, Raphaël Ponthieu, Ahmed Saadawi, Olga Seminck and Sebastian Sosa Carrillo as part of the yearly thematic workshop at the Centre for Research and Interdisciplinarity in Paris. The workshop is a five-day conference organised and given by second-year Ph.D. and master’s students from a range of disciplines on a scientific topic of shared interest.

Algorithms are ubiquitous today, yet often misunderstood. The word algorithm is often used to refer to an opaque computational system that can sometimes lead to questionable results. For instance, Donald Trump’s victory in the U.S. presidential election is partially attributed to two “algorithmic” failures. First, Facebook’s emergence as an influential news provider and its algorithm’s failure to filter out “fake news”. Second, the failure of most election forecast institutions to predict the result of the U.S. election, even when using state-of-the-art algorithms performing aggregation, big-data analysis and forecasting. These failures are deemed so detrimental by some scholars that they advocate that algorithms be regulated by the U.S. Food and Drug Administration.

However, algorithms should not take all the blame. These failures revolve around what is only a subfamily: big-data analytics algorithms. In addition, the failures in these algorithms often arise not from an error in the code or a misconception in the process, but from the poor statistical treatment of the data.

Data in, results out

An algorithm is nothing more than a method for calculating a function via a sequence of actions. It is like a cooking recipe, which has the end function of producing a meal through the systematic preparation of ingredients. A perfectly good recipe will completely fail if the grocer (the data provider) delivers oranges instead of tomatoes because he or she thinks this does not make a big difference. Therefore, the question is to what extent data can be blamed for the recent “algorithmic” failures.

When an algorithm’s end result is evaluated, the data that was used should be critically evaluated too. The only aspect of the algorithm that can be evaluated objectively without considering the data is its speed. The speed of an algorithm is the subject of computational complexity theory, which deals with the minimum time it takes to tackle a particular problem for a certain amount of input data. One classic example is the traveling salesman problem: calculating the shortest route that visits each of n cities once and returns to the starting city. To be sure of finding the shortest route by brute force, the algorithm needs to try every permutation of the cities, implying a calculation time that grows as n factorial. The speed of an algorithm can be optimised, but nothing can be done if the given or collected data is biased, incomplete or unsorted.
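To make the factorial growth concrete, here is a minimal Python sketch of the brute-force approach to the traveling salesman problem. The function name shortest_tour and the four-city distance table are made up for illustration. Fixing the starting city means the sketch tries (n - 1)! orderings rather than n!, which is still factorial growth.

```python
from itertools import permutations


def shortest_tour(dist):
    """Brute-force search: dist[i][j] is the distance from city i to city j."""
    n = len(dist)
    best_length, best_tour = float("inf"), None
    # Fix city 0 as the start so rotations of the same tour are not recounted.
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Made-up symmetric distances between four cities (illustration only).
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

length, tour = shortest_tour(DIST)
print(length, tour)
```

With four cities the loop checks only six orderings. With 15 cities it would already have to check 14!, more than 87 billion, which is why this approach quickly becomes impractical however clean the data.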

However, algorithms are not all about failures. Algorithms are widely used in the life sciences. The advances in computer science, algorithmic theory and big data allow scientists to obtain results that were unthinkable in the 19th century.

The Fibonacci spiral. Jahobr/Wikimedia, CC BY

In demographics, algorithms have long been at the core of the analysis of populations. The first ones were simple, but sowed the idea that population growth could be modelled by time-step projection. Fibonacci, in Liber Abaci, developed an algorithm that projected over time a population of rabbits with two age classes, young and old, where at each time-step, young become old, and each old rabbit gives birth to one young rabbit. The result is the famous Fibonacci sequence. These algorithms were refined by other mathematicians such as Euler and put in matrix form in the 1940s by Leslie, paving the way to the elaboration of much more complex algorithms that integrate many characteristics of the studied population and allow the environment to change. Algorithms are also used to generate accurate rates of survival, fertility, and migration from observed data. With the advent of big data, they can also allow us to study the current structure of a population, especially human populations.
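The rabbit projection described above can be sketched in a few lines of Python. This is an illustrative reading of the two-class rule, not Fibonacci's or Leslie's own formulation. It assumes that old rabbits survive from one step to the next, which is what the Fibonacci numbers require, and the names project, PROJECTION and project_matrix are made up for the example. The second function writes the same step as a 2x2 projection matrix in the spirit of Leslie's matrix form.

```python
def project(young, old, steps):
    """Return the total population at each time step, starting from (young, old)."""
    totals = [young + old]
    for _ in range(steps):
        # Each old rabbit produces one young; young mature; old are assumed to survive.
        young, old = old, young + old
        totals.append(young + old)
    return totals


# The same step as a projection matrix acting on the state [young, old].
PROJECTION = [
    [0, 1],  # new young = 0 * young + 1 * old
    [1, 1],  # new old   = 1 * young + 1 * old
]


def project_matrix(state, steps):
    """Apply the projection matrix to [young, old] the given number of times."""
    for _ in range(steps):
        state = [sum(m * s for m, s in zip(row, state)) for row in PROJECTION]
    return state


print(project(1, 0, 6))
print(project_matrix([1, 0], 6))
```

Starting from a single young rabbit, the totals follow 1, 1, 2, 3, 5, 8, 13. Richer demographic models keep the same structure but use more age classes and replace the 0s and 1s with estimated fertility and survival rates.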

From biology to brain functions

In biology, algorithms are useful to understand biological structures such as protein complexes. Algorithms are also used to analyse a set of images and quantify spatiotemporal correlations between objects in the images. All microscopy and macroscopy techniques in biology involve acquiring images. Here, image-analysis algorithms play a major role in studying the functional organisation of life.

In neuroscience, algorithms play a vital role in studying the brain. In particular, machine-learning algorithms help to decode and classify brain activity. This allows scientists to study a group of neurons at the level of a system and test different hypotheses. Machine-learning algorithms are also instrumental to neuroprosthetics research, a subdomain of neuroscience that involves connecting the brain to a prosthetic limb. The algorithms facilitate the control of the prosthetic limb by decoding the brain activity of the user. This research is aimed at restoring movement autonomy in people with motor disabilities or impairment.

Algorithms have thus enabled progress in a wide range of scientific and technological fields. Their failure in predicting the outcome of the U.S. presidential elections was due to poor statistical treatment of the data, not the algorithms themselves.