Bayes’ Theorem: the maths tool we probably use every day, but what is it?

Our world view and resultant actions are often driven by a simple theorem, devised in secret more than 250 years ago by a quiet English mathematician and theologian, Thomas Bayes, and only published after his death.

Bayes’ Theorem was famously used to crack the Nazi Enigma code during World War II, and now manages uncertainty across science, technology, medicine and much more.

So how does it work?

Bayes’ Theorem explained

Thomas Bayes’ insight was remarkably simple. The probability of a hypothesis being true depends on two criteria:

how sensible it is, based on current knowledge (the “prior”)
how well it fits new evidence.
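Written out, the two criteria multiply together. The symbols here are an assumption for illustration rather than the article's own notation: H stands for the hypothesis and E for the new evidence.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

P(H) is the prior, P(E | H) measures how well the hypothesis fits the new evidence, and P(E) is a normalising constant that makes the probabilities of all competing hypotheses sum to one.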

Yet, for 100 years after his death, scientists typically evaluated their hypotheses against only the new evidence. This is the traditional hypothesis-testing (or frequentist) approach that most of us are taught in science class.

The difference between the Bayesian and frequentist approaches is starkest when an implausible explanation perfectly fits a piece of new evidence.

Let me concoct the hypothesis: “The Moon is made of cheese.”

An implausible hypothesis. Michael Lee (Flinders University and South Australian Museum)

I look skywards and collect relevant new evidence, noting that the Moon is cheesy yellow in colour. In a traditional hypothesis-testing framework, I would conclude that the new evidence is consistent with my radical hypothesis, thus increasing my confidence in it.

Traditional hypothesis-testing methods (frequentist approaches) only consider how well a hypothesis fits new evidence. Michael Lee (Flinders University and South Australian Museum)

But using Bayes’ Theorem, I’d be more circumspect. While my hypothesis fits the new evidence, the idea was ludicrous to begin with, violating everything we know about cosmology and mineralogy.

Thus, the overall probability of the Moon being cheese, which is the product of both terms, remains very low.
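A rough sketch of that calculation is given here in Python. All three numbers are invented for illustration, and the helper `posterior` is a hypothetical name rather than part of any library.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a hypothesis after one piece of evidence."""
    numerator = likelihood_if_true * prior
    # Normalising constant: total probability of the evidence,
    # whether or not the hypothesis is true.
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    return numerator / evidence


# Made-up numbers, chosen only to illustrate the point.
prior_cheese = 1e-9        # plausibility of "made of cheese" beforehand
p_yellow_if_cheese = 0.9   # a cheese Moon would very likely look yellow
p_yellow_if_rock = 0.5     # a rocky Moon can look yellowish too

moon_posterior = posterior(prior_cheese, p_yellow_if_cheese, p_yellow_if_rock)
print(f"P(cheese | yellow) = {moon_posterior:.2e}")
```

Under these assumed values the yellow colour nudges the probability up, but a good fit cannot rescue a hypothesis whose prior is that small.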

Bayesian Inference considers how well the hypothesis fits existing knowledge, and how well it fits new evidence. For simplicity, the Normalising Constant has been omitted from the formula. Michael Lee (Flinders University and South Australian Museum)

Admittedly, this is an extreme caricature. No respectable scientist would ever bother testing such a dumb hypothesis.

But scientists globally are always evaluating a huge number of hypotheses, and some of these are going to be rather far-fetched.

For example, a 2010 study initially suggested that people with moderate political views have eyes that can literally see more shades of grey.

This was later dismissed after further testing, conducted because the researchers recognised it was implausible to begin with. But it’s almost certain that other similar studies have been accepted uncritically.

The Bayesian approach in life

We use prior knowledge from our experiences and memories, and new evidence from our senses, to assign probabilities to everyday things and manage our lives.

Consider something as simple as answering your work mobile phone, which you usually keep on your office desk when at work, or on the charger when at home.

You are at home gardening and hear it ringing inside the house. Your new data is consistent with it being anywhere indoors, yet you go straight to the charger.

You have combined your prior knowledge of the phone (usually either on the office desk, or on the charger at home) with the new evidence (somewhere in the house) to pinpoint its location.

If the phone is not at the charger, then you use your prior knowledge of where you have sometimes previously left the phone to narrow down your search.

You ignore most places in the house (the fridge, the sock drawer) as highly unlikely a priori, and home in on what you consider the most likely places until you eventually find the phone. You are using Bayes’ Theorem to find the phone.
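The phone search can be sketched as two successive updates over a handful of locations. The prior probabilities below are hypothetical, as is the helper name `bayes_update`. Each piece of evidence is reduced to a crude 0 or 1 likelihood, meaning the location is either ruled out or still possible.

```python
def bayes_update(prior, likelihood):
    """Multiply prior by likelihood for each place, then renormalise."""
    unnormalised = {place: prior[place] * likelihood[place] for place in prior}
    total = sum(unnormalised.values())
    return {place: weight / total for place, weight in unnormalised.items()}


# Hypothetical prior: where the phone usually is, before hearing anything.
prior = {
    "office desk": 0.50,
    "charger": 0.40,
    "kitchen bench": 0.05,
    "couch": 0.04,
    "fridge": 0.005,
    "sock drawer": 0.005,
}

# Evidence 1: ringing comes from inside the house, so the office is ruled out.
heard_at_home = {p: 0.0 if p == "office desk" else 1.0 for p in prior}
after_ring = bayes_update(prior, heard_at_home)

# Evidence 2: you look at the charger and the phone is not there.
not_on_charger = {p: 0.0 if p == "charger" else 1.0 for p in after_ring}
after_check = bayes_update(after_ring, not_on_charger)

best_first = max(after_ring, key=after_ring.get)
best_second = max(after_check, key=after_check.get)
print("first place to look:", best_first)
print("next place to look:", best_second)
```

The fridge and sock drawer are never excluded outright. Their small priors simply keep them at the bottom of the search order.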

Belief and evidence

A feature of Bayesian inference is that prior belief is most important when data are weak. We use this principle intuitively.

For example, if you are playing darts in a pub and a nearby stranger claims to be a professional darts player, you might initially assume the person is joking.

You know almost nothing about the person, but the chances of meeting a real professional darts player are small. DartPlayers Australia tells The Conversation there are only about 15 in Australia.

If the stranger throws a dart and hits the bullseye, it still mightn’t sway you. It could just be a lucky shot.

But if that person hits the bullseye ten times in a row, you would tend to accept their claim of being a professional. Your prior belief becomes overridden as evidence accumulates. Bayes’ Theorem at work again.
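The way evidence gradually overrides the prior can be sketched by applying the same update once per throw. The population figure and both bullseye rates are assumptions made for illustration, and only the count of about 15 professionals comes from the text. The sketch also assumes that throws are independent and that the stranger is either a professional or an ordinary pub player.

```python
def update(prior, p_hit_if_pro, p_hit_if_amateur):
    """One Bayesian update after observing a single bullseye."""
    numerator = p_hit_if_pro * prior
    return numerator / (numerator + p_hit_if_amateur * (1.0 - prior))


# About 15 professionals (from the article) in an assumed
# population of roughly 24 million people.
prior_pro = 15 / 24_000_000

P_BULLSEYE_PRO = 0.40      # hypothetical hit rate for a professional
P_BULLSEYE_AMATEUR = 0.02  # hypothetical hit rate for a pub player

belief = prior_pro
history = []
for _ in range(10):  # ten bullseyes in a row
    belief = update(belief, P_BULLSEYE_PRO, P_BULLSEYE_AMATEUR)
    history.append(belief)

print(f"after 1 bullseye:   {history[0]:.6f}")
print(f"after 10 bullseyes: {history[-1]:.6f}")
```

With these assumed rates, one bullseye leaves the claim very unlikely, while ten in a row push it close to certainty.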

The one theory to rule them all

Bayesian reasoning now underpins vast areas of human enquiry, from cancer screening to global warming, genetics, monetary policy and artificial intelligence.

Risk assessment and insurance are areas where Bayesian reasoning is fundamental. Every time a cyclone or flood hits a region, insurance premiums skyrocket. Why?

Houses are surrounded by floodwaters at Depot Hill, in Rockhampton, after ex-cyclone Debbie dumped heavy rain on Queensland this year. AAP Image/Dan Peled

Risk can be tremendously complex to quantify and current conditions might provide scant information about likely future disasters. Insurers therefore estimate risk based on both current conditions and what’s happened before.

Every time a natural disaster strikes, they update their prior information on that region into something less favourable. They foresee a greater probability of future claims, and so raise premiums.
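One textbook way to sketch this kind of updating is a Beta prior over a region's yearly flood probability, revised after each year's outcome. This is a stand-in chosen for simplicity, not a description of how insurers actually price risk. The prior parameters and function names are hypothetical.

```python
def beta_mean(alpha, beta):
    """Expected yearly flood probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def update_flood_belief(alpha, beta, flooded):
    """Conjugate update: add one to alpha for a flood year, else to beta."""
    return (alpha + 1, beta) if flooded else (alpha, beta + 1)


# Hypothetical prior: a 2% chance of flooding in any given year.
alpha, beta = 1.0, 49.0
risk_before = beta_mean(alpha, beta)

# A flood hits the region this year.
alpha, beta = update_flood_belief(alpha, beta, flooded=True)
risk_after = beta_mean(alpha, beta)

print(f"estimated yearly risk before: {risk_before:.3f}")
print(f"estimated yearly risk after:  {risk_after:.3f}")
```

A single event shifts the estimate noticeably because the assumed prior rests on little data. A prior built on a longer record would move less.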

Bayesian inference similarly plays an important role in medical diagnosis. A symptom (the new evidence) can be a consequence of various possible diseases (the hypotheses). But different diseases have different prior probabilities for different people.

A major problem with online medical tools such as WebMD is that prior probabilities are not properly taken into account. They know very little about your personal history. A huge range of possible ailments can be thrown up.

A visit to a doctor who knows your prior medical records will result in a narrower and more sensible diagnosis. Bayes’ Theorem once again.
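The effect of personal priors can be sketched by running the same symptom through two different sets of prior probabilities. The conditions, likelihoods and priors are all invented for illustration, and the flat prior is a deliberate caricature of a tool that knows nothing about the patient.

```python
def diagnose(priors, p_symptom_given_disease):
    """Posterior probability of each disease given one observed symptom."""
    joint = {d: priors[d] * p_symptom_given_disease[d] for d in priors}
    total = sum(joint.values())
    return {d: value / total for d, value in joint.items()}


# Hypothetical chance of a persistent cough under each condition.
p_cough = {"cold": 0.5, "asthma": 0.6, "rare lung disease": 0.9}

# Caricature of a symptom checker: every condition equally likely beforehand.
flat_prior = {"cold": 1 / 3, "asthma": 1 / 3, "rare lung disease": 1 / 3}

# Hypothetical priors a doctor might hold for one particular patient.
patient_prior = {"cold": 0.70, "asthma": 0.299, "rare lung disease": 0.001}

online = diagnose(flat_prior, p_cough)
doctor = diagnose(patient_prior, p_cough)

print("online tool suggests:", max(online, key=online.get))
print("doctor suggests:     ", max(doctor, key=doctor.get))
```

The evidence is identical in both cases. Only the priors differ, and they alone change which diagnosis comes out on top.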

Alan Turing and Enigma

Bayesian approaches allow us to extract precise information from vague data, to find narrow solutions from a huge universe of possibilities.

They were central to how British mathematician Alan Turing cracked the German Enigma code. This hastened the Allied victory in World War II by at least two years and thus saved millions of lives.

To decipher a set of encrypted German messages, searching the near-infinite number of potential translations was impossible, especially as the code changed daily via different rotor settings on the tortuously complex Enigma encryption machine.

Turing’s crucial Bayesian insight was that certain messages were much more likely than other messages.

Cracking the Enigma code.

These likely solutions, or “cribs” as his team called them, were based on previous decrypted messages, as well as logical expectations.

For example, messages from U-boats were likely to contain phrases related to weather or Allied shipping.

The strong prior information provided by these cribs greatly narrowed the number of possible translations that needed to be evaluated, allowing Turing’s codebreaking machine to decipher the Enigma code rapidly enough to outpace the daily changes.

A rebuilt replica of a ‘bombe’ machine used by cryptologists to crack the German Enigma code. Ted Coles/Wikimedia

Bayes and evolution

Why are we so interested in Bayesian methodology? In our own field of study, evolutionary biology, as in much of science, Bayesian methods are becoming increasingly central.

From predicting the effects of climate change to understanding the spread of infectious diseases, biologists are typically searching for a few plausible solutions from a vast array of possibilities.

In our research, which mainly involves reconstructing the history and evolution of life, these approaches can help us find the single correct evolutionary tree from literally billions of possible branching patterns.

In work, as in everyday life, Bayesian methods can help us to find small needles in huge haystacks.

The dark side of Bayesian inference

Of course, problems can arise in Bayesian inference when priors are incorrectly applied.

In law courts, this can lead to serious miscarriages of justice (see the prosecutor’s fallacy).

In a famous example from the UK, Sally Clark was wrongly convicted in 1999 of murdering her two children.

Prosecutors had argued that the probability of two babies dying of natural causes (the prior probability that she is innocent of both charges) was so low – one in 73 million – that she must have murdered them.

But they failed to take into account that the probability of a mother killing both of her children (the prior probability that she is guilty of both charges) was also incredibly low. So the relative prior probabilities that she was totally innocent or a double murderer were more similar than initially argued.
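The comparison can be written out as a simplified sketch. It assumes only two explanations are considered, both deaths natural (N) or both deaths murder (M), and the symbols are introduced here for illustration.

```latex
P(N \mid \text{two deaths}) = \frac{P(N)}{P(N) + P(M)}
```

Both explanations account for the deaths equally well, so the outcome depends only on the ratio of the two priors. A tiny P(N), such as the one in 73 million quoted at trial, says little on its own if P(M) is comparably tiny.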

Professor Philip Dawid on the Sally Clark case.

Clark was later cleared on appeal, with the appeal court judges criticising the use of the statistic in the original trial.

This highlights how poor understanding of Bayes’ Theorem can have far-reaching consequences. But the flip side is that Bayesian methods with well justified, appropriate priors can provide insights that are otherwise unobtainable.
