Just how unpredictable is the Premier League? Scientists have done the maths

Leicester City’s win last year was unpredictable, which is why so many of us found it exciting. EPA/Nigel Roddis

The new Premier League season has just begun and teams will be vying to lift the trophy. But who will succeed and how easy is it to predict the winner?

The 2015-16 season saw some incredibly unlikely events. The bookmakers were offering 5,000-1 on Leicester City winning the league, which they did against all odds. According to algorithms in SAM, our Sports Analytics Machine at the University of Salford, the probability of Chelsea finishing ninth or lower in that final league table was around 0.2% or 500-1 (they finished tenth), while the probability of West Ham United beating Manchester City, Liverpool and Arsenal (which they did), was calculated at around 1,200-1 at the start of the season.

So just how unpredictable was the league season compared to previous ones? And how does this compare to the other major leagues around Europe? We took a closer look.

Unpredictability is one way of measuring competitive balance – which can roughly be defined as the extent to which any team can beat any other team. This balance is interesting because economists have found a relationship between competitive balance and the popularity of a sporting event – the so-called “uncertainty of outcome hypothesis”. Better competitive balance creates more uncertain outcomes, which in turn increase match attendances, television audiences and overall interest. So it may be that the global popularity of football compared to other sports is, in part, a consequence of it being “competitively balanced”.

Measuring competitive balance

Measuring competitive balance – and unpredictability – is a complicated issue. Most measures use reasonable metrics with properties that make them useful when trying to capture particular characteristics of a sports competition or league. For example, you could measure competitive balance in the Premier League by counting the number of teams that won the title during a certain time period. This number could be compared to the equivalent number calculated over the same time period for other football leagues in Europe. If the same team wins the league every year, then the league is unlikely to be “competitive”. Looking at the Premier League and Spain’s La Liga since 1992, six teams have won the former, while five teams have won the latter.

Alternatively, you could compute the spread of points in the league, and compare a league with itself across the last, say, ten seasons. If the spread in the final points total won by teams in the league is increasing, then it might be a sign that the gap between the top teams and the bottom teams in the league is growing, and that competitive balance in the league is decreasing.
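As a rough illustration of the points-spread idea, the sketch below measures spread as the standard deviation of final points totals. That choice of statistic is an assumption, since other measures of spread would also work, and the two five-team "seasons" are made-up numbers rather than real league tables.

```python
from statistics import pstdev


def points_spread(final_points):
    """Spread of final points totals in one season (population standard deviation)."""
    return pstdev(final_points)


# Hypothetical final points totals, invented purely for illustration.
balanced_season = [50, 52, 48, 51, 49]
unbalanced_season = [90, 70, 50, 30, 10]

# A larger spread suggests a bigger gap between top and bottom teams.
print(points_spread(balanced_season))
print(points_spread(unbalanced_season))
```

Computing this figure for each of the last ten seasons and looking for an upward trend would be one way to carry out the comparison described above.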

Here we chose to use an algorithm to compute the competitive balance/unpredictability at the level of each match. This gives a metric that represents the difference between the observed result and the expected result. For example, the probability that a match will finish with a home win is compared with whether it actually finished with a home win. For those interested in the details, our unpredictability index for a match is given by $(H - p_H)^2 + (D - p_D)^2 + (A - p_A)^2$, where H stands for home win, D stands for draw and A stands for away win. H is 1 if the match result is a home win, or 0 otherwise, and $p_H$ is the probability of a home win. Similarly, D = 1 if the match is a draw, and A = 1 if the match is an away win, with $p_D$ and $p_A$ being the probabilities of a draw and away win respectively.

In essence, if the expected result happens, then the unpredictability will be small; if a result is unexpected, then it will be large. A score of 0 means the result was entirely predictable (the observed outcome had been given a probability of 1), and the score rises towards its maximum of 2 as the observed outcome becomes less expected. To measure the unpredictability of all matches over a whole season, we sum the match unpredictability indexes over all 380 matches in the Premier League season.
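For readers who prefer code to formulas, here is a minimal sketch of the match index and the season total. The function names and the example probabilities are hypothetical and chosen only for illustration; they are not taken from SAM.

```python
def match_unpredictability(result, p_home, p_draw, p_away):
    """Unpredictability index for one match.

    result is "H" (home win), "D" (draw) or "A" (away win); the three
    probabilities are the pre-match chances of each outcome.
    """
    if result not in ("H", "D", "A"):
        raise ValueError("result must be 'H', 'D' or 'A'")
    h = 1.0 if result == "H" else 0.0
    d = 1.0 if result == "D" else 0.0
    a = 1.0 if result == "A" else 0.0
    return (h - p_home) ** 2 + (d - p_draw) ** 2 + (a - p_away) ** 2


def season_unpredictability(matches):
    """Sum of the match indexes over an iterable of (result, p_home, p_draw, p_away)."""
    return sum(match_unpredictability(*match) for match in matches)


# Hypothetical match: home side is favourite with probabilities 0.5 / 0.3 / 0.2.
expected = match_unpredictability("H", 0.5, 0.3, 0.2)   # favourite wins, about 0.38
surprise = match_unpredictability("A", 0.5, 0.3, 0.2)   # away upset, about 0.98
print(expected, surprise)
```

With the same pre-match probabilities, the upset scores well above the expected home win, which is the behaviour the index is designed to capture.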

We derived the probabilities of a home win, draw and away win from historical bookmaker odds data available at football-data.co.uk. We then rescaled the probabilities implied by the odds linearly so that they sum to one for each match. In some sense then, our unpredictability index is a measure of how good, or bad, the bookmakers have been at setting odds.
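The rescaling step might look something like the sketch below. It assumes the odds are quoted in decimal form (so odds of 2.0 imply a raw probability of 0.5), which the text above does not state; the function name and the example odds are hypothetical.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds (an assumption) to probabilities that sum to one."""
    # Raw implied probabilities usually add up to more than one,
    # because of the bookmaker's margin.
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    total = sum(raw)
    # Linear rescaling: divide each raw probability by their sum.
    return tuple(p / total for p in raw)


# Hypothetical odds for a single match.
p_home, p_draw, p_away = implied_probabilities(2.0, 3.5, 4.0)
print(p_home, p_draw, p_away)
```

Dividing by the total spreads the bookmaker's margin proportionally across the three outcomes, which is one simple reading of rescaling "linearly".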

The figure below shows the unpredictability index plotted for each of the last 13 seasons (the heavy red lines) with 90% “confidence intervals” (thin red lines) – a range of values so defined that you can be 90% certain that the value of a parameter lies within it. The confidence interval is a way of showing where the true value of the unpredictability index might lie. The dashed lines are the monthly average unpredictability indexes.
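The article does not say how the confidence intervals were computed. One common approach, shown here purely as an assumption, is a percentile bootstrap: resample the season's match indexes with replacement many times and read off the middle 90% of the resampled averages. The function name, the resample count and the example scores are all hypothetical.

```python
import random


def bootstrap_interval(match_scores, level=0.90, n_resamples=2000, seed=0):
    """Percentile-bootstrap interval for the mean match index (assumed method)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(match_scores)
    # Mean of each resample drawn with replacement, sorted ascending.
    means = sorted(
        sum(rng.choices(match_scores, k=n)) / n for _ in range(n_resamples)
    )
    lower_index = round((1 - level) / 2 * n_resamples)
    upper_index = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Made-up match indexes standing in for one season's results.
scores = [0.38, 0.98, 0.62, 0.14, 0.74, 0.50, 0.66, 0.42, 0.90, 0.30] * 5
low, high = bootstrap_interval(scores)
print(low, high)
```

If the intervals for two seasons overlap substantially, the difference between their indexes could plausibly be down to chance.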

Premier League unpredictability over past seasons. Author provided

The 2015-16 season was indeed the most unpredictable of the last 13 seasons. But once you calculate a confidence interval around our unpredictability index, it is not significantly different, in the statistical sense, from previous seasons. So the bookmakers had a harder time predicting the outcomes of matches than in recent seasons, but there doesn’t seem to have been a fundamental shift in the unpredictability of the Premier League.

The dashed lines on the above plot show the monthly average unpredictability index. Interestingly, there appears to be no pattern. We might have expected more unpredictability at the start of the season, as teams settle down and get used to new players or the loss of old ones. Similarly, we might have expected more unpredictability at the end of the season, when some matches are effectively dead rubbers (the outcome already decided by previous matches) so that the result is not important to either team. There is no evidence of either effect in the above plot.

Unpredictability around Europe

There are football leagues all around Europe, and given the relationship of unpredictability with popularity, it is interesting to see which leagues in Europe have been the most unpredictable this season.

Again taking bookmaker odds and results from football-data.co.uk, we calculated the unpredictability index for the top leagues in Germany, France, Italy, the Netherlands, Scotland and Spain, as well as for the Premier League and the English Championship.

The monthly and seasonal average unpredictability index is shown in the plot below.

Unpredictability of different leagues. Author provided

It is perhaps unsurprising to football fans that of the top leagues around Europe, the Premier League has been the most unpredictable. But actually, the English Championship is even more unpredictable, closely followed by the Scottish Premier League. The least unpredictable league is Spain’s La Liga.

The least unpredictable league in Europe: Spain’s La Liga. Shutterstock

Indeed, this might, in part, explain the apparent contradiction that the Spanish league, despite having superstar players such as Lionel Messi, Cristiano Ronaldo and Luis Suarez, does not attract the money that the Premier League does.

I for one thoroughly enjoyed the first week of games in the new Premier League season and am looking forward to the rest of what will no doubt prove to be another exciting and unpredictable season.