France’s goalkeeper #01 Hugo Lloris (C) jumps for the ball during the Qatar 2022 World Cup quarter-final football match between England and France at the Al-Bayt Stadium in Al Khor, north of Doha, on December 10, 2022. Jewel Samad/AFP

World Cup 2022: crunching 150 years of big data to predict the winner

Perhaps more than any recent World Cup, this year’s competition in Qatar has thrown up considerable surprises. Who among the analytics crowd could have predicted that Saudi Arabia would defeat Argentina the way it did, or that competition favourite Brazil would end up losing to Croatia? Meanwhile, Morocco has stunned commentators by becoming the first African nation to reach the semi-finals. Now, almost a week away from the final, speculation about who will win is at its peak. Is there any way we could predict the results better than we did by “following the science”?

150 years of big data

In collaboration with analytics company Alteryx and Stirling University, our team at Audencia has given it its best shot by developing a sophisticated World Cup prediction model drawing on 150 years of international football match results, including tournaments and friendly games.

While developing our initial mathematical model, we considered factors such as win ratio, goals scored, and overall match results. To further improve the accuracy of our prediction, we took into account individual teams’ current FIFA ranking and overall rating. We also added FIFA player ratings along with individual player skills and attribute scores (i.e., attack, movement, power, defence). This allows fine adjustments in our modelling technique based on individual player selection and injuries during later stages.
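To make the feature-building step concrete, here is a minimal sketch of how a single fixture could be turned into a row of inputs. It is not our production pipeline: the `Match` record, the helper names, the three-team history and the ratings are all made up for illustration, and the real model uses many more variables (player ratings, attribute scores and so on).

```python
from dataclasses import dataclass

@dataclass
class Match:
    home: str
    away: str
    home_goals: int
    away_goals: int

def team_features(team, matches):
    """Win ratio and average goals scored by one team over past matches."""
    played = wins = goals = 0
    for m in matches:
        if team == m.home:
            scored, conceded = m.home_goals, m.away_goals
        elif team == m.away:
            scored, conceded = m.away_goals, m.home_goals
        else:
            continue  # team did not play in this match
        played += 1
        goals += scored
        wins += scored > conceded
    if played == 0:
        return {"win_ratio": 0.0, "goals_per_match": 0.0}
    return {"win_ratio": wins / played, "goals_per_match": goals / played}

def fixture_row(team_a, team_b, matches, ratings):
    """One model input row: differences between the two teams' features."""
    fa = team_features(team_a, matches)
    fb = team_features(team_b, matches)
    return [
        fa["win_ratio"] - fb["win_ratio"],
        fa["goals_per_match"] - fb["goals_per_match"],
        ratings[team_a] - ratings[team_b],
    ]

# Hypothetical history and ratings, for illustration only.
history = [
    Match("A", "B", 2, 1),
    Match("B", "A", 0, 0),
    Match("A", "C", 3, 0),
    Match("C", "B", 1, 2),
]
ratings = {"A": 85, "B": 80, "C": 75}
row = fixture_row("A", "B", history, ratings)
```

Expressing each feature as a difference between the two sides is one common design choice; keeping both teams' raw values as separate columns would work as well.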

Using the existing data as a training set, we employed a Random Forest algorithm to predict results for every World Cup fixture. Using the Alteryx data analysis platform, we calculated the overall outcome of individual games along with expected goals (xG) per team per match. Overall, our model showed an accuracy rate of 60% to 70% during the training and testing phases. In data science terms, this range is considered acceptable to good for predicting the outcome of an event.
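For readers curious about what a Random Forest does, the sketch below builds a deliberately simplified one in plain Python: each "tree" is a single split (a stump) trained on a bootstrap sample of the matches and on one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow deeper trees and resample features at every split. Our own model was built in Alteryx, so none of the function names, the synthetic data or the settings here reflect that implementation.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best single-threshold split on one feature, judged by training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def fit_forest(rows, labels, n_trees=50, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of matches
        feature = rng.randrange(n_features)         # random feature for this stump
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]

# Synthetic matches (hypothetical): outcome depends on rating and form gaps plus noise.
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(300):
    rating_diff = data_rng.uniform(-10, 10)
    form_diff = data_rng.uniform(-1, 1)
    score = rating_diff + 3 * form_diff + data_rng.gauss(0, 3)
    labels.append("win" if score > 2 else "loss" if score < -2 else "draw")
    rows.append([rating_diff, form_diff])

train_rows, train_labels = rows[:200], labels[:200]
test_rows, test_labels = rows[200:], labels[200:]
forest = fit_forest(train_rows, train_labels)
predictions = [predict(forest, r) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_rows)
```

The accuracy is measured on matches the forest never saw during training, which is the same train-and-test idea behind the 60% to 70% figure quoted above.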

Importance and ranking of key data variables used for match prediction. Author provided

Group stage, knockout stage and the winner

Applying the model to the FIFA World Cup 2022 group stage fixtures produced some interesting and unexpected results. We ran the simulation through 500 different sets of probabilities to verify the accuracy of these predictions. Our algorithm successfully predicted the qualification of 11 of the 16 teams that advanced, including Senegal and Morocco, reflecting 68.7% accuracy.
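One way to picture a repeated group-stage simulation is the sketch below. It assumes each fixture has fixed win, draw and loss probabilities, plays out a four-team group 500 times and counts how often each team finishes in the top two. The team names, the probabilities and the random tie-break are invented for illustration; the actual simulation settings are not reproduced here.

```python
import random
from collections import Counter
from itertools import combinations

def simulate_group(teams, outcome_probs, rng):
    """Play one round-robin group and return the two qualifiers."""
    points = {t: 0 for t in teams}
    for a, b in combinations(teams, 2):
        p_win, p_draw, _ = outcome_probs[(a, b)]  # from team a's point of view
        u = rng.random()
        if u < p_win:
            points[a] += 3
        elif u < p_win + p_draw:
            points[a] += 1
            points[b] += 1
        else:
            points[b] += 3
    # Ties on points are broken at random in this sketch (no goal difference).
    ranked = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
    return ranked[:2]

# Hypothetical group: (win, draw, loss) probabilities for the first-named team.
teams = ["A", "B", "C", "D"]
outcome_probs = {
    ("A", "B"): (0.50, 0.25, 0.25),
    ("A", "C"): (0.60, 0.20, 0.20),
    ("A", "D"): (0.70, 0.20, 0.10),
    ("B", "C"): (0.45, 0.30, 0.25),
    ("B", "D"): (0.55, 0.25, 0.20),
    ("C", "D"): (0.40, 0.30, 0.30),
}

rng = random.Random(2022)
n_runs = 500
counts = Counter()
for _ in range(n_runs):
    counts.update(simulate_group(teams, outcome_probs, rng))

# Share of runs in which each team qualified.
qualification_share = {t: counts[t] / n_runs for t in teams}

# The accuracy quoted above: 11 correct qualifiers out of 16.
reported_accuracy = 11 / 16  # 0.6875, reported as 68.7%
```

Teams whose qualification share is close to 50% are the ones where a single upset can flip the prediction, which is where most group-stage misses are to be expected.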

Group stage result prediction

Author provided

After the start of the knockout stage, we reset the results and ran the simulation mimicking the original knockout fixtures. The new analysis also took into account player performance during this World Cup and likely player selection for each match. For the round of 16, our algorithm correctly predicted seven of the eight match outcomes, reflecting 87.5% accuracy. The only shock result was Morocco’s win over Spain, which we couldn’t capture appropriately.

Round of 16 results (predicted vs actual)

Developed using The Telegraph Wallchart, Author provided

In the simulation of the quarter-finals, we once again revised the algorithm to take account of the quarter-final fixtures and individual players’ performance during the World Cup. This time our algorithm achieved only 50% accuracy, failing to predict Brazil’s exit and Morocco’s triumph over Portugal. The loss to Croatia of Brazil, the tournament and fans’ favourite, was a result of its failure to create early scoring chances. The persistent resistance of Morocco and Croatia, which led to the elimination of bigger teams, shows that this World Cup has favoured teams with well-organised defences.
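The hit rates quoted for each stage can be tallied in a few lines. The counts for the group stage and the round of 16 come straight from the figures above; the quarter-final count of two correct out of four is inferred from the 50% figure and the two named misses. The variable names are ours.

```python
# (correct predictions, total predictions) per stage
stage_results = {
    "group stage (qualifiers)": (11, 16),
    "round of 16": (7, 8),
    "quarter-finals": (2, 4),  # inferred from the 50% accuracy quoted above
}

accuracy = {stage: correct / total for stage, (correct, total) in stage_results.items()}

# Combined hit rate over the knockout matches played so far.
knockout_stages = ["round of 16", "quarter-finals"]
knockout_correct = sum(stage_results[s][0] for s in knockout_stages)
knockout_total = sum(stage_results[s][1] for s in knockout_stages)
knockout_accuracy = knockout_correct / knockout_total

for stage, value in accuracy.items():
    print(f"{stage}: {value:.1%}")
print(f"knockout matches so far: {knockout_accuracy:.1%}")
```

With only four matches in a round, a single upset moves the accuracy by 25 percentage points, so the per-round figures should be read with that small sample in mind.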

Another important aspect of this World Cup is penalty shoot-out success. Big teams such as Spain and Brazil fell because of poor penalty performance.

Quarter-final results (predicted vs actual)

Developed using The Telegraph Wallchart, Author provided

Similarly, England fell short as Harry Kane missed a second, decisive penalty kick against France. Looking at these emerging statistics, it seems that teams that can keep their goalkeepers in top form and have a good penalty squad are likely to win the World Cup.

Despite having the best shot-saving goalkeeper, Argentina is likely to be at a disadvantage in the semi-finals because of the suspension of two key players, Marcos Acuna and Gonzalo Montiel. The suspension of Walid Cheddira will also put Morocco at a disadvantage against France.

Semi-final results (predicted)

Developed using The Telegraph Wallchart, Author provided

Both France and Argentina face tough opponents with excellent defensive and goals-conceded records in this World Cup. If the semi-finals end in a penalty shoot-out, Croatia and Morocco will have a greater chance of reaching the final.

Author provided

Based on current records, our big-data-driven prediction indicates that the final of the FIFA World Cup 2022 will be played between two-time winner France and Argentina on 18 December at Lusail Stadium, Qatar.

France is predicted to become the first team to successfully defend the title since Brazil did so in 1962, bringing joy to the country’s 67 million residents and its diaspora around the world. If these two teams make it to the final, France is likely to be the favourite given its squad’s make-up and better defensive record during the knockout stages.

Final (predicted)

Developed using The Telegraph Wallchart, Author provided

If the final goes to penalties, Argentina will hold the advantage given its recent record and its goalkeeper. While a superb attacker, France’s Kylian Mbappé has only a 75% penalty conversion rate.

Historic penalty shoot-out success from semi-final onwards. Author provided

Croatia is predicted to be third in the competition.

Prediction validity and big surprises

It is impossible to achieve 100% accuracy in predicting game outcomes, particularly in a tournament that is played at the highest level. Additional factors such as venue, host country weather, timing of the tournament, referee judgement, video assistant referee (VAR) interventions, squad formation, in-game tactical switches, and player concentration and stamina all play a huge role in producing the final outcome. These elements are relatively new to sports science and we are still unsure about how to apply them as influential statistical factors in an algorithm.

For example, VAR played a major role in Argentina’s shock defeat against Saudi Arabia and may ultimately result in Lionel Messi never lifting the World Cup. Similarly, Japan’s victory over Germany was a result of in-game tactics that the German players may not have expected. This World Cup promises to be exciting with lots of hidden surprises. We will have to wait until 18 December to find out who will be raising the trophy.
