The Australian Open is upon us for another year, and the best tennis players in the world have assembled in Melbourne to compete for the right to call themselves “champion”.
Much of the focus will be on the genuine contenders for the men’s and women’s singles trophies – the likes of Novak Djokovic, Roger Federer, Victoria Azarenka and Serena Williams. But for many Australians, the focus of the tournament will be 20-year-old Bernard Tomic, who is currently ranked 64th in the world.
Will Tomic live up to suggestions that he’s good enough to break into the top ten? Or will his highest ranking yet, 27th in June 2012, be as close to the top ten as he gets?
While such questions are very difficult to answer, there is a branch of science that’s making things easier: sports data science. More on that in a moment.
All hail the underdog
While most attention at the Australian Open will be on the genuine contenders and the local heroes, other lesser-known athletes are just as important to the success and unique personality of the tournament.
Indeed, some of the greatest moments in recent Australian Open history have been provided by these “underdogs”.
Who could forget the unseeded Cypriot Marcos Baghdatis steamrolling the likes of Ivan Ljubicic, Andy Roddick and David Nalbandian in 2006 to reach the final? And what about the 2003 quarter-final in which the highly competent but lesser-known Moroccan Younes El Aynaoui took Andy Roddick to the longest fifth set (at the time) in Australian Open history (21-19)?
Tennis Australia estimates that the average cost for a tennis player to travel to roughly 30 international tournaments and employ support staff ranges between US$121,000 and US$197,000 per year.
The Association of Tennis Professionals (ATP) prize-money list shows that only male athletes ranked in the top 140 are likely to have earned this much in 2012.
These figures raise concerns about the sustainability of lower-ranked players’ professional careers. In response, the organisers of the 2013 Australian Open are offering an unprecedented $30 million in prize money.
While goal-setting is important at all stages of an athlete’s career, entry into the top 100 is a particularly important milestone. Reaching this ranking not only makes financial sustainability much more likely, but also allows direct entry into the main draws of the four grand slam tournaments (the Australian Open, Roland Garros, Wimbledon and the US Open).
To ensure both the sustainability and the profile of the sport, supporting institutions such as Tennis Australia need strategies that help as many athletes as possible reach this ranking goal. One such strategy is tracking and predicting athlete rankings with advanced numerical modelling techniques, so that promising athletes can be identified and supported.
Sports data science
Success in tennis draws on a wide range of complex sport science disciplines, from psychology to biomechanics, so institutions that support and develop athletes, such as Tennis Australia, need expertise in all of them.
With so many interacting factors, it is incredibly difficult to predict how competition and athletes will evolve (I’m sure this is one of the reasons we enjoy it so much!).
An exciting area of sport science, popularised by Michael Lewis’s book Moneyball and its film adaptation, is sports data science: an attempt to establish some semblance of order from the chaos that is professional sport.
By analysing ATP ranking data going back to the inception of the ranking system in 1973, we can gain some insight into how players who eventually reach different levels progress in their early careers.
The analysis technique we use, known as a “Bayesian network”, is a highly versatile and adaptable mathematical tool that can be applied in a range of fields, from image processing through to medical diagnostics.
Work performed in a joint study by Victoria University (ISEAL), the Australian Institute of Sport and Tennis Australia’s Sport Science and Medicine Unit has initially focused on the men’s game, largely because more data are currently available.
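For readers unfamiliar with the term, here is the standard textbook definition (not a description of the study’s specific model). A Bayesian network represents variables X1 to Xn as nodes in a directed acyclic graph, and the joint probability of all of them factorises into each variable’s probability given its parents Pa(Xi) in the graph:

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

In a ranking setting, the nodes could hypothetically be a player’s year-end ranking band for each full year on tour, with each year depending on the one before. That chain structure is an illustrative assumption, not necessarily the structure used in the study.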
Ranking progression of tennis athletes is incredibly variable, particularly early in their careers. For instance, Rafael Nadal and Jo-Wilfried Tsonga were ranked approximately 50th and 800th respectively at the end of their second full year on the ATP tour, yet both went on to reach the top ten.
This variation makes it difficult to predict future success, but the analysis shows that the top-ten cohort is the most easily distinguishable group of players.
Of the top-ten-ranked male players at the end of 2012, seven had achieved a rank inside the top 50 in their third full year on tour. Most of these athletes were good enough to earn an ATP ranking at a very early age (generally 16 or 17), so they also reached these high rankings very young.
After their sixth full year on tour, eight of these athletes had actually achieved a top ten ranking.
This type of information can be used to track and compare the progression of athletes. For instance, Bernard Tomic was ranked inside the top 50 during his third full year on tour, putting him on a path broadly similar to that of players who went on to reach the top ten.
But it’s important to remember that Tomic is still almost seven years younger than the current average age of the top ten (26.9 years), so it could be argued that he has time on his side. What is certain is that athletes follow highly variable trajectories before reaching their prime.
Distinguishing future top 100 athletes from those who will not reach that milestone is much more difficult. It requires far more complex analysis, since the path these athletes take is much less obvious at early ages.
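To make this kind of evidence-based tracking concrete, here is a minimal sketch of inference in the simplest possible Bayesian network: two yes/no nodes, “eventually reaches the top ten” and “inside the top 50 in third full year”, with the first as the parent of the second. Only the 0.7 likelihood loosely comes from the figures above (seven of the ten players in the 2012 top ten). The 1% prior and the 5% rate among players who never reach the top ten are made-up placeholders, and the class and its names are hypothetical, not the study’s model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoNodeRankingNetwork:
    """Hypothetical network: ReachesTopTen -> Top50InYearThree (both yes/no).

    All probabilities are illustrative assumptions, not estimates from the study.
    """

    p_top_ten: float           # prior P(player eventually reaches the top ten)
    p_early_if_top_ten: float  # P(inside top 50 in third full year | top ten)
    p_early_if_not: float      # P(inside top 50 in third full year | never top ten)

    def __post_init__(self) -> None:
        # Reject values that are not valid probabilities.
        for name in ("p_top_ten", "p_early_if_top_ten", "p_early_if_not"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be a probability, got {value}")

    def posterior_top_ten(self, early_top50: bool) -> float:
        """Return P(reaches top ten | observed year-three evidence) via Bayes' rule."""
        if early_top50:
            like_top = self.p_early_if_top_ten
            like_not = self.p_early_if_not
        else:
            like_top = 1.0 - self.p_early_if_top_ten
            like_not = 1.0 - self.p_early_if_not
        # Joint probabilities P(parent state, evidence) from the network factorisation.
        joint_top = like_top * self.p_top_ten
        joint_not = like_not * (1.0 - self.p_top_ten)
        total = joint_top + joint_not
        if total == 0.0:
            raise ValueError("the observed evidence has zero probability under this model")
        return joint_top / total


if __name__ == "__main__":
    # 0.7 loosely reflects "seven of the 2012 top ten"; 0.01 and 0.05 are made up.
    net = TwoNodeRankingNetwork(p_top_ten=0.01, p_early_if_top_ten=0.7, p_early_if_not=0.05)
    print(round(net.posterior_top_ten(True), 3))   # 0.124
    print(round(net.posterior_top_ten(False), 4))  # 0.0032
```

With these placeholder numbers, an early top 50 ranking raises the estimated chance of a top ten career from 1% to roughly 12%, while missing it lowers the estimate to about 0.3%. The point is the direction and size of the update; the numbers themselves depend entirely on the assumed prior and false-positive rate.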
The discipline of sports data science is still very much in its infancy. In the future we hope to be able to predict, with greater certainty, just how well players will progress. By doing so, we’ll be able to ensure Aussie sporting talent doesn’t go to waste.