One of the most attractive parts of football is obviously that it is so unpredictable. Who would want to watch full matches when the outcome is known before the kick-off? In the low-scoring game that football is, predicting winners is difficult. Yet, at the same time, some teams are genuinely better than others: they win more matches and are rightly given better pre-match odds of winning. The fact that they don’t win all of the time proves that there is an element of luck involved.
This post will try to separate the two factors that determine who wins a football match: luck and skill. In order to do so, we must agree on the difference between the two. The key concept that makes a lucky team different from a skilled team is sustainability. Any team would be able to pull off a miraculously good performance in a single match, but stringing good results together requires more than luck. It requires a certain level of skill. The better the results and the longer the string of good results required, the higher the level of skill needed.
So far, so good. But which factors represent team performance and are sustainable? Goals scored? Goal difference? Points won? Shots taken?
Shots, conversion and saves
The brilliant James Grayson looked into this matter and used a large data set, containing 702 back-to-back seasons, to assess the season-to-season correlation between offensive and defensive parameters that indicate a level of performance (goals, shots, shot conversion, etc.). This research raised more than a few interesting points, most of which are inferred from ice hockey analysis, where the following statistical measures are much more common.
First, the variable that showed the best correlation between one season and the next proved to be the total shots ratio, or TSR. So the best predictor for future performance seems to be the fraction of shots within a match that a team takes. This shows even more correlation between seasons than the number of points obtained by a team. Please check James’ Blog, where he explains this very well, and in more detail.
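As a minimal sketch of the definition above, TSR is simply a team's shots divided by all shots in its matches. The function name and the shot counts below are made up for illustration and are not taken from James' data.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """Fraction of all shots in a match (or a season) taken by the team."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded, TSR is undefined")
    return shots_for / total


# Hypothetical match: the home side takes 15 shots, the away side 10.
home_tsr = total_shots_ratio(15, 10)
away_tsr = total_shots_ratio(10, 15)
print(round(home_tsr, 3), round(away_tsr, 3))
```

By construction the two teams' TSR values in a match add up to 1, so a TSR above 0.5 means a team is outshooting its opponents.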
Secondly, two variables that are very important in deciding which team wins a particular game show little or no correlation from season to season: shot conversion (the fraction of shots that end up in goal) and the saves fraction (the fraction of shots conceded that do not end up in goal). This deserves some explanation.
Regression to the mean
It seems that the fraction of shots converted, or shooting percentage (Sh%), and the fraction of shots saved, or saves percentage (Sv%), are influenced far more by luck than by skill. The concept of ‘regression to the mean’ explains why. This important principle says that any outlying performance over a short stretch of games will tend to move back towards the average for that parameter. James explains this concept very well on his blog, and Wikipedia serves those wanting the most detailed of explanations.
So, if a team shows an excellent Sh% over a season of games, think Heerenveen’s 16.7% shot conversion, or an excellent Sv%, like Vitesse’s 9.0%, this shows an unsustainable performance. Next season, Heerenveen is more than likely to suffer from a severe drop in Sh% and Vitesse to suffer from a drop in Sv%, based on this principle.
In ice hockey analysis, Sh% and Sv% have been combined into a single stat, called PDO. Ice hockey stats have the nasty habit of being named after their ‘inventor’ rather than after what they measure, and the term ‘PDO’ was coined by Brian King, whose internet alias happened to be PDO.
PDO = 1000 (Sh% + Sv%)
That’s all. It is as simple as that: a better shooting percentage and a better saves percentage give you a higher PDO. Most commonly the sum is multiplied by 1000 to get rid of the small numbers, but that’s just convenience. Since one team’s Sv% rises when the other team’s Sh% drops, the average PDO over a match, or a league, will always be 1000.
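To make the formula concrete, here is a small sketch that computes PDO for both sides of a single match and checks that the two values average out to 1000. The function name and the goal and shot counts are invented for illustration. Sh% and Sv% are treated as fractions between 0 and 1, as in the formula above.

```python
def pdo(goals_for: int, shots_for: int,
        goals_against: int, shots_against: int) -> float:
    """PDO = 1000 * (Sh% + Sv%), with both percentages as fractions."""
    sh = goals_for / shots_for                # fraction of own shots converted
    sv = 1 - goals_against / shots_against    # fraction of conceded shots kept out
    return 1000 * (sh + sv)


# Hypothetical match: home wins 2-1, with 10 shots against 8.
home = pdo(goals_for=2, shots_for=10, goals_against=1, shots_against=8)
away = pdo(goals_for=1, shots_for=8, goals_against=2, shots_against=10)

print(round(home, 1), round(away, 1))
print(round((home + away) / 2, 1))  # average PDO over the match
```

In this made-up match the home side's PDO works out to 1075 and the away side's to 925. Each team's Sh% is exactly what the other team's Sv% falls short of 1, which is why the pair always averages 1000.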
The key concept, as explained above, is that a high PDO is simply not sustainable and a low PDO will rise with more matches played. It allows an easy assessment of how much of a team’s performance is due to skill and how much to luck.
Of course, it seems counter-intuitive to assume that individual goalkeeping skills don’t vary from team to team, but in James’ dataset of 702 back-to-back seasons, the Sv% in one season and the next showed a correlation of just 0.098.
Regarding shot conversion (Sh%), more or less the same holds true. The R² value is only 0.150, indicating that Sh% regresses to the mean by over 60%, or in other words, that luck is about 1.5 times as important as skill when it comes to converting chances. This fits well with an article by the excellent Mark Taylor, who used Arsenal’s 2011/12 season to show that shot conversion is neatly correlated with the match situation. In other words, when Arsenal are chasing games, their Sh% is almost half of what it is when they lead comfortably. Match situation may be more important than the skill of the player pulling the trigger, with the obvious caveat that better teams (higher TSR) take more shots from leading positions and so achieve a higher Sh%.
For the total shots ratio, things are different from Sh% and Sv%. The TSR from one season to the next shows less than 13% regression to the mean, indicating that skill dominates luck by a factor of about 6 here.
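The arithmetic behind these percentages can be sketched as follows. This is one reading that reproduces the figures quoted for Sh%, not necessarily the exact calculation James used: it assumes the season-to-season correlation r is the positive square root of R², that the regression to the mean is 1 − r, and that the luck-to-skill ratio is (1 − r) / r. The function names are my own.

```python
import math


def regression_to_mean(r_squared: float) -> float:
    """Share of an outlying performance expected to vanish next season.

    Assumes a positive season-to-season correlation r = sqrt(R^2).
    """
    r = math.sqrt(r_squared)
    return 1 - r


def luck_to_skill(regression: float) -> float:
    """Ratio of the 'luck' share (regression) to the 'skill' share (1 - regression)."""
    return regression / (1 - regression)


# Sh%: R^2 = 0.150, as quoted in the post.
sh_regression = regression_to_mean(0.150)
print(f"Sh% regression to the mean: {sh_regression:.1%}")
print(f"Sh% luck-to-skill ratio:    {luck_to_skill(sh_regression):.2f}")

# TSR: taking the quoted upper bound of 13% regression at face value.
tsr_regression = 0.13
print(f"TSR skill-to-luck ratio:    {1 / luck_to_skill(tsr_regression):.2f}")
```

Under these assumptions the Sh% figures come out at roughly 61% regression and a luck-to-skill ratio of about 1.6, in line with the "over 60%" and "1.5" above. For TSR the same arithmetic gives a skill-to-luck ratio in the region of six to seven.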
So in conclusion, the short-term performance of teams that we are used to studying (and a season of 30-something matches is definitely short term in a sport as low-scoring as football) leads analysts to focus on luck rather than skill. Using the simple concept of PDO (thank you, ice hockey analysts!) allows us to separate (un)lucky teams from (un)skilled teams, while the total shots ratio (TSR) is the best representation of a team’s skill.
To round off this post, here’s a table of the 2011/12 Eredivisie teams and their respective PDOs, Sh%, Sv% and TSR.
Data for this table has been provided by Infostrada Sports.