One of the oldest challenges for football fans is to estimate the strength of teams. For years and years, this was quite a simple matter actually. You had the league table, showing points won and goals scored or conceded, and that was it. All the rest was left to debating softer observations as to which team didn’t yet get the rewards for their good performances, or which team was flying higher than their wings would carry them.
Fast forward to the days of football data and all kinds of detailed metrics are just a mouse click away, thanks to sites like WhoScored and Squawka delivering OPTA data for free. No longer are we limited to objectively ranking teams on the basis of points and goals only. Shots, shots on target, or even expected goals from those shots can be thrown into the debate. What’s more, some people might even argue that only a subset of those shots should be counted, at close or at tied game states perhaps?
In this post, we will study the performance of five different metrics and see if we can establish which one holds the best predictive power at which stage of the season. The data set consists of the 2012/13 and 2013/14 seasons for the top-5 leagues in Europe, and the 2013/14 Eredivisie. I’ve selected the following metrics, with clear definitions found at the bottom of this post.
- Points per Game
- Goal Ratio
- Total Shots Ratio
- Shots on Target Ratio
- Expected Goals Ratio
All of these metrics are tested for their correlation to future performance in terms of future points per game and future goal ratio. This is done for each match round of the season.
For example, after 8 match rounds played, all five metrics are computed over match rounds 1 to 8 and compared to points per game and goal ratio from match round 9 to the end of the season. This is done by fitting a linear model and noting the strength of the fit in terms of R-squared. This process is repeated for each metric at every match round, to obtain R-squared values for each metric at each point in the season.
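The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this post: the data layout (one list of match dictionaries per team-season, with hypothetical keys `points`, `gf` and `ga`) and all function names are assumptions. For a linear model with a single predictor, R-squared equals the squared Pearson correlation, which is what `r_squared` computes.

```python
from statistics import mean


def r_squared(xs, ys):
    """R-squared of a one-variable least-squares fit (squared Pearson correlation)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation on one side, so nothing can be explained
    return (sxy * sxy) / (sxx * syy)


def points_per_game(matches):
    return sum(m["points"] for m in matches) / len(matches)


def goal_ratio(matches):
    gf = sum(m["gf"] for m in matches)
    ga = sum(m["ga"] for m in matches)
    # Assumption: treat a span without any goals as a neutral 0.5.
    return gf / (gf + ga) if gf + ga else 0.5


def r_squared_by_round(seasons, metric, outcome):
    """Map each round k to the R-squared between `metric` over rounds 1..k
    and `outcome` over rounds k+1..end, taken across all team-seasons.

    `seasons` is a list of team-seasons, each a list of match dicts in
    playing order. Team-seasons with no matches left after round k are
    skipped, which handles 34-match and 38-match leagues together.
    """
    result = {}
    longest = max(len(s) for s in seasons)
    for k in range(1, longest):
        usable = [s for s in seasons if len(s) > k]
        if len(usable) < 3:
            continue  # too few data points for a meaningful fit
        xs = [metric(s[:k]) for s in usable]
        ys = [outcome(s[k:]) for s in usable]
        result[k] = r_squared(xs, ys)
    return result
```

Calling `r_squared_by_round(seasons, goal_ratio, points_per_game)` would then give one R-squared value per match round for Goal Ratio as a predictor of future Points per Game; the other metrics can be plugged in the same way.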
Points per Game and Goal Ratio
The first graphs show the output of the two historically available parameters: Points per Game and Goal Ratio in predicting future Goal Ratio and future Points per Game.
This is basically the equivalent of looking at the league table and expecting trends to continue as they are. Not a bad habit, and it does certainly hold valuable information, but it has several disadvantages too. Most notably, the correlation takes a while to pick up, settling down around week 10. Also, beyond that moment, hardly any improvement is made with respect to predicting future performance. A final interesting remark is that Goal Ratio is quickest to pick up information, but Points per Game might just be a touch better in the final stages of the season. This ties in with the statistical intuition that goals are the more frequent occurrence, and therefore pick up signal earlier, but also collect more noise along the way.
Note that the graphs drop off after the halfway point of the season. This does not indicate that the model becomes worse, but rather that there is more variety in the outcome parameter. It’s simply easier to predict points tallies and goal numbers with more matches to play than it is to predict single match outcomes, as is the case near the end of the season.
The slight kick-up at match day 34 reflects the fact that Bundesliga and Eredivisie seasons are 34 matches long, while the rest of the leagues in the dataset play 38-match seasons.
Total Shots Ratio
A little under four years ago, a concept called Total Shots Ratio made its way into the (then quite small) world of football analytics. Pioneer James Grayson explored it on his blog, a site that is still a great read to get yourself acquainted with the development of football analytics.
Total Shots Ratio, or TSR, proved a very interesting way to rank teams, without having to resort to direct output like goals scored or points won. Shots attempted do reflect the balance of play, and the metric does recognize under- or over-performing teams.
Look at that massive boost of knowledge early in the season. It now proved possible to identify the strength of teams as early as after seven or eight match rounds, with an accuracy comparable to what traditional methods could only achieve at their height in mid-season.
TSR, like Goal Ratio, forms an improvement early in the season by picking up signal a lot earlier. After all, shots are roughly 10 to 11 times more frequent than goals. In the end, it turns out that this method collects noise at a faster rate too. Not all shots are equal, and some teams have tactical setups that allow them to consistently perform better or worse than TSR suggests. This shows in the sharp drop in TSR’s performance after match round 25. After match day 28 you’re generally even better off just looking at the league table!
Shots on Target Ratio
In the introductory post linked above, James Grayson declared his preference for using Shots on Target Ratio (SoTR) over TSR, but he later added some nuance to this line of thought. Theoretically, SoTR could be a nice method to lose the noise that weakens TSR in later stages of the season, hopefully without losing too much of the early signal that makes the method so powerful.
I’ve always gone with TSR over SoTR because I feared the lack of signal in a smaller sample of shots, and it was increased signal that made TSR such a powerful tool early in the season. I was wrong, it seems. Despite holding roughly one third of the sample of TSR – around 1 in 3 shots is on target – the SoTR metric picks up its signal equally fast and holds it longer. Just like it theoretically should!
At its peak of predictive power, in mid-season, SoTR performs notably better than TSR, which should make it the preferred method to treat raw shot counts. As said before, not all shots are equal, and the capacity to get shots on target seems to hold predictive power for future performance. Partly this may be the effect of better teams simply firing more accurately, but it may also contain information about playing in favourable game states. After all, it’s now generally known that teams trailing a match by a single goal see a drop in shooting accuracy, while teams leading by a single goal see their shot accuracy rise.
Expected Goals Ratio
Next up in football analytics land was the appearance in 2013 of Expected Goals models. Simply said, each shot is assigned a number between 0 and 1 to reflect the odds of such a shot resulting in a goal. This process is not done subjectively by hand, but objectively, by using large databases of earlier shots and determining correct odds by regression methods. Expected Goals models do differ a slight bit from one model to another, but the mainstay of the input is shot location and shot type. If the traits captured by the Expected Goals methodology reflect teams’ playing style and/or playing quality, this method should form an improvement on raw shot metrics.
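To make the idea concrete, here is a toy sketch in Python of how a per-shot probability could be derived from shot location and shot type and then summed into a team's expected goals. It is not the model behind the graphs in this post: the logistic form is one common choice of regression method, and the coefficients, the `distance_m` and `is_header` inputs and the function names are all placeholders invented for illustration.

```python
import math

# Placeholder coefficients, invented for illustration only. A real model
# would estimate them by regression on a large database of earlier shots.
INTERCEPT = 0.0
PER_METRE = -0.15        # assumed: chances drop as distance to goal grows
HEADER_PENALTY = -1.0    # assumed: headers are harder to score than kicked shots


def shot_probability(distance_m, is_header):
    """Logistic estimate (between 0 and 1) that a single shot is scored."""
    z = INTERCEPT + PER_METRE * distance_m
    if is_header:
        z += HEADER_PENALTY
    return 1.0 / (1.0 + math.exp(-z))


def expected_goals(shots):
    """Sum the per-shot probabilities.

    `shots` is a list of (distance_m, is_header) tuples.
    """
    return sum(shot_probability(d, h) for d, h in shots)
```

Summing these values over a team's shots for and shots against gives the two numbers that go into the Expected Goals Ratio.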
The conclusion from these graphs is quite simple actually. Expected Goals Ratio forms an impressive improvement on raw shot metrics at each and every point in the season. It picks up information much like the raw shot metrics do in the very early stages, then predicts future performance significantly better at early to mid-season, and also holds predictive capacities for longer. It makes sense to use Expected Goals Ratio from as early as four matches played. Even that early, it is as good a predictor for future performance as Points per Game and Goal Ratio will ever be.
The metrics were defined as below.
- Points per Game: points won / matches played
- Goal Ratio: goals for / (goals for + goals against)
- Total Shots Ratio: shots for / (shots for + shots against)
- Shots on Target Ratio: shots on target for / (shots on target for + shots on target against)
- Expected Goals Ratio: expected goals for / (expected goals for + expected goals against)
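The definitions above translate directly into code. The Python sketch below assumes a dictionary of season-to-date totals with hypothetical key names; the example numbers are made up and not taken from the data set. Returning `None` when both counts are zero is a choice made for this sketch, since the ratio is undefined in that case.

```python
def share(for_, against):
    """Return for / (for + against), or None when both counts are zero."""
    total = for_ + against
    return for_ / total if total else None


def team_metrics(t):
    """Compute the five metrics from a dict of season-to-date totals."""
    return {
        "points_per_game": t["points"] / t["matches"],
        "goal_ratio": share(t["goals_for"], t["goals_against"]),
        "total_shots_ratio": share(t["shots_for"], t["shots_against"]),
        "shots_on_target_ratio": share(t["sot_for"], t["sot_against"]),
        "expected_goals_ratio": share(t["xg_for"], t["xg_against"]),
    }


# Made-up totals for illustration only.
example = {
    "matches": 10, "points": 20,
    "goals_for": 15, "goals_against": 10,
    "shots_for": 120, "shots_against": 80,
    "sot_for": 40, "sot_against": 30,
    "xg_for": 14.0, "xg_against": 11.0,
}

for name, value in team_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

All four ratios run from 0 to 1, with 0.5 meaning a team creates exactly as much as it concedes.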