One of the oldest challenges for football fans is estimating the strength of teams. For years and years, this was quite a simple matter. You had the league table, showing points won and goals scored or conceded, and that was it. All the rest was left to debating softer observations: which team hadn't yet been rewarded for its good performances, or which team was flying higher than its wings would carry it.

Fast forward to the days of football data and all kinds of detailed metrics are just a mouse click away, thanks to sites like WhoScored and Squawka delivering OPTA data for free. No longer are we limited to ranking teams objectively on the basis of points and goals only. Shots, shots on target, or even expected goals from those shots can be thrown into the debate. What’s more, some people might even argue that only a subset of those shots should be counted, at close or at tied game states perhaps?

In this post, we will study the performance of five different metrics and see if we can establish which one holds the best predictive power at which stage of the season. The data set consists of the 2012/13 and 2013/14 seasons for the top-5 leagues in Europe, and the 2013/14 Eredivisie. I’ve selected the following metrics, with definitions found at the bottom of this post.

- Points per Game
- Goal Ratio
- Total Shots Ratio
- Shots on Target Ratio
- Expected Goals Ratio

All of these metrics are tested for their correlation to future performance in terms of future points per game and future goal ratio. This is done for each match round of the season.

For example, after 8 match rounds played, all five metrics are computed over match rounds 1 to 8 and compared to points per game and goal ratio from match round 9 to the end of the season. This is done by fitting a linear model and noting the correlation in terms of R squared. This process is repeated for each metric at every match round, to obtain R-squared values for each metric at each point in the season.
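In rough code, the procedure looks something like the sketch below. The data layout (one row per team per round, with `team`, `round`, `points` and metric columns) is my own assumption for illustration, not taken from the original analysis.

```python
import pandas as pd
from scipy import stats

def r_squared_at_round(matches: pd.DataFrame, metric: str, k: int) -> float:
    """R-squared of `metric`, aggregated over rounds 1..k, against
    points per game over rounds k+1 to the end, across all teams."""
    # Average the metric over the first k rounds, per team
    past = matches[matches["round"] <= k].groupby("team")[metric].mean()
    # Average points per game over the remaining rounds, per team
    future = matches[matches["round"] > k].groupby("team")["points"].mean()
    # One dot per team: (metric so far, future points per game)
    joined = pd.concat([past, future], axis=1, join="inner").dropna()
    slope, intercept, r, p, se = stats.linregress(joined[metric], joined["points"])
    return r ** 2
```

Applying this function for every metric at every round k yields one R-squared curve per metric, which is what the graphs below plot.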

**Points per Game and Goals Ratio**

The first graphs show the output of the two historically available parameters: Points per Game and Goal Ratio in predicting future Goal Ratio and future Points per Game.

This is basically the equivalent of looking at the league table and expecting trends to continue as they do. Not a bad habit, and it certainly does hold valuable information, but it has several disadvantages too. Most notably, the correlation takes a while to pick up, settling down around week 10. Also, beyond that moment, hardly any improvement is made with respect to predicting future performance. A final interesting remark is that Goal Ratio is quickest to pick up information, but Points per Game might be just a touch better in the final stages of the season. This ties in with the statistical intuition that goals are the more frequent occurrence, and therefore pick up signal earlier, but also collect more noise along the way.

Note that the graphs drop off after the halfway point of the season. This does not indicate that the model becomes worse, but rather that there is more variety in the outcome parameter. It’s simply easier to predict points tallies and goal numbers with more matches to play than it is to predict single match outcomes, as is the case near the end of the season.

The slight kick-up at match day 34 reflects the fact that Bundesliga and Eredivisie seasons are 34 matches long, while the rest of the leagues in the dataset play 38-match seasons.

**Total Shots Ratio**

A little under four years ago, a concept called Total Shots Ratio made its way into the (then quite small) world of football analytics. Pioneer James Grayson explored it on his blog, a site that is still a great read to get yourself acquainted with the development of football analytics.

Total Shots Ratio, or TSR, proved a very interesting way to rank teams without having to resort to direct output like goals scored or points won. Shots attempted do reflect the balance of play, and the metric does identify under- or over-performing teams.

Look at that massive boost of knowledge early in the season. It now proved possible to identify the strength of teams as early as after seven or eight match rounds, with an accuracy comparable to what traditional methods could only achieve at their height in mid-season.

TSR, like Goal Ratio before it, forms an improvement early in the season by picking up signal a lot earlier. After all, shots are roughly 10 to 11 times more frequent than goals. In the end, it turns out that this method collects noise at a faster rate too. Not all shots are equal, and some teams have tactical setups that allow them to consistently perform better or worse than TSR suggests, as is shown in the sharp drop in performance that TSR displays after match round 25. After match day 28 you’re generally even better off just looking at the league table!

**Shots on Target Ratio**

In the introductory post linked above, James Grayson declared his preference for using Shots on Target Ratio (SoTR) over TSR, though later on this line of thought gained some nuance. Theoretically, SoTR could be a nice method to lose the noise that weakens TSR in later stages of the season, hopefully without losing too much of the early signal that makes the method so powerful.

I’ve always gone with TSR over SoTR because I feared the lack of signal in a smaller sample of shots, and it was increased signal that made TSR such a powerful tool early in the season. I was wrong, it seems. Despite holding roughly one third of the sample of TSR – around 1 in 3 shots is on target – the SoTR metric picks up its signal equally fast and holds it longer. Just like it theoretically should!

At its mid-season peak of predictivity, SoTR performs notably better than TSR, which should make it the preferred method to treat raw shot counts. As said before, not all shots are equal, and the capacity to get shots on target seems to hold predictive power for future performance. Partly this may be the effect of better teams simply firing more accurately, but it may also contain information about playing in favourable game states. After all, it’s now generally known that teams trailing a match by a single goal see a drop in shooting accuracy, while teams leading by a single goal raise their shot accuracy.

**Expected Goals Ratio**

Next up in football analytics land was the appearance in 2013 of Expected Goals models. Simply said, each shot is assigned a number between 0 and 1 to reflect the odds of such a shot resulting in a goal. This process is not done subjectively by hand, but objectively, by using large databases of earlier shots and determining the correct odds by regression methods. Expected Goals models differ slightly from one model to another, but the mainstay of the input is shot location and shot type. If the shot traits captured by the Expected Goals methodology reflect teams’ playing style and/or playing quality, this method should form an improvement on raw shot metrics.
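A toy illustration of that idea: fit a logistic regression of goal outcomes on shot features. The features (distance, header) and all the numbers below are invented for illustration; real models use far richer OPTA shot descriptors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic shots: a "true" scoring probability that falls with distance
# and is lower for headers (all invented for illustration)
rng = np.random.default_rng(0)
n = 5000
distance = rng.uniform(5.0, 35.0, n)     # metres from goal
header = rng.integers(0, 2, n)           # 1 = headed attempt
p_goal = 1.0 / (1.0 + np.exp(0.15 * distance - 1.0 + 0.8 * header))
goal = rng.random(n) < p_goal

X = np.column_stack([distance, header])
model = LogisticRegression(max_iter=1000).fit(X, goal)

# Each shot's expected-goal value is its fitted scoring probability;
# a close footed shot should score far more often than a distant one
xg_close, xg_far = model.predict_proba([[10.0, 0], [25.0, 0]])[:, 1]
```

Summing these per-shot probabilities over a team's shots gives the team's Expected Goals total.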

The conclusion from these graphs is quite simple actually. Expected Goals Ratio forms an impressive improvement on raw shot metrics at each and every point in the season. It picks up information much like the raw shot metrics do in the very early stages, then predicts future performance significantly better at early to mid-season, and also holds its predictive capacities for longer. It makes sense to use Expected Goals Ratio from as early as four matches played. Even that early, it is as good a predictor of future performance as Points per Game and Goal Ratio will ever be.

The metrics are defined as follows.

- Points per Game: points won / matches played
- Goal Ratio: goals for / (goals for + goals against)
- Total Shots Ratio: shots for / (shots for + shots against)
- Shots on Target Ratio: shots on target for / (shots on target for + shots on target against)
- Expected Goals Ratio: expected goals for / (expected goals for + expected goals against)
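These definitions translate directly to code. The `Team` container and its field names below are illustrative, not from the original analysis.

```python
from dataclasses import dataclass

@dataclass
class Team:
    # Season-to-date totals (field names are illustrative)
    points: int
    matches: int
    goals_for: float
    goals_against: float
    shots_for: float
    shots_against: float
    sot_for: float          # shots on target
    sot_against: float
    xg_for: float           # expected goals
    xg_against: float

def ratio(f: float, a: float) -> float:
    """Share of the total produced by the team itself."""
    return f / (f + a)

def metrics(t: Team) -> dict:
    return {
        "ppg": t.points / t.matches,
        "goal_ratio": ratio(t.goals_for, t.goals_against),
        "tsr": ratio(t.shots_for, t.shots_against),
        "sotr": ratio(t.sot_for, t.sot_against),
        "xg_ratio": ratio(t.xg_for, t.xg_against),
    }
```

Note that every ratio is bounded between 0 and 1, with 0.5 marking a perfectly balanced team.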

**Joe:** Good read, thanks for posting!

**dinobaggio:** This is very nice work, Tegen, but surely you cannot plot the Expected Goals Ratio for a whole league and expect it to be an accurate predictor for every club in the league? I mean, the correlation for the majority of teams might be excellent, but a few outliers above and below the correlation line will keep everything looking hunky dory when in fact the individual teams in the league itself may vary quite a bit from the correlation?

I see you say you can fit the correlation from as early as game week four but as we all know the variability in fixture strength and form for teams in the early season can lead to wildly erratic differences in points per game or goals per game or shots per game compared to say the correlation you will get after 12 or 15 games when we have more data to go on. I’m really interested in this field and use shots inside the box as the most important metric for looking for expected goals in the future but I think trying to suggest you can fit a correlation to every team based on the average of all the teams is misleading.

Have you looked at the difference between the correlation for the top 3 of each league compared to the bottom 3 for example?

**11tegen11 (post author):** I’m not sure we’re talking about the same thing here.

All points in these plots are an R-squared value. Those values are all derived from regressions in scatter plots. Each scatter plot holds two points of data per team: the metric and the future performance. So each scatter plot has as many dots as there are teams in the dataset at that match day. From match day 1 up to match day 33, this is all teams from all eleven leagues tested. Beyond that, teams from the Bundesliga and the Eredivisie are no longer in the set, so the plots from match day 34 to 37 are done on teams from the remaining eight leagues.

Obviously, the predictive power of all metrics increased as they are fed more information during the early days of the season. This holds true for all metrics alike though.

**dinobaggio:** Yes, I’m pretty sure I know how the data works, but because you are using the average of a multitude of teams I think you are not quite capturing the variance that can happen between form teams and non-form teams, and quality teams and teams of poorer quality.

The reason the graphs work so well could just be that there are an equal number of quality teams getting ultra-consistent results which balance out the poorer teams which get inconsistent results, and likewise form teams and teams out of form?

**11tegen11 (post author):** Forgive me for not completely understanding your point still. How could this be of a different influence to different metrics?

All that R squared does is compute the distance between the points on a scatter plot and the regression line. I’m aware of its potential limitations and your point on good and mediocre teams is well taken, but I don’t see how this effect could apply to one metric and not to the other metrics.

**dinobaggio:** I was not implying it doesn’t affect other metrics, but stating that I don’t think this model is a very accurate way of judging a team’s potential to score goals/points. Maybe it is for the average team in the league on average form against average opposition, but by not differentiating between the good and bad sides and excluding the form element etc., I think the model cannot possibly be effective.

**11tegen11 (post author):** My bold statement is that form only exists after an event. I have yet to see anyone show evidence of teams being in or out of form prior to an event.

On your second point. May I ask what you’d consider a better predictive model? An ExpG based team rating does an excellent job of separating good and bad sides, as is reflected by the correlations with future performance indicators. All sorts of teams, be it good, mediocre or bad, are in this dataset, so the correlation with future performance reflects all kinds of teams.

**dinobaggio:** As I say, though maybe not clearly enough, I think this model would be fine for predicting goals or points returns for the average team in average form playing a fixture of average difficulty. What I don’t think is possible is using average stats for multiple leagues to predict short-term future returns for every team regardless of ability, form or fixture difficulty.

I loved your previous ExpG work for individual teams and while the limitations listed above are still relevant to these individual teams (availability of playing resources is another variable) I think it is somewhat easier to interrogate the data and the reasons for outliers and to distinguish between candidates for regression or teams playing to a sustainable level.

I am only new to predictive modeling and am learning mainly through your fine work so I am not aware of any potentially better models out there. I was just trying to offer constructive feedback on potential limitations with your model which might help you to improve it in the future.

**11tegen11 (post author):** No worries, this type of debate should be conducted to improve understanding of the model, and to improve the model itself.

Please note that this work also correlates the ExpG of individual teams to the future performance of those same teams. What you see in the graphs is just the final correlation coefficients between ExpG and future actual performance. For predicting future performances of individual teams one should simulate each remaining match for that particular team.

I should probably write a separate post about how I transform a team’s ExpG numbers into winning, drawing and losing percentages for individual matches. This process involves Poisson distributions around expected match scores, repeated many times to get correct percentages for wins, draws and losses.
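A minimal sketch of that process, assuming per-match ExpG totals as input; the function name and defaults are mine, not from the model itself:

```python
import numpy as np

def match_probabilities(xg_home: float, xg_away: float,
                        n_sims: int = 100_000, seed: int = 0):
    """Estimate win/draw/loss percentages by drawing Poisson goal counts
    around each side's expected goals for the match, many times over."""
    rng = np.random.default_rng(seed)
    home = rng.poisson(xg_home, n_sims)
    away = rng.poisson(xg_away, n_sims)
    win = float(np.mean(home > away))
    draw = float(np.mean(home == away))
    return win, draw, 1.0 - win - draw
```

With enough simulations the three shares converge on the percentages a closed-form Poisson calculation would give.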

Hope this clears things up a bit for you.

**dinobaggio:** It clears things up somewhat but if you are using average correlation figures for specific teams I still think the model cannot be very effective.

Have you checked the efficacy of the model by predicting say 5 random teams goals and points over the next 5 or 10 games?

**11tegen11 (post author):** No, I haven’t done such a random sample analysis. It could be a future direction indeed.

Which goes to show the importance of debating such lines of thought here.

**JW:**

1.)

Why are you considering ratios instead of differences? You use these stats in your prediction models as well, but imagine a very defensive team that creates 0.1 ExpG per game and concedes 0 ExpG. This team has an excellent ExpG ratio, but will obviously not be able to win a lot of games and will only end up in the middle of the table due to a lot of draws. This example is a bit extreme, but I can imagine this kind of thing creating a small bias, which would not appear using ExpG difference instead of ratio.

2.)

I think your model could benefit from the following.

Are teams always playing at their best? I recall a weekend in which Ajax and PSV played Willem II and Dordrecht respectively. Both expected an easy win, but acted differently. Ajax thrashed Willem II 5-0, while PSV didn’t seem to want to tire themselves, tried a little less hard, and won 3-1 (if I remember correctly, the details are not important). Ajax will build up more ExpG and your models will predict them to be better, but are they really?

I think this is one of the reasons Feyenoord is a bit overrated in your models: they always play at their best, even against way lesser teams, and get a lot of ExpG there, but can not perform well enough in matches against better opponents.

(By the way, it might be that my examples are not correct and I am interpreting the match outcomes wrong, but correct or not, this illustrates what I want to say.)

This problem could be avoided by looking at the correlation between ExpG and strength of the opponent. If there is a strong correlation (like Feyenoord and Ajax in the example), a team is probably not able to perform in matches against strong teams, while a weaker correlation (like PSV) probably means they are not trying hard against weaker teams and are hence underperforming a little, and the model is underrating the team.

Have you considered this before? Is it possible to take this into account in your models?

**11tegen11 (post author):** Thanks for your extensive comment!

Regarding your first point, the use of ratios over differences.

Theoretically I can follow your line of thought, and it is something that I tested (without writing about it) a while ago. However, I should probably repeat that test now, given the fact that we have more data available, and the testing was done prior to constructing my ExpG model.

So, that’s one area that I will look into in the near future. It’s also a point made by Daniel Altmann, who prefers ExpG differences over ratios in his own models.

Your second point concerns teams’ tactical choices, manifesting themselves in more or less effort in matches against particular opposition. This is an interesting potential source of bias indeed.

The best way, though not complete, to try and assess this might be to isolate performances in certain Game States. For TSR and SoTR this has been done before, by others, by looking only at tied (GS 0) or close (GS -1, 0, 1) match situations. This assumes that teams will only take their foot off the pedal once they are in the ‘safe’ Game State of +2, if that phenomenon happens at all.

I’ve looked at this, but I decided not to include that subanalysis in this post, for the sake of accessibility. However, I will put it out here soon, with graphs specifying how the metrics do in certain match situations.

I think this method will show that the phenomenon either does not truly exist, or that its effects are so small that correcting for it will allow more noise and thereby weaken the model.

**JW:** About the second point: looking at the game state will certainly make up for some of this, but I think the problem can be solved further by the method I proposed.

For example, the difference between Ajax’s level of play in the Eredivisie and the Champions League is enormous, which is mostly due to motivation and has nothing to do with game state. I think this also happens on a smaller scale within the league.

I find it hard to estimate how hard it is to look into this correlation I described, but I think it is at least worth a try.


**Jb MacCarty:** I appreciate the work that went into this, especially the data collection and cleaning.

If you wanted to keep the same ‘aggregate-to-round’ to predict ’round + 1 until the end of the season’ regression mentality, then there is one major question that stands out…

Are these bivariate regressions, with each bivariate correlation then plotted on the same graph? If so, have you tried a multivariate regression, using a penalizing term (think LASSO or ridge regression) to adjust for the (I’m presuming here) collinearity? That way, you can not just say ‘which metric predicts future success better’ but something even more powerful: “conditioning on all other metrics (‘holding everything else equal’), this metric or that metric is more predictive, and here is by how much”, and your overall R^2 of the model will increase.
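For illustration, such a penalized fit could be sketched like this, on synthetic and deliberately collinear data (ridge via scikit-learn is just one option):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Invented team metrics: SoTR and xG ratio are built to be
# collinear with TSR, mimicking the real metrics' overlap
rng = np.random.default_rng(1)
n_teams = 200
tsr = rng.normal(0.5, 0.1, n_teams)
sotr = tsr + rng.normal(0.0, 0.03, n_teams)
xg_ratio = tsr + rng.normal(0.0, 0.02, n_teams)
# Future points per game driven (by construction) by the xG ratio
future_ppg = 3.0 * xg_ratio + rng.normal(0.0, 0.1, n_teams)

X = StandardScaler().fit_transform(np.column_stack([tsr, sotr, xg_ratio]))
model = Ridge(alpha=1.0).fit(X, future_ppg)  # L2 penalty tames collinearity
```

The standardized, penalized coefficients then let you compare each metric's contribution while conditioning on the others.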

If you are willing to shift gears to assess predicting power of these metrics, have you:

1. Thought about treating this data as a time series (e.g. ARIMA with external regressors)?

2. I agree with some comments above that it would be smart to bring in other variables as well (quality of opposition, indicators for fighting off relegation/for a European spot, etc.), but this would be adjusting the bivariate correlation to a multivariate regression – and one with more structure – something along the lines of a random-effects model.

3. If you want to try something else, why not look at a latent-factor model, using the components as predictor variables?

Keep up the good work!
