Introducing the Composite Team Rating

Football is a game that seems simple at first, and proves more and more complicated the longer it is studied. It is also relatively easy to explain once the match is over, but very hard to predict beforehand. A metric that is valuable for assessing historical performance isn’t always the best tool for making predictions. This article introduces a composite team rating, and shows that it is a better predictor than Expected Goals alone.

 

Early days

Before the birth of analytics, a simple look at the table was all we had. An educated guess as to which team would win the next match was the best we could do. Much like it’s done on TV nowadays, really.

In the early days of analytics we thought we could do better than educated guesses. We wrote pages full of shot based metrics, with Total Shots Rate the darling we loved most. Later on, with more data came newer, more exciting and more attractive loves and TSR was traded for Expected Goals. The gap between the mainstream and the analytics blogosphere widened.

Intuitively simple, counting shots and weighting them by their chance of being scored, Expected Goals is now the mainstay for assessing the performances of football teams. It’s probably coming to a TV near you, in some universe, at some point in time.

 

History or future?

I still believe Expected Goals is the best metric to represent historical performances, i.e. answering questions like which team has performed best over a certain period of time. In predictive modelling, however, things may just be a bit different. It’s not so much about historical performance, it’s about how much of that historical performance proves repeatable.

Until recently, I took a metric that I thought best represented historical performance, like ExpG-ratio. From there on, I simulated matches based on these ratings, expecting teams to just cruise along at the speed indicated by their historical performances in terms of ExpG-ratio. With ExpG-ratio being the single input of a predictive model, each team at a certain ExpG-ratio was expected to perform at the same level in future matches. Of course, fixture planning would dictate how many points they would win, but the underlying predicted performance would be the same.
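For readers new to the metric, here is a minimal sketch of an ExpG-ratio. It assumes the ratio is built the same way as Total Shots Ratio, as the team's share of all Expected Goals in its matches. The function name and the neutral fallback of 0.5 are illustrative choices, not something taken from the model itself.

```python
def expg_ratio(expg_for: float, expg_against: float) -> float:
    """Team's share of all Expected Goals in its matches (assumed definition)."""
    total = expg_for + expg_against
    if total == 0:
        # No chances at either end: treat the team as neutral (assumption).
        return 0.5
    return expg_for / total


# A team creating 15.0 ExpG and conceding 10.0 ExpG sits at roughly 0.600.
print(round(expg_ratio(15.0, 10.0), 3))
```

A model with this single input treats every team at the same ratio identically, whatever produced that ratio.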

 

A fictional example

Imagine Chelsea having a reasonable, though not overwhelming season start. After ten matches they have recorded an ExpG-ratio of 0.600. Good for most teams, but usually not enough to seriously challenge for the title. In the same fictional league, West Ham United have had a magnificent start to their campaign, some balls just happened to bounce well, a few calls went their way, and the Hammers have also recorded an ExpG-ratio of 0.600 after ten matches.

Common sense would dictate that Chelsea are expected to operate at a higher level than West Ham over the remaining matches of the season. Most likely, Chelsea will return to elite 0.650 levels, while West Ham would regress towards mid-table 0.500 obscurity. We all watch football, we know that, right?

The model, however, doesn’t have eyes and is just fed two teams with identical ExpG-ratios. Hence it expects both teams to continue at that 0.600 pace we all just said we knew both teams wouldn’t hold. So, what do we know that the model doesn’t?

 

Historical regression

Our knowledge of Chelsea and West Ham is partly based on historical information, like Chelsea’s excellent performances over the past decade, and West Ham’s lower mid-table years with some time spent in the Championship recently. Theoretically, we could plug that into the model and have teams regress to historical performances. But there’s a danger here, and a big one too. Things can change quite fast in football, even at club level. Think of the departure of Sir Alex at United, or the sudden influx of money at clubs like City or PSG.

Some historical regression is good, but be careful: too much historical regression in the model carries a big risk of missing sudden changes.
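To make the trade-off concrete, here is one possible way to regress a rating towards history. This is a sketch of the general idea, not the method used in the model: the blending formula and the `prior_matches` constant are hypothetical, and the numbers are the fictional Chelsea and West Ham figures from the example above.

```python
def regress_to_history(current: float, historical: float,
                       matches_played: int, prior_matches: float = 10.0) -> float:
    """Blend this season's rating with a historical one.

    prior_matches (hypothetical) says how many matches of evidence the
    historical rating is worth. A larger value means more historical
    regression, and a slower response to sudden changes at a club.
    """
    weight = matches_played / (matches_played + prior_matches)
    return weight * current + (1 - weight) * historical


# Both teams stand at 0.600 after ten matches, with different histories.
chelsea = regress_to_history(0.600, 0.650, matches_played=10)
west_ham = regress_to_history(0.600, 0.500, matches_played=10)
print(round(chelsea, 3), round(west_ham, 3))  # roughly 0.625 and 0.55
```

Raising `prior_matches` pulls both teams harder towards their past, which is exactly where the risk of missing a sudden change comes in.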

 

Feed the model

The other solution, and that’s where I’m going, is to feed the model more information about the season at hand. ExpG numbers are still an important driver of the team rating that the model produces, but raw shot numbers, shot on target numbers, goals scored or conceded, and even pass ratios hold information that may make ExpG numbers more steady.

Chances are that even with equal ExpG-ratios of 0.600 over their fictional first ten matches, Chelsea and West Ham will differ in terms of shot numbers and pass patterns. These shot numbers and pass patterns form something like a fingerprint of a team’s underlying activities leading to a certain ExpG-ratio. Some lucky deflections, some welcome yet debatable offside calls, a sending off here or there: it can all help steer a club’s ExpG-ratio in a direction that won’t hold for the future. Use more metrics, and your assessment will gain stability, particularly in the early stages of the season.

Adding this info to the ExpG-ratios will, in all likelihood, lead to a lower Team Rating for West Ham and a higher Team Rating for Chelsea. Subsequently, more points will be predicted for Chelsea than for West Ham, despite an equal assessment of present day performance in terms of their 0.600 ExpG-ratio. At the end of this article, a more detailed mathematical explanation of the present Team Rating model will be given. For now, remember the new Team Rating as a broad assessment of shot numbers, Expected Goals and passing patterns.

 

Performance

The best way to show the performance of the Expected Goals model and the new Team Rating is in the graph below. This shows the gap between predicted points per game for the remaining matches of the season and actual points per game for those matches, at different stages of the season.

[Graph: ST DEV for Future PPG - All Full Metrics]

Predicting is hard when you’ve got little information about the teams, so initially the gap between predicted points per game (PPG) and actual PPG is quite wide. As the season progresses, predictions become more accurate, to the point where the number of remaining matches becomes so small that predictions, again, are harder to make.
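The graph’s title mentions a standard deviation, but the exact calculation behind the gap is not spelled out here. As a rough stand-in, the sketch below uses the root mean squared difference between predicted and actual future PPG across all teams at one match day. The function name and the choice of this particular error measure are assumptions.

```python
import math


def prediction_gap(predicted_ppg, actual_ppg):
    """Root mean squared gap between predicted and actual future PPG.

    Both arguments hold one value per team, taken at the same match day.
    """
    if len(predicted_ppg) != len(actual_ppg) or not predicted_ppg:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted_ppg, actual_ppg)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers for three teams, purely to show the call.
print(prediction_gap([1.9, 1.4, 0.9], [2.1, 1.2, 1.0]))
```

Computing this once per match day for each metric would give one line per metric, with lower values meaning better predictions.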

From this graph it is clear that using very raw information like points or goals is not a good way to predict future points. In other words, the league table is not your best source of information if you want to find out about team strength.

Total Shots Ratio and Shots on Target Ratio are a big step forward in comparison with points or goals. Predictions based on these relatively simple metrics are better in all stages of the season.

The red line is the Expected Goals model, which delivers the best predictions of any non-composite rating from match day 12 onwards, but needs those 12 matches to pick up enough information.

The orange line is the new 11tegen11 Team Rating. As said before, it holds information about Expected Goals, total shot numbers, shot on target numbers, actual goals scored and passing patterns. See below for more details.

It is quite clear that the new Team Rating outperforms the old Expected Goals model significantly. Even after computing Expected Goals ratio, there still is valuable information left in simple shot, pass and goal numbers!

For any predictions made I’m using the Composite Team Rating now, unless explicitly stated otherwise.

For interested readers, I will explain the model in more detail below the next graphs. If you’re just here for the football, and you’re not into the details of the machinery behind the predictions, feel free to check out here!

[Graph: Team Rating - bar chart - English Premier League 2014-15, 16 February 2015]

 

The model I now use for predictions concerning the 2014/15 season is a linear regression based on two seasons of data: 2012/13 for the top-5 leagues and 2013/14 for the top-5 plus the Eredivisie.

The dependent variable in the regression is future PPG. Independent variables taken from the present season are goals, shots, shots on target, ExpG, passes, passes completed, passes in the box, and passes completed in the box. All of these are recorded both as scored and as conceded. Furthermore, points per game, goals for and goals against in the past season are used, as well as league (as a categorical parameter). So, each team records as many lines as there are match days, minus one, since there’s nothing left to predict after all matches have been played.
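The line-per-match-day layout can be sketched as follows. This is an illustration only: it carries just two of the independent variables, it assumes season-to-date totals as the input format, and `build_rows` and its field names are made up for the example.

```python
def build_rows(match_points, match_expg_for, match_expg_against):
    """One line per match day except the last: stats so far -> future PPG."""
    n_matches = len(match_points)
    rows = []
    for played in range(1, n_matches):  # stops before the final match day
        remaining = n_matches - played
        rows.append({
            "match_day": played,
            # Season-to-date totals (assumed format of the inputs).
            "expg_for": sum(match_expg_for[:played]),
            "expg_against": sum(match_expg_against[:played]),
            # Dependent variable: points per game over the remaining matches.
            "future_ppg": sum(match_points[played:]) / remaining,
        })
    return rows


# A made-up four-match season yields three lines.
rows = build_rows([3, 1, 0, 3], [1.8, 1.1, 0.7, 2.0], [0.9, 1.0, 1.5, 0.6])
print(len(rows))
```

Stacking these lines for every team in every league gives the table that the regression is run on.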

The regression formula produces a Team Rating for each team, which is then scaled to a format to resemble the numbers we know from 0-1 rating scales like TSR and ExpG-ratio. So elite teams will have Team Ratings of 0.700 while very poor teams will score Team Ratings like 0.350. Since this Team Rating comes from a regression formula based on multiple historical leagues, it is no longer true that the average over a league should always be 0.500.

Oh, and for the graph in this article, a regression was run on 2012/13 top-5 leagues data only, so that the 2013/14 data could be kept aside for assessing the model’s performance. Always avoid overfitting, you know.
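The principle of fitting on one season and testing on another can be sketched with a much smaller model. Everything below is a simplification: the data are random numbers rather than real football data, there is a single predictor instead of the full set of variables, and all names are made up for the example.

```python
import random


def make_season(rng, n_rows):
    """Synthetic (rating, future PPG) pairs; not real football data."""
    rows = []
    for _ in range(n_rows):
        rating = rng.uniform(0.3, 0.7)
        future_ppg = 3.0 * rating - 0.1 + rng.gauss(0.0, 0.2)
        rows.append((rating, future_ppg))
    return rows


def fit_line(rows):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(rows, slope, intercept):
    """Root mean squared prediction error of the fitted line on some rows."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in rows]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(0)
train = make_season(rng, 500)  # stands in for the 2012/13 data
test = make_season(rng, 500)   # stands in for the 2013/14 data

slope, intercept = fit_line(train)         # fitted on the training season only
test_error = rmse(test, slope, intercept)  # judged on data the fit never saw
print(slope, intercept, test_error)
```

Only the error on the held-out season says anything about how the model will do on matches it has not seen.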

Obviously, the disadvantage of a Composite Team Rating is that it’s less intuitive than Total Shots Ratio or Expected Goals Ratio. This makes it harder for a reader to assess what it really means, and it forces the reader to trust the model, since the ‘under-the-hood’ part becomes quite complicated. On the flipside, the model performs better and produces more stable predictions. For me, the improvement in prediction accuracy beats the reduction in interpretability.

If the simpler model performed in the range of the more complex model, we would always prefer the simple one. For the audience it’s easier to understand what goes on within the model, and therefore to decide whether to trust it. For the creator of the model it’s easier to spot potential errors in either the data that goes into the model, or in the model itself. But I guess the easy days of Total Shots Ratio are over. Football just ain’t so simple.

 

Feel free to ask questions in the comments below if you want to know more!

10 thoughts on “Introducing the Composite Team Rating”

    1. 11tegen11 Post author

      Thanks, Michael!

      The regression found most factors significant (at p < 0.05), but not all. We should be careful, however, in interpreting that result, since these inputs are correlated with each other to varying degrees. Even the direction of the relation between a variable and the output cannot be taken from a multivariate analysis with correlated independent variables. For example, some clearly positive actions may have a negative association in the multivariate analysis, simply because they are correlated with other variables. This is also the reason why I've decided against showing significance in this post, or the coefficients for the different variables used.

  1. samh123

    Cool article!

    Is there a risk that you manipulated all the different components in order to maximise predictability for just the 2013/14 season and your composite rating might not always beat the other metrics by so much?

    As the best team going forward == the best team ‘now’
    I think trying to maximise predictability is a very interesting area, could be very valuable for rating managers and all that kind of stuff!

    1. 11tegen11 Post author

      The predictions shown in this article were made on the basis of 2012/13 data only, so the components are not manipulated to maximise predictability in the 2013/14 data. This principle of separating training data and test data is key to preventing the issue you describe, which is termed overfitting.

      And indeed, in terms of assessing performance, doing repeatable things is very important in decision making.

  2. jaw83

    Have you tried running a regression on the residuals in expected goals versus future goals to see what expected goals might be missing?

    One obvious thing expected goals seems to miss is finishing skill. We could see if that was the case by regressing goals for on the error in goals for predicted by xG and see how much of that error it explained. Similarly, xG misses goaltending skill. My prior is that goaltending skill varies less than finishing skill, and that we will expect that goals scored explains more of the error in the predictions made by xG than goals allowed will explain the errors made by xG allowed.

    1. 11tegen11 Post author

      ExpG and finishing form a challenging field. I’ve written multiple articles trying to identify finishing skill once ExpG is accounted for, but have failed to find repeatable finishing skill so far.
      A big part of the problem is the low numbers and high heterogeneity in football. Simply said, goals, and even shots, are quite infrequent events that also happen to have very different characteristics.

      So far, any corrections for historical finishing have only made things worse, due to overfitting. The model performs better on training data once finishing is corrected for, but it performs worse on test data because the finishing skills ‘identified’ do not carry over to the next season.

  3. raffaele

    Very Nice, indeed. Can’t wait to see some predictions for Serie A. Just to see if your rating does justice to some teams who are penalized (or overvalued) by “shot only”-based metrics. Good job.

  4. James

    Really interesting read and aligns to my feelings that expG is a good indicator, but has limitations as a predictor for goals/outcomes of football.

    For me (and raised by some other bloggers out there) one big reason for that is that shot data needs to be aligned to the game state, i.e. a team losing 3-0 may have a flurry of late shots that boost their expG in that match, but would have little effect on the end result. Are you factoring this into the results that feed into this new composite rating at all? Or, if not, any ideas how this could be added? I feel it would sharpen up a lot of predictive models if shot data was weighted according to the game state.

    1. 11tegen11 Post author

      Thanks!

      I guess we all recognize from the work done on Game States that this is a potential source of bias in any shot based model, or even any model at all.
      Probably this Composite Team Rating softens the bias a bit by taking multiple sources of information on board, but it doesn’t directly address the issue indeed.

      It could be a nice step to try and factor in the same independent variables, but split over different Game States. Using the scripted version of the model this could well be done. I’ll save the thought for now, and if this exploration seems fruitful, you’ll find more on the site sooner or later.
