This post looks at the latest love child of the football analytics community: Expected Goals, commonly referred to as ExpG or xG. I’ve noticed a lot of questions on Twitter recently about this relatively new concept. It has been mentioned and explained across several earlier posts on 11tegen11, but I felt the need for one comprehensive explanatory piece, both to explain this important concept and to serve as a future reference.
ExpG stands for Expected Goals. It measures not how many goals a team has scored, but how many goals an average team would have scored from the same number and quality of shots.
Each goal scoring attempt is assigned a number based on the chance that this attempt produces a goal. Typical parameters to use are shot location and shot type (shot vs header). Some models, including the one I use on 11tegen11, also use assist information to separate through-balls from crosses.
Teams that produce more ExpG than they concede have the best chances of winning football matches.
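To make the idea concrete, here is a minimal sketch of how per-attempt values add up to a team's ExpG. The zone names and scoring chances below are made up for illustration only. They are not the values used on 11tegen11, and a real model would estimate them from historical shot data.

```python
# Hypothetical scoring chances per (location, attempt type).
# These numbers are invented for illustration; a real model estimates
# them from large samples of historical attempts.
SCORING_CHANCE = {
    ("six_yard_box", "shot"): 0.35,
    ("six_yard_box", "header"): 0.20,
    ("penalty_area", "shot"): 0.12,
    ("penalty_area", "header"): 0.07,
    ("outside_area", "shot"): 0.03,
}


def expected_goals(attempts):
    """Sum the scoring chance of every attempt in the list."""
    return sum(SCORING_CHANCE[attempt] for attempt in attempts)


# One hypothetical match: five attempts for the home side, two for the away side.
home_attempts = [
    ("six_yard_box", "shot"),
    ("penalty_area", "shot"),
    ("penalty_area", "header"),
    ("outside_area", "shot"),
    ("outside_area", "shot"),
]
away_attempts = [
    ("penalty_area", "shot"),
    ("outside_area", "shot"),
]

home_expg = expected_goals(home_attempts)
away_expg = expected_goals(away_attempts)
print(f"Home ExpG: {home_expg:.2f}, away ExpG: {away_expg:.2f}")
```

In this made-up match the home side produces about 0.60 ExpG against 0.15 for the away side, so the home side would be expected to win more often than not over many such matches.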
Total Shots Rate
ExpG has its roots in another key metric in football, Total Shots Rate, or TSR. Before trying to grasp ExpG, it is important to get familiar with shots rates.
Total Shots Rate = Shots For / (Shots For + Shots Against)
This formula provides TSR on a 0 to 1 scale. If a team takes all the shots in a match, or a series of matches, its TSR will be 1, and the more shots it concedes to opponents, the lower its TSR gets. Within a single match the two teams' TSRs always average 0.500, since each shot for one team is a shot against the other, and over a full league the average TSR sits at, or very close to, 0.500 for the same reason.
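The formula is simple enough to write down in a few lines. This is only a sketch: the function name and the shot counts are my own illustrative choices.

```python
def total_shots_rate(shots_for, shots_against):
    """Return TSR = shots for / (shots for + shots against)."""
    total = shots_for + shots_against
    if total == 0:
        # TSR is undefined when no shots were taken at all.
        raise ValueError("TSR is undefined when there are no shots")
    return shots_for / total


# A hypothetical match in which the home side outshoots the away side 15 to 5.
home_tsr = total_shots_rate(15, 5)
away_tsr = total_shots_rate(5, 15)
print(home_tsr, away_tsr, home_tsr + away_tsr)
```

The two values in a single match always add up to 1, which is why the average of the pair is 0.500.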
TSR is pretty simple, yet it is a powerful predictor for future performance of football teams. Ever since its introduction to football, by James Grayson, TSR has dominated the analytics community. James has shown TSR to have the two qualities that are essential for a powerful team ranking tool.
- TSR shows a strong correlation with both points per game, and goal difference.
- TSR in one time period shows a strong correlation with TSR in the next time period.
If only the first condition is met, the metric is good at describing what has happened, but it does not carry over into the future. Goalkeeper save percentage is a nice example of a stat that helps explain what has happened, but holds no power for matches still to come.
If only the second condition is met, the metric carries over into the future, but is not correlated with performance. Team shirt color is a nice example: it is easy to predict next season's shirt color from this season's, but a relation to performance does not exist.
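Both conditions can be checked with a plain correlation coefficient. The sketch below uses Pearson's r, which is my choice for illustration and not necessarily the exact method James used. The six-team data set is invented, so the printed values say nothing about real leagues.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (spread_x * spread_y)


# Invented numbers for six hypothetical teams, for illustration only.
tsr_first_half = [0.62, 0.55, 0.51, 0.48, 0.44, 0.40]
tsr_second_half = [0.60, 0.57, 0.49, 0.50, 0.43, 0.41]
points_per_game = [2.1, 1.8, 1.4, 1.3, 1.0, 0.9]

# Condition one: does the metric relate to performance?
relation_to_points = pearson(tsr_first_half, points_per_game)
# Condition two: does the metric repeat from one period to the next?
repeatability = pearson(tsr_first_half, tsr_second_half)
print(f"TSR vs points per game: {relation_to_points:.2f}")
print(f"TSR period 1 vs period 2: {repeatability:.2f}")
```

A useful ranking metric needs both numbers to be high. A stat like save percentage would do well on the first check and poorly on the second, while shirt color would do the opposite.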
The problem with TSR
The problem with TSR is that it treats all shots as equal, which does not fit the reality of football, where shots are clearly not equal. A shot may come through a crowd of defenders from 40 yards out, or from the penalty spot in optimal circumstances. For TSR, both shots count as one, and both influence TSR equally.
This induces errors and probably also bias.
Errors arise because some shots are worth more than others. On some days a team creating 20 shots did an excellent job, but on other days the team was just trigger-happy and produced low-quality output. It may sound weird, but errors are not too much of a problem in a predictive model, as long as they are random and tend to even out over many matches.
Bias is much worse.
If all teams produced and conceded an equal mix of poor and high quality shots, TSR would, despite its errors, be a perfect tool. However, there is plenty of evidence around that this is not the case. Some teams produce high quality shots, like Barcelona, and other teams produce low quality shots, like Laudrup’s Swansea.
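A small sketch shows how this bias arises. Two hypothetical teams each take ten shots against each other, so TSR rates them as equals, yet the quality of their shots differs widely. The per-shot scoring chances are invented, and the "ExpG ratio" is simply my own analogue of the TSR formula, used here for comparison.

```python
def total_shots_rate(shots_for, shots_against):
    """TSR = shots for / (shots for + shots against)."""
    return shots_for / (shots_for + shots_against)


def expg_ratio(expg_for, expg_against):
    """Same construction as TSR, but weighting each shot by its scoring chance."""
    return expg_for / (expg_for + expg_against)


# Invented scoring chances: team A creates ten good chances,
# team B takes ten speculative efforts from distance.
team_a_shots = [0.30] * 10
team_b_shots = [0.05] * 10

tsr_a = total_shots_rate(len(team_a_shots), len(team_b_shots))
expg_a = sum(team_a_shots)
expg_b = sum(team_b_shots)
ratio_a = expg_ratio(expg_a, expg_b)

print(f"Team A TSR: {tsr_a:.3f}")
print(f"Team A ExpG: {expg_a:.2f}, team B ExpG: {expg_b:.2f}")
print(f"Team A ExpG ratio: {ratio_a:.3f}")
```

TSR calls this an even match at 0.500, while the shot-quality view gives team A roughly 3.0 ExpG against 0.5. If team A plays like this every week, TSR will underrate it every week, and that is bias, not random error.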
Shot quality definitely meets condition one. It is related to performance in terms of points per game and goal difference. However, the evidence that it meets condition two is less clear-cut. Data to measure shot quality has only been available since the 2012/13 season, so we don’t have high quality season-to-season correlation measurements. In other words, was Swansea’s recent struggle to produce decent shot quality just a fluke that would fix itself, or does it indicate an underlying reason that will cause the team to produce below average quality shots in the near future?
In the end
ExpG is hot, and if you asked me now, I’d say ExpG is the next big step being taken in football analytics. Intuitively, it makes a lot of sense to separate goal scoring attempts by the odds of scoring from them. However, for a new metric to be widely accepted, a bit more work is needed. ExpG is a lot more complex than just counting shots. To show that this effort is worthwhile, we should first do a better job of demonstrating its superiority over TSR.