A close look at my new Expected Goals Model

The summer may be the best part of the year for football analysis. Ironically, a break from the frantic rhythm of football stimulates exciting developments in football analysis. Behind the scenes, like players in training camps, many analysts use this time of year to lay the foundations for next season.

In my case, most of that time has gone into a major restructuring of data behind the scenes, with most algorithms rebuilt from scratch. I won’t bore you with that, and will skip to the most interesting development of this summer: the next generation of my Expected Goals model.

The idea to create an Expected Goals model was there before the data was. In 2013, enough data was available for a very basic first model, which was extended around one and a half years ago. This summer marks the second major overhaul. My initial reluctance to share the workings of the model has slowly dissolved as the model became more and more sophisticated. After all, more and more ExpG models are appearing, and if you don’t know how a particular model works, why would you trust any of its output?

 

So, what does the ExpG model do?

For each goal scoring attempt, a number between 0 and 1 is assigned to indicate the chance of the attempt resulting in a goal.

The easiest attempt to explain is a penalty.

Typically, penalties are awarded around 0.76 ExpG, based on the historical conversion rate. A penalty is the easiest attempt to classify, since it’s a situation isolated from play, with a standard spot for taking it. The number of penalties taken is way too low to factor in player or keeper performance, so we do best by just estimating 0.76 ExpG.
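As a minimal sketch of what "based on the historical conversion rate" means in practice: the estimate is simply goals divided by attempts. The function name and the counts below are hypothetical, chosen only so that the result matches the 0.76 mentioned above; they are not the actual numbers behind the model.

```python
def conversion_rate(goals: int, attempts: int) -> float:
    """Share of attempts that ended in a goal."""
    if attempts <= 0:
        raise ValueError("need at least one attempt")
    return goals / attempts


# Hypothetical counts, picked only to reproduce the 0.76 quoted in the text.
penalty_expg = conversion_rate(goals=760, attempts=1000)
print(round(penalty_expg, 2))
```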

 

Situations

My ExpG model is divided into 10 different types of attempts, and each of these types has its own formula. In more technical terms, separate regression models are created for each of these situations. After all, evaluating an open play shot requires looking at different factors than evaluating a shot from a corner. These are the 10 situations for which separate models are in place.

  • Open play shots
  • Open play headers
  • Penalties
  • Direct Freekicks
  • Indirect Freekicks
  • Corners
  • Throw-ins
  • Rebounds from a GK save
  • Rebounds from woodwork
  • Fast Breaks

The total database to work with comes from publicly available data sources where Opta data is presented. The database used to construct the ExpG algorithm contains nearly 400,000 attempts. Every match is added to the reference database to make the model even better, so the number of attempts that serves as reference is increasing rapidly.

For each situation, different factors influence the odds of scoring. For example, for open play shots a through ball assist is a huge bonus, but for direct free kicks through ball assists do not occur. This is why I prefer different models for different situations. The choice to go with these 10 situations is arbitrary, but this set-up connects nicely with Opta data and in my experience it works well. Basically, open play is the only situation with separate models for shots and headers, while in all other situations I’ve preferred to keep all attempts in one model and use shot type as a factor in the model.
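To make the "one model per situation" idea concrete, here is a rough sketch. It assumes each situation model is a logistic regression, which this post does not state explicitly, and every coefficient and feature name below is invented purely for illustration. The point is only the structure: the situation picks the model, and each model has its own set of factors.

```python
import math

# Hypothetical coefficients, invented for illustration only.
# Note that the free kick model simply has no through ball term.
MODELS = {
    "open_play_shot": {
        "intercept": -2.0,
        "angle_deg": 0.03,
        "distance_m": 0.01,
        "through_ball": 0.8,
    },
    "direct_free_kick": {
        "intercept": -3.0,
        "angle_deg": 0.05,
    },
}


def expg(situation: str, features: dict) -> float:
    """Pick the model for this situation and return a value between 0 and 1."""
    coefs = MODELS[situation]
    z = coefs["intercept"]
    for name, weight in coefs.items():
        if name != "intercept":
            # Features missing from the input count as 0.
            z += weight * features.get(name, 0.0)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link


shot = {"angle_deg": 30.0, "distance_m": 15.0, "through_ball": 1.0}
print(expg("open_play_shot", shot))
print(expg("direct_free_kick", {"angle_deg": 10.0}))
```

Keeping the models in a lookup keyed by situation means a factor that never occurs in a situation just isn't in that model, instead of being forced to zero in one big formula.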

 

Which factors does the model evaluate?

For each situation, regression analysis is performed with goal or no goal as the outcome variable. The standard set of variables is tested and significant predictors are kept in the model. Let’s go over all of the factors that are in the model, and explain their influence on ExpG.

  • Shot location
    • By far the most important predictor. Most models probably use shot zones, but I prefer a different method, which doesn’t have the granularity of zones, but rather uses location as a continuous parameter. In my model, location translates into two parameters: angle of view of the goal and distance from the goal.
    • Angle of view means two lines are drawn from the shot location to each post, and the angle between these lines signifies the view the player has. For very close shots, these angles go to a theoretical 180 degrees and for long range shots, or shots from acute angles, this approaches 0 degrees. Obviously, wide angles are better, since they signify closer shots from better angles.
    • Since the angle of view parameter is also in the model, the influence of distance on ExpG is a bit more complicated. Once angle is corrected for, distance has a positive impact. Think of a shot with an angle of view of just 5 degrees. This is either a close shot from a very acute angle, or a shot from way outside the box. The chance of scoring is higher for shots from outside the box than for shots from very acute angles, so more distance will raise ExpG in this particular model, where the angle of view is already corrected for.
  • Shot type
    • Foot shots are better than headers, after all other factors have been corrected for. However, a first attempt at implementing strong or weak foot did not improve the model. Perhaps we’ll get back to this one day.
  • Big Chance
    • Opta’s coders assign this code where they judge attempts to be big chances. This factor has quite a big impact on the ExpG, which supports the view that on-ball data alone is not enough to perfectly assess ExpG. Think of a weird long range shot when a keeper is out of place. For an ExpG model this will always be a hard attempt to assess, since keeper position is not directly available. To me, this is a perfect example where data is helped by human judgement, since off-ball event data would make analysis infinitely more complicated.
  • Start of possession
    • Attempts that result from possessions won high up the pitch have a higher chance of resulting in a goal than attempts from possessions that started further down the pitch. This is a fine (but not the only) example of defensive pressure being captured in the ExpG model, indirectly rather than directly. This factor is a recent addition, and based on some explorations there seems to be a sharp cut-off around 4/5th of the pitch. The difference is so sharp that for now I’ve put it in the model as a binary: either an attempt comes from a high turnover, or it doesn’t.
  • Assist
    • All attempts are either assisted or they are not. Assisted shots are assisted either intentionally or not. The unintentional assist stands for a casual pass that was never intended to provide a scoring chance, but was turned into a shot anyway. Opta makes this distinction, and I think it is very handy.
    • Intentional assists are a big plus for ExpG. This makes intuitive sense, since the assisting player makes a deliberate choice to allow a team mate to shoot (or head) the ball at goal, which illustrates a quality attempt.
    • Unintentional assists have a negative impact on ExpG compared to unassisted shots. Most of these attempts will be rather forced, and not of the highest quality. Unless, of course, a brilliant dribble precedes the attempt, but those kinds of factors will come later.
  • Through ball
    • Nearly the best assist type possible. A through ball eliminates one or more defenders, forcing the remaining defenders into unwanted choices, and increasing the odds of scoring. Hence, a big bonus for ExpG.
  • One pass after a through ball
    • This is the best assist possible, as far as my variables go. It’s even better than a shot coming directly from a through ball. Mostly this pass will be sideways to eliminate, or at least wrong-foot, the goalkeeper.
  • Cross
    • Crosses are bad. This could be a title for a future post, but it is certainly true that crosses have an independent negative impact on ExpG. This is not to say teams should never cross a ball, or crossing as an offensive strategy is always bad, but it does say that after all other factors have been corrected for, crosses have quite a negative impact on ExpG. Crosses may be an efficient way to create goal scoring chances, but they won’t be the best way to create quality attempts. There is a balance in crosses somewhere. Too many signifies too low quality attempts and too few signifies a team that may create too few attempts.
  • Dribbles
    • Dribbles increase the odds of scoring. Much like a through ball, at least one defender is eliminated, but unlike after a through ball, said defenders may come back into position later in the same attack. So, the effect is smaller than that of a through ball, but it does help. Oh, and more dribbles preceding the same attempt increase the effect, which makes intuitive sense.
  • Dribbles around the keeper
    • This is probably the biggest plus for ExpG. Shooting a football into an empty net is easier than scoring with the keeper in place, who’d have thought?
  • Vertical speed
    • Attacking at speed is beneficial in open play situations. This is measured quite roughly, since data is stamped by second, but it still has an independent effect on ExpG. Leaving defenders less time to settle is a good thing, and it can be measured.
  • Number of Touches
    • Creating attempts after lots of touches in a possession spell is good. It’s probably to be seen as a sign of dislocating the defense. This isn’t to say that the passing game is superior, but when it does result in an attempt, it seems to be a relatively good one.
  • Game State
    • Even after correcting for all factors above, Game State still has an independent effect on the odds of scoring. GS -1 is the hardest state to score in. However, for direct free kicks this factor is not in place, which makes sense, as teams probably don’t defend direct free kicks differently according to the score line. This sounds better the other way around: teams do defend differently according to the score line in open play, and to a lesser extent also for indirect free kicks and corners. In open play, the effect is much more pronounced for shots than for headers. That makes sense too, since those GS +1 counter attacks will be aimed at creating shots, rather than headers. Small note: since better teams lead more and poorer teams trail more, the debate about Game State is full of nuances and cannot fully be put to bed based on just this data.
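The two location parameters and the high turnover flag from the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the model’s actual code: coordinates are taken in metres relative to the centre of the goal, the goal is the standard 7.32 metres wide, the start of possession is expressed as a fraction of pitch length, and whether the 4/5 cut-off itself counts as "high" is my own choice. All function and parameter names are hypothetical.

```python
import math

GOAL_WIDTH_M = 7.32  # standard goal width in metres


def angle_of_view_deg(x: float, y: float) -> float:
    """Angle between the lines from the shot location to each post.

    x: distance from the goal line in metres (assumed >= 0)
    y: sideways offset from the centre of the goal in metres
    """
    half = GOAL_WIDTH_M / 2
    # Vectors from the shot to the posts are (-x, half - y) and (-x, -half - y).
    cross = GOAL_WIDTH_M * x               # size of their cross product
    dot = x * x + y * y - half * half      # their dot product
    return math.degrees(math.atan2(cross, dot))


def distance_m(x: float, y: float) -> float:
    """Straight-line distance from the shot location to the centre of the goal."""
    return math.hypot(x, y)


def is_high_turnover(start_fraction: float, cutoff: float = 0.8) -> bool:
    """True if possession started at or beyond the cut-off.

    start_fraction: 0.0 is the own goal line, 1.0 the opponent's goal line.
    """
    return start_fraction >= cutoff


# A shot from the penalty spot: 11 metres out, straight in front of goal.
print(round(angle_of_view_deg(11.0, 0.0), 1))  # 36.8
print(distance_m(11.0, 0.0))                   # 11.0
```

With these definitions a shot from between the posts on the goal line gives the theoretical 180 degrees, and a shot from close to the goal line but far out wide gives an angle near 0, which matches the description of the angle of view above.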

 

So, how good is the model?

All this complexity is worth nothing if an ExpG model doesn’t beat a simple shot count. However, proving the quality of an ExpG model is a nuanced business, and it isn’t something that I’m going to add to this already quite extensive post. Over time, probably during one of these dull international breaks, I’ll post another piece where this new model is tested with respect to its predictive powers, just like I did for the previous model.

 

Can we get some examples?

Yes, of course. Nothing speaks to the mind as images do. Here are some shots from the past weekend, with their respective ExpG values. If you like these examples, we may turn this into a recurring thing: YouTube clips with the ExpG explained in full.

El Ghazi in AZ – Ajax

I like this one, because (A) it’s a stunner of a goal, and (B) ExpG obviously assigns a low number to it. The location is unfavourable, and there isn’t a single factor that helps raise ExpG. In fact the ExpG is so low that an estimated 75 shots from this situation are needed to score one goal.

ExpG: 0.013

Situation: Indirect Free Kick

Shot location: Angle of view 10.4 degrees and distance 34.1.

Shot type: foot shot

Big Chance: no

Start of possession: no high turnover

Assist: unintentional

Through ball: no

One pass after a through ball: no

Cross: no

Dribbles: 0

Dribbles around the keeper: 0

Vertical speed: 2.05 meters per second

Number of Touches: 2

Game State: 0
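The "shots needed to score one goal" figure is just the reciprocal of ExpG, assuming every shot from this situation has the same, independent chance of going in. A quick sketch (the function name is my own):

```python
def shots_per_goal(expg: float) -> float:
    """Expected number of attempts per goal for a given ExpG."""
    if not 0.0 < expg <= 1.0:
        raise ValueError("ExpG must be above 0 and at most 1")
    return 1.0 / expg


print(round(shots_per_goal(0.013)))  # 77
```

The ExpG shown here is rounded to three decimals. Any unrounded value between 0.0125 and 0.0135 gives somewhere between 74 and 80 shots, so the estimate of 75 fits within that range.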

 

 

Lucas Moura in Lille – PSG

Another amazing goal, but a challenging one for ExpG models. The location in itself isn’t all that good, but the context more than makes up for it. In the data this attempt shows up as a Big Chance, after a dribble around the keeper, after a through ball and after a decent number of touches. All of these factors help raise ExpG to a much higher level than any other shot from that position would.

ExpG: 0.447

Situation: Open play shot

Shot location: Angle of view X degrees and distance X.

Shot type: foot shot

Big Chance: yes

Start of possession: no high turnover

Assist: intentional

Through ball: yes

One pass after a through ball: no

Cross: no

Dribbles: 1

Dribbles around the keeper: 1

Vertical speed: 1.6 meters per second

Number of Touches: 16

Game State: 0

 

Georginio Wijnaldum in Newcastle – Southampton

Our third and final example is another beauty. It’s also quite different from the goals before, as we can see in the data. Attempts from fast breaks are good, and they are processed through the Fast break model, which doesn’t have the same factors aboard, since not all factors relevant in usual open play situations are also relevant in fast break attempts.

ExpG: 0.202

Situation: Fast break

Shot location: Angle of view 41.0 degrees and distance 8.5.

Big Chance: no

Start of possession: no high turnover

Through ball: no

One pass after a through ball: no

Cross: yes

Dribbles around the keeper: 0

Game State: 0

10 thoughts on “A close look at my new Expected Goals Model”

  1. dean

    Thanks for the insight. I think i better pull my finger out i have a lot of catching up to do!
    Please tell me that after all this work you are making a decent profit betting from your model?

  4. Max

    It’s a mighty good idea to highlight the ExpG with actual goals.
    I am absolutely convinced that ExpG is a number for the future of soccer analytics.
    Thanks for sharing 🙂

  6. abcd

    Mr 11tegen11, could you link a website, where all of the Opta stats you mentioned, including vertical speed of attack, are available? And how did you get the data? Did you have to assess every single shot on your own or was the data already aggregated?
