Category Archives: English Premier League

An early look at performances in the Premier League

With the international break almost over – thank goodness, fans of Dutch national football – this is part one of an intended miniseries looking at how teams in the top leagues of the world have started the season. I plan to look at several leagues this week, using the same format each time, starting with the English Premier League.

Most leagues have played some seven or eight match rounds, so we may expect key performance indicators like Expected Goals (ExpG) to have settled a fair bit. Generally, the R-squared between ExpG after seven matches and ExpG after a full season tends to be around 0.72, so the relation is quite strong. This means it does make sense to look at current ExpG and try to spot patterns, as well as make some ExpG based predictions.



Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best performing teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

Good - Lucky Matrix English Premier League 2014-15 14 October 2014

The eye-catcher in this chart is Chelsea’s pink dot, which illustrates their supremacy on both axes. Unfortunately for Chelsea, their dominance in efficiency won’t hold, but their ExpG-ratio of 0.721 will still separate them from the rest.

That rest is led by Arsenal, with an excellent performance, but low general efficiency. We could expect the latter to regress a fair bit, but whether they can keep on performing with injuries to key players hurting them remains to be seen.

The next group could be termed the top-4 chasers in terms of performances. Interestingly, efficiency is spread very wide in this bunch, with Southampton riding the PDO wave a bit, and Newcastle feeling hard done by. Surprise teams among this bunch are WBA and new-style West Ham, who might be candidates for a bit of regression in ExpG, based on historical performances.

United (4th in the league table) and Everton (17th) wouldn’t have thought to find themselves in the middle of the pack, and despite their widely different league positions, they occupy similar positions in terms of both performance and efficiency. More on that in the graph below.

QPR should catch up with the lower mid-table bunch once their PDO regresses, as should Burnley, although a short performance dip may easily throw them into the orange zone. A dense cluster of three teams should be alarmed by the fact that their season is fuelled by efficiency rather than performances. Don’t be surprised once Swansea (5th), Leicester (12th) and Hull (11th) start their drift down the table.

The final words are for Villa, whose return of no goals and no points from the last three matches initiated an unavoidable drop after their unsustainable season start.


Points per Game

The next graph is a slight variation on the previous one. The horizontal axis still presents the teams’ performance (ExpG-ratio), while the vertical axis now presents the outcome (points per game). Since this graph partly presents the same information, I’m going through it a bit quicker.

ExpG-r vs PPG by team English Premier League 2014-15 14 October 2014

Chelsea lead by a wider gap in points than in performances, confirming our conclusion above. In the rest of the table, performances and points are not so much in line, yet. The blue colours that signify good performance are spread from top to bottom. The orange zone, for troublesome performances, holds a mid-table position in the table, for now.

Were this my first assessment, I’d have cast severe doubts about ExpG-ratio as a metric. But it has proven its status before, and I’d pick ExpG over PPG any time.



Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R-squared between ExpG on each particular match day and ExpG at the end of the season.
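The regression step can be read as a weighted blend of this season's and last season's numbers. The sketch below assumes the weight on the current value is simply the R-squared for the match day in question; the function name, that weighting rule and the sample values are illustrative assumptions, and the actual model may well weight things differently.

```python
def regress_expg(current: float, last_season: float, r_squared: float) -> float:
    """Blend this season's ExpG with last season's.

    Assumption: the current value gets weight R^2 and last season's
    value gets the remaining weight (1 - R^2).
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must be between 0 and 1")
    return r_squared * current + (1.0 - r_squared) * last_season


# Hypothetical ExpG-per-match values, with an R^2 of 0.72 as the weight
blended = regress_expg(current=1.80, last_season=1.40, r_squared=0.72)
print(round(blended, 3))
```

With a weight like this, early-season numbers are pulled strongly towards last season, and the pull fades as the R-squared rises over the match days.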

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread for each particular team. Enjoy!

Boxplot projected league table English Premier League 2014-15 14 October 2014

How to define attacking style!?

Football analytics at the moment is a bit like a toddler. We think we can do quite a decent job, we’ve started talking quite loud with more variety in our vocabulary, and every now and then we start to make some sense too. Oh, and hey, we make people laugh at us at surprising occasions! Yet, most of the time, in hindsight our actions don’t make the most sense. And what we could do a year from now makes our current level of performance laughable at best.



Most of my earlier analytics work has been aimed at performance analysis. Which team is better? And later on, which player does better? However attractive this edge of using stats is, in an environment as highly driven by random occurrences as football, this type of analysis approaches its limits quite soon. In plain English: football is quite hard to predict.


Just a level below predicting is describing. And a recent promising development on the describing front has been introduced by fellow blogger and analyst Michael Caley. It may well be the describing part where football analytics could win over more souls to support our belief that numbers can add to a better understanding of the game.

Could you tell me in a few words how your favorite team prefers to attack? Chances are that you’d use words like ‘direct’, ‘patient’, ‘flank play’, ‘through balls’ and ‘crosses’. Now, what Michael has come up with is a simple and easy to use stat to express two key elements of attacking play: pace and style.



Pace is expressed as the number of completed passes per shot taken. Just use raw numbers per team, no complicated formulas. Here’s what we come up with for the most patient teams in Europe’s top-5 leagues plus the Eredivisie.

Passes per shot - top 10 - Multiple Leagues 05 June 2014

Some of the usual suspects, like Swansea, PSG, Arsenal, Bayern and Barcelona, make this top 10, but the most patient team in Europe are Borussia Mönchengladbach with some 37 passes per shot taken. I haven’t seen them play myself this season, but perhaps some Bundesliga fans are willing to comment here.

The other end of the spectrum will reveal teams playing lightning quick football, preferring to shoot rather than pass around.

Passes per shot - bottom 10 - Multiple Leagues 05 June 2014

That’s interesting! The top four teams are all Eredivisie teams, a league known for high scoring and high shot numbers. At some distance from the rest, relegated side N.E.C. are identified as the most direct team in Europe.

Pace is a descriptive thing though, not a performance marker. Other teams from this top 10 (Levante, Augsburg, Heerenveen) have had decent to good seasons with a very direct style of play.
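For anyone wanting to reproduce the pace stat, a minimal sketch could look like this. The team names and season totals are made up for illustration; only the ratio itself (completed passes divided by shots taken) comes from the description above.

```python
def passes_per_shot(completed_passes: int, shots: int) -> float:
    """Pace: completed passes per shot taken, from raw team totals."""
    if shots <= 0:
        raise ValueError("a team needs at least one shot")
    return completed_passes / shots


# Hypothetical season totals: (completed passes, shots)
teams = {"Team A": (18500, 500), "Team B": (9000, 600)}

pace = {name: passes_per_shot(p, s) for name, (p, s) in teams.items()}
most_patient = max(pace, key=pace.get)
print(pace, most_patient)
```

A high value means a patient team, a low value a direct one.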



The second aspect I take from Michael is style of attack. Using two contrasting key elements of constructing offensive schemes, crosses and through balls, we can compute a simple ratio that proves to spread out nicely across different teams. Also, it fits well with the style of play we’ve familiarized ourselves with for certain teams. Here’s the top 10 in terms of the ‘crosses to through balls ratio’.

Crosses per through ball - top 10 - Multiple Leagues 05 June 2014

Four French teams in the top 6, but the EPL is also nicely represented. Manchester United’s Moyesball indeed makes the top 10 for crossing heavy offensive schemes, but to my surprise Mourinho’s Chelsea is not far off!

One thing: I’ve stripped out NAC, as they simply don’t play any through balls and their ratio is so far off the chart that the other teams are dwarfed by it. In time a case study of NAC and manager Gudelj should follow.

In the bottom 10 we find the teams that prefer through balls over crosses. It seems a ratio of around 3 is as low as it gets, and with around 4 you’re still very much a through ball oriented team.

Crosses per through ball - bottom 10 - Multiple Leagues 05 June 2014

Barcelona are the masters of avoiding crosses and poking central passes into the box. But would you have guessed Newcastle are so through ball heavy? And look at Heerenveen, showing up as a very direct team just above, and avoiding crosses at the same time!
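The style ratio is just as easy to compute. In the sketch below the counts are hypothetical, and returning infinity for a team without any through balls is my own choice for handling an off-the-chart case like NAC, not something prescribed by the metric.

```python
import math


def crosses_per_through_ball(crosses: int, through_balls: int) -> float:
    """Style: crosses per through ball. Higher means a wider, more cross-heavy attack."""
    if through_balls == 0:
        # Assumption: treat a team without through balls as "infinitely" cross-heavy
        return math.inf
    return crosses / through_balls


# Hypothetical season totals
central_team = crosses_per_through_ball(crosses=600, through_balls=200)
no_through_balls = crosses_per_through_ball(crosses=900, through_balls=0)
print(central_team, no_through_balls)
```

Teams that come back as infinity are best left out of a chart, just as NAC were here.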


Pace and Style

Things get even more interesting when we combine both of these metrics in one chart. Teams should broadly fall into one of four categories.

– Patient and central
   o Barcelona, Mönchengladbach, Roma, PSG, Swansea, Arsenal, Bayern, Toulouse and Ajax

– Patient and wide
   o Nice, Rennes, Manchester United and Bordeaux

– Direct and wide
   o Bologna, Sochaux, Lazio and Saint Etienne

– Direct and central
   o Heerenveen, Newcastle, Real Madrid, Sevilla and Dortmund
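The four categories above can be assigned mechanically once both metrics are known. The sketch below splits each axis at the median of the sample, which is an assumption on my part (no particular cut-off is prescribed), and the team labels and numbers are invented.

```python
from statistics import median


def classify(pps: float, cptb: float, pace_cut: float, style_cut: float) -> str:
    """Label a team by pace (passes per shot) and style (crosses per through ball)."""
    pace = "Patient" if pps >= pace_cut else "Direct"
    style = "wide" if cptb >= style_cut else "central"
    return f"{pace} and {style}"


# Hypothetical teams: (passes per shot, crosses per through ball)
sample = {"A": (37.0, 3.0), "B": (15.0, 12.0), "C": (30.0, 11.0), "D": (14.0, 4.0)}

# Assumed cut-offs: the median of each metric across the sample
pace_cut = median(p for p, _ in sample.values())
style_cut = median(c for _, c in sample.values())

labels = {name: classify(p, c, pace_cut, style_cut) for name, (p, c) in sample.items()}
print(labels)
```

Other cut-offs, such as the league average, would shift teams near the middle of the chart, so the labels are most meaningful for teams in the corners.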


In the end

There’s no single preferred mode of attack, and patient is not necessarily better or worse than direct. Also, central doesn’t beat wide. There are multiple ways to construct good offense, and the players at hand, the philosophy of the club and the level of execution of the style are perhaps much more important.

But these concepts hand us a tool to describe pace and style, to follow trends within clubs and managerial careers. All of that with a simple tool, brought to you by the bright mind of Michael Caley.

To close off this post, here is a mega chart picturing all teams from the top 5 leagues plus the Eredivisie. Do click on it for the full, downloadable version, and you’ll see that the names above are all taken from the four corners of this chart.

Directness and Team Style - Offense version - multiple leagues

Predictions for the English Premier League – A midweek title shift

This will be a rather short post where I’ll run the numbers for my league prediction model again. Most of the workings behind the model are explained in detail in the introductory post, back when the model still held Arsenal in marginally higher regard than City. Oh, wait, that was actually only just over two weeks ago.



“How can someone reasonably have thought that Arsenal was going to win the title? I just knew it was always going to be City. Any decent football watcher could see that. All those models are just crap” (anonymous fictional reply)

Eeuhm, no. This is probably the most frustrating part about going public with predictions in football. You will always be wrong at some point. It’s just the unpredictable nature of the sport. And I could take knowledge of the past two weeks out of the data, re-run the model and confirm that, based on all information at that very point, the model rated Arsenal and City very close. I can’t do that with any human mind.

It’s a form of bias that influences our memory, so that we think we’ve always rated City higher than Arsenal. But had results taken another turn, we may just have focused more on that brilliant Özil stuff and Giroud finally picking up on his finishing. Once again, we would have confirmed what “we [would] have already known for a long time”.



This is exactly the reason why I like to go public with these models from time to time. Let me just put the results of the model out there and see what happens. How do the odds shift upon certain events? In hindsight, we can talk openly about when decisive trends were picked up and why certain teams were over or underrated. That way we can learn, I can learn, and next year, the model will have learned. But if you think ‘I knew it all along’, please just put it out there before events take place and we’ll see. The more models and estimations out there, the more we all learn.



So, with this ramble over and done with, here we go with the predictions for the league table. The format may start to look familiar now. Boxes correspond to a spread of 50% of the outcomes of simulations around the mean, indicated by the thick vertical black line. The other edges mark the 95% interval and dots are true outliers.
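As a rough illustration of how such a box could be derived from simulated point totals, here is a sketch using only the standard library. Reading the 50% box as the 25th to 75th percentile and the 95% interval as the 2.5th to 97.5th percentile is an assumption, as are the function name and the simulated numbers.

```python
import random
import statistics


def box_summary(points):
    """Summarise simulated point totals for a single team."""
    # 39 cut points at 2.5% steps: index 0 is 2.5%, 9 is 25%, 29 is 75%, 38 is 97.5%
    cuts = statistics.quantiles(points, n=40, method="inclusive")
    return {
        "mean": statistics.fmean(points),
        "box_50": (cuts[9], cuts[29]),
        "interval_95": (cuts[0], cuts[38]),
    }


# Hypothetical simulation output: 10,000 point totals for one team
rng = random.Random(1)
simulated = [rng.gauss(83.5, 5.0) for _ in range(10_000)]
summary = box_summary(simulated)
print(summary)
```

Anything falling outside the 95% interval would then be drawn as an outlier dot.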

The outliers teach us that in extremely unlucky cases a team like Liverpool may even finish below 60 points (they have 46 already, they have Suarez and they have 16 matches left to play), with the same underlying performance they show now. Guess we’ll have a hard time convincing the conservative and trigger happy football world to accept just that, don’t we?

Boxplot projected league table English Premier League 2013-14 30 January 2014

Unsurprisingly, City lead the way after their crushing of Tottenham last night. The model has City finishing around 83 to 84 points, with a margin of just over four points to Arsenal. Both Arsenal and Chelsea have cooled off a bit, after their draws. In all likelihood, Liverpool will finish no lower than fourth and the Reds may still hope for more.


No battle

Spurs are quite unaffected by the loss, since they had quite a margin to Everton, who lost to Liverpool, and to United, who still had some ground to make up from the start of the season. I’m sorry to disappoint the crowd of football journalists, but the battle for top-4 is just not happening. No team that is presently outside the top 4 holds more than a 10% chance of finishing inside that top 4.

Everton and United are by now quite equal, and both have about a one in five chance of making the Europa League. Newcastle, Southampton, Villa and West Brom should probably already be thinking about next season.



The relegation battle has seen some interesting developments. The most important match was of course Sunderland’s narrow home win over Stoke, which sees the Black Cats reduce their relegation odds to below 50%. Things look pretty dreadful for Mr. Tan and Mr. Solskjaer, whose side hold the bottom spot; the model thinks quite firmly that they will go down.

Fulham’s underlying numbers are quite terrible and this fuels the model to give them a four in five chance of relegation. I’m talking most shots and ExpG conceded and 17th in shots and ExpG for, while most of their better production came in the stints against Palace and Villa when they were already two goals up.

I do realize, however, that both teams have new managers, and it’d be interesting to see if this will correspond to a shift in underlying numbers. Obviously, the model will need a bit of time to pick that up, as it also did with Palace under Pulis. But in all honesty, “the firing of the manager has to be explained in relation to other reasons rather than for the expected improvement in team performance”.

Boxplot projected League positions English Premier League 2013-14 30 January 2014

Why Arsenal wins the title in my model

When you’re into football analytics, you’ve got to stick your neck out from time to time, and come up with some predictions. It’s what probably drew most people to analytics in the first place, yet at times it is a disappointing affair.


Football is to a large degree a random sport, and we’ll just have to accept (and appreciate!) the variability that is undeniably present in the beautiful game. When enjoying this sport as a fan I welcome the surprises and unpredictability; when trying to construct and improve my predictive model I usually despise it.

In this post, we’ll dive straight into the model’s predictions for the final standing of the English Premier League. So, without further ado, here it is.

20140110 Boxplot projected league table English Premier League 2013-14

The colored boxes are the predicted points, with the black line indicating the mean predicted number of points. Outliers are indicated by the dots and the 95% confidence interval makes up the line for each team. Colors code the league winners, CL qualification, CL qualifiers qualification, EL qualification and relegation.

Yes, it’s out there. My model rates Arsenal. It even put them on top, but please don’t leave it at that and carry on reading…

I intended to go over the teams from top to bottom, but ended up writing so many words on Arsenal that we’ll have to save the other teams for a follow-up post. This post will use Arsenal as a case study to explain the workings of the model on the fly. Once we reach the end of the piece, you’ll probably have a good feel for the rationale behind the model, which is essential if you want to appreciate what I say here.

The model

Even a casual reader will quickly note the color gold on Arsenal’s bar, indicating that they are predicted to have the highest points total after 38 matches. However, the gap with City is less than a tenth of a point, and you can note the overlap between the ranges of predicted points. Knowing that City are now a point behind in the league table, this indicates that the model rates City as the stronger team, but only by an insignificant margin.

The range of predictions stems from a repeated run, in this case 10,000 times, of simulations of the remaining matches. For each match, the odds of a home win, draw or away win are estimated on the basis of my Expected Goals (ExpG) model. Each team’s ExpG for and against are based on this season’s shot info, which is obtained via Squawka, and is driven by OPTA data.

The ExpG is obviously influenced by raw shot numbers, as in general shooting more is a good thing, but it also takes into account shot location, shot type and some other elements that (behind the scenes) I’ve shown to drive shot quality.
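To give a feel for the simulation step, here is a bare-bones sketch. It takes the home win and draw probabilities for each remaining match as given inputs; how those are derived from ExpG is the actual model and is not reproduced here. All names, fixtures and numbers are hypothetical.

```python
import random
from collections import defaultdict


def simulate_season(current_points, fixtures, runs=10_000, seed=0):
    """Simulate the remaining fixtures `runs` times.

    fixtures: list of (home, away, p_home_win, p_draw) tuples (assumed inputs).
    Returns a dict mapping each team to its list of simulated final points.
    """
    rng = random.Random(seed)
    totals = defaultdict(list)
    for _ in range(runs):
        points = dict(current_points)
        for home, away, p_home, p_draw in fixtures:
            u = rng.random()
            if u < p_home:  # home win
                points[home] += 3
            elif u < p_home + p_draw:  # draw
                points[home] += 1
                points[away] += 1
            else:  # away win
                points[away] += 3
        for team, pts in points.items():
            totals[team].append(pts)
    return dict(totals)


# Hypothetical two-team example with two remaining matches
current = {"Team A": 45, "Team B": 44}
remaining = [("Team A", "Team B", 0.45, 0.27), ("Team B", "Team A", 0.40, 0.28)]
results = simulate_season(current, remaining, runs=1_000)
print(sum(results["Team A"]) / len(results["Team A"]))
```

The lists of simulated totals are what feed the boxes, intervals and outliers in the chart.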

The projections

Both Arsenal and City are estimated around 78 points, with hardly anything to separate the teams. I do realize that this contrasts a bit with other models around, so it’s probably worth some words. The bookies, as well as other respected predictive models, give City a better shot, which may well be true. ExpG based models you’ll want to check here are by @ColinTrainor and @Cchappas and by @MCofA.

It’ll stay an unsettled argument which of the predictions is the better one, since Arsenal winning the league would not prove my model right, nor would City winning it prove the others right. The randomness of this sport simply dictates that both teams have a shot, but we’ll never find out the all-knowing underlying truths.

There are reasons why my model likes Arsenal so much. But it’s not that Arsenal have most shots, or even the best ExpG for (4th) or against (6th). In that respect, City dominate them, pairing the best offense with the second-best defense.

I’ll pause for a second and let you wonder. How can an ExpG based model rate City significantly higher, both offensively and defensively, and still predict City to take just a single point more from the remaining matches?



To put it simply, Arsenal have this season been better where it matters most: at even scores. Performance at even scores determines who takes the lead, and who will be a goal behind. Subsequently, being a goal up or down influences your performance characteristics and there’s your flywheel effect.

Over all game states, Arsenal may not have been the best defense out there, but on even scores they’ve been tighter than City and Chelsea. For the sake of accessibility of this piece, I won’t throw all detailed even game state numbers at you, but Arsenal’s defense at even game state conceded just 0.75 ExpG per 90 minutes, which easily beats their rivals. The model recognizes that Arsenal conceding just six goals at even score this season – of which two in the 6-3 loss at City – could well be the result of the underlying performance at that Game State.

In this sense, Arsenal resembles Ajax’ numbers in a post I wrote for Volkskrant blog ‘De Zestien’, when PSV posted better overall shot numbers, but Ajax was still the preferred team for the 2012/13 title, which they eventually went on to claim.


Self doubt

Now, here’s a paragraph I will always cherish. I love my self doubt, and I think any sensible predictor can hardly have enough of it. So, why may the above paragraph be untrue, and would Arsenal still need to be rated lower than their rivals?

For one, the effect at even Game State may not be a repeatable thing. Arsenal saw Aaron Ramsey convert at a rate he’s never done before, and he will never do again. Still, three of his eight goals gave Arsenal a lead where they went on to win the game.

Then there are striker issues. Giroud shows more and more evidence that his disappointing conversion rates are his natural skill level, rather than the result of us not having enough shots to study (ref: Colin!). And Arsenal have serious injury issues up front, with Walcott missing the remainder of the season. In contrast to Giroud, Walcott has consistently shown excellent finishing ability, and his absence may well reflect in a dip in conversion for Arsenal. My team based model does not (yet) recognize individual player absences.

Yes, it’s mathematically possible to enter historical conversion rates into the model. So far, I’ve refrained from doing that, since it’s dangerous to assume that these historical rates hold any predictive value. In the case of Giroud and Walcott, yes, we now have a reasonable sample of shots to support a statement on their conversion rates, but for most players, we just don’t know. Shot sample sizes are too low, shots are too heterogeneous in nature, and opponents differ too. For all you know, aiming to catch all the signal around may allow a lot of noise to enter the model and worsen the predictions.

In the end

So, here’s a piece that is about Arsenal, but it goes into detail about the model too. I felt this is needed, since I intend to use the model more and more, in order to benchmark teams, make predictions and keep understanding what happens.

And for the other teams, I guess we’ll walk by them one by one after the weekend. As the model has new information by then, the predictions may be a bit different…

Relative Shot Rates and PDO in the English Premier League

With the Dutch Eredivisie taking its usual winter break, the opportunity arises to apply some of the most promising metrics in football analysis to other leagues around Europe. The driving force behind this initiative, as is true for most of the work on this site, is curiosity. In this case, it is curiosity to compare the Eredivisie to other leagues in terms of Relative Shot Rates (RSR) and PDO. But before we come to such comparisons, let’s study the findings in other leagues, starting with the most prominent league in the world, the English Premier League.

For those unaware of the terms RSR and PDO, let’s start with the latter for a short summary. A more extensive description can be found in the post on separating luck and skill, which introduced the concept of PDO on this site. The term PDO is adapted from ice hockey analysis, where it was introduced by Brian King (whose internet alias happened to be ‘PDO’) and picked up by James Grayson, who first applied it to football and has taken it further from there.

PDO is the sum of a team’s saves percentage and shot percentage, where saves percentage is the fraction of conceded shots that don’t result in a goal, and shot percentage is the fraction of shots created that results in a goal scored. For convenience, PDO is multiplied by 1000 to get rid of the decimals.
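In code, the definition above boils down to a few lines. The function name and the sample numbers are made up for illustration.

```python
def pdo(goals_for: int, shots_for: int, goals_against: int, shots_against: int) -> float:
    """PDO = 1000 * (shot percentage + saves percentage)."""
    shot_pct = goals_for / shots_for  # fraction of own shots that result in a goal
    save_pct = 1 - goals_against / shots_against  # fraction of conceded shots not resulting in a goal
    return 1000 * (shot_pct + save_pct)


# Hypothetical team: scores on 10% of its shots and concedes on 10% of shots faced
neutral = pdo(goals_for=30, shots_for=300, goals_against=30, shots_against=300)
print(round(neutral))
```

A team that converts and concedes at the same rate lands on 1000, which is why that value is the natural reference point.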

The RSR is an extension to the TSR, which stands for Total Shot Rate. A team’s TSR is computed as the fraction of shots created from the total number of shots in all matches played by the team. The RSR is a slight adaptation, which compares a team’s number of shots created and conceded with the league average against the same opposition. More details on the method behind TSR and RSR are found here.
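The TSR part is simple enough to sketch directly. The RSR adjustment against league averages depends on details given in the linked method post, so it is deliberately left out here. The function name and numbers are hypothetical.

```python
def tsr(shots_for: int, shots_against: int) -> float:
    """Total Shot Rate: share of all shots in a team's matches that the team itself created."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Hypothetical team: 330 shots created, 220 conceded
example_tsr = tsr(shots_for=330, shots_against=220)
print(round(example_tsr, 3))
```

A team that takes exactly as many shots as it concedes has a TSR of 0.5.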

Without further ado, here’s the EPL league table, updated with Match Day 22 results, including RSR and PDO. Remember, a high RSR signifies a relatively high ratio of shots created, and is a strong characteristic of sustainable good performance. A high PDO signifies a high ratio of shots converted and/or saved, which has proven to be a lot less sustainable over the longer term.

In general, high PDO teams are found in the top half of the table, and low PDO teams in the bottom half. The team with the highest PDO (1062) is Manchester United, mostly due to their immense conversion rate of 17.4%, which is over 50% better than the league average of 11.1%. The other exceptionally high PDO is Chelsea, but also Stoke, West Ham and Swansea punch above their weight, with PDO’s at a level that could only be sustainable by top teams. From a recent long term PDO analysis, we’ve learned that PDO’s outside of the 980-1020 zone seem unsustainable beyond the scope of a single season, while inherent differences in team quality may account for variations within this zone. So, we may expect Manchester United, Chelsea and to a lesser extent Stoke, West Ham and Swansea to drop off a bit in the remaining part of the season.

Remarkably low PDO teams are clustered near the bottom, where all of Newcastle, Aston Villa, Southampton, Wigan and QPR look set for an improvement on their points-per-game haul so far. Also, Liverpool and Tottenham rank low in terms of PDO, which means an improvement in terms of points-per-game is just around the corner.



In terms of RSR, there is quite a clear top-3, with Manchester City, Liverpool and Tottenham the only teams above the 0.600 mark. This means that, with hypothetical equal conversion rates, these three teams would be in a close fight for the title. And, since shot rates are a lot more sustainable on the long term than conversion rates and saves rates, these three teams reflect the best underlying performance level. Behind them, Everton and Arsenal are a close 4th and 5th, with Chelsea at 6th place. Perhaps remarkably, league leaders Manchester United come in just 7th in terms of RSR. This indicates that Sir Alex Ferguson’s team is highly reliant on substantially higher conversion and/or saves rates, which seems a precarious base for future success. However, so far, their exceptional PDO has earned them a gracious seven point lead over rivals Manchester City, which may well be enough to win the league.

Based on these parameters, the top-3 will most likely be United, City and Tottenham, with a close battle for fourth between Chelsea, Everton, Arsenal and Liverpool.

At the bottom of the table, both RSR and PDO spell doom for recently promoted Reading. They are the bottom team in terms of RSR, and by a distance, but their PDO of 1017 indicates that their shots and/or saves percentage has been above the average EPL level, which is more than can realistically be expected of this side. A PDO at the low side of the 980-1020 zone seems more realistic and a disconnection from the pack battling for survival seems imminent for Reading.