An early look at performances in the Premier League

With the international break almost over (thank goodness, fans of Dutch national football), this is part one of a planned miniseries looking at how teams in the world’s top leagues have started the season. I plan to cover several leagues this week, using the same format each time, starting with the English Premier League.

Most leagues have played seven or eight rounds, so we may expect key performance indicators like Expected Goals (ExpG) to have settled a fair bit. Generally, the R-squared between ExpG after seven matches and ExpG after a full season tends to be around 0.72, so the relation is quite strong. This means it makes sense to look at current ExpG numbers, try to spot patterns, and make some ExpG-based predictions.
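As a rough illustration of what that R-squared measures, here is a minimal sketch that computes the squared correlation between early-season and full-season numbers. The function name and the five sample values are made up for illustration; they are not the data behind the 0.72 figure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Hypothetical ExpG-ratios for five teams (illustrative values only):
after_seven_matches = [0.70, 0.55, 0.50, 0.45, 0.35]
after_full_season = [0.66, 0.58, 0.49, 0.47, 0.40]

print(round(r_squared(after_seven_matches, after_full_season), 2))
```

A value close to 1 means the early numbers already tell you most of what the full-season numbers will say; a value close to 0 means they tell you very little.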

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best-performing teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that high efficiency lasts only so long, and PDO levels tend to revert to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.
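For readers new to the two axes, here is a minimal sketch of how they are commonly computed. It assumes ExpG-ratio is a team’s ExpG divided by the total ExpG in its matches, and PDO is shooting percentage plus save percentage, scaled to 1000 and based on shots on target. Both are common conventions rather than necessarily the exact definitions behind the chart, and the function and parameter names are hypothetical.

```python
def expg_ratio(expg_for, expg_against):
    """Share of all ExpG in a team's matches that the team itself created."""
    return expg_for / (expg_for + expg_against)


def pdo(goals_for, sot_for, goals_against, sot_against):
    """Shooting percentage plus save percentage, scaled so 1000 is neutral.

    Assumption: percentages are based on shots on target (sot).
    """
    shooting_pct = goals_for / sot_for
    save_pct = 1 - goals_against / sot_against
    return 1000 * (shooting_pct + save_pct)


# Illustrative numbers only: a team scoring 12 from 30 shots on target
# and conceding 6 from 24 sits well above the neutral 1000 mark.
print(round(pdo(12, 30, 6, 24)))
```

Because every shot on target is either scored or saved, PDO averages 1000 across a league, which is why extreme values tend to drift back.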

[Figure: Good-Lucky matrix, English Premier League 2014-15, 14 October 2014]

The eye-catcher in this chart is Chelsea’s pink dot, which illustrates their supremacy on both axes. Unfortunately for Chelsea, their dominance in efficiency won’t hold, but their ExpG-ratio of 0.721 will still separate them from the rest.

The rest is led by Arsenal, with excellent performance but low overall efficiency. We can expect the latter to regress towards normal a fair bit, but whether they can keep performing with injuries to key players hurting them remains to be seen.

The next group could be termed the top-4 chasers in terms of performance. Interestingly, efficiency is spread very wide in this bunch, with Southampton riding the PDO wave a bit and Newcastle feeling hard done by. Surprise teams in this bunch are WBA and new-style West Ham, who might be candidates for a bit of regression in ExpG, based on historical performances.

United (4th in the league table) and Everton (17th) wouldn’t have expected to find themselves in the middle of the pack. Despite their widely different league positions, they occupy similar positions in terms of both performance and efficiency. More on that in the graph below.

QPR should catch up with the lower mid-table bunch once their PDO regresses, as should Burnley, although a short performance dip may easily throw them into the orange zone. A dense cluster of three teams should be alarmed by the fact that their season is fuelled by efficiency rather than performances. Don’t be surprised once Swansea (5th), Leicester (12th) and Hull (11th) start their drift down the table.

The final words are for Villa, whose return of no goals and no points from the last three matches initiated an unavoidable drop after their unsustainable season start.

 

Points per Game

The next graph is a slight variation on the previous one. The horizontal axis still presents the teams’ performance (ExpG-ratio), while the vertical axis now presents the outcome (points per game). Since this graph partly presents the same information, I’ll go through it a bit quicker.

[Figure: ExpG-ratio vs points per game by team, English Premier League 2014-15, 14 October 2014]

Chelsea lead by a wider gap in points than in performances, confirming our conclusion above. In the rest of the table, performances and points are not much in line yet. The blue colours that signify good performance are spread from top to bottom. The orange zone, for troublesome performances, sits in mid-table for now.

Were this my first assessment, I’d have serious doubts about ExpG-ratio as a metric. But it has proven its worth before, and I’d pick ExpG over PPG any time.

 

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.
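To give an idea of what such a simulation can look like, here is a minimal Monte Carlo sketch. It is not the model used for this post: it assumes goals in each match are Poisson-distributed, that a team’s expected goals in a match are the average of its own attacking rate and the opponent’s defensive rate, and it ignores home advantage. All names (`poisson`, `simulate_season`, the toy teams) are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson-distributed integer (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_season(points, rates, fixtures, n_sims=1000, seed=1):
    """Return, per team, a list of final points totals (one per simulated season).

    points   -- current points per team, e.g. {"A": 10}
    rates    -- per team: (ExpG for per game, ExpG against per game)
    fixtures -- remaining matches as (home, away) pairs
    """
    rng = random.Random(seed)
    results = {team: [] for team in points}
    for _ in range(n_sims):
        table = dict(points)
        for home, away in fixtures:
            # Assumption: average one side's attack with the other's defence.
            home_goals = poisson((rates[home][0] + rates[away][1]) / 2, rng)
            away_goals = poisson((rates[away][0] + rates[home][1]) / 2, rng)
            if home_goals > away_goals:
                table[home] += 3
            elif home_goals < away_goals:
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        for team, pts in table.items():
            results[team].append(pts)
    return results


# Toy two-team league with two matches left (illustrative numbers only).
current_points = {"A": 3, "B": 0}
team_rates = {"A": (2.0, 0.5), "B": (0.5, 2.0)}
remaining = [("A", "B"), ("B", "A")]
outcomes = simulate_season(current_points, team_rates, remaining, n_sims=200)
print(sum(outcomes["A"]) / len(outcomes["A"]))
```

The list of simulated totals per team is exactly the kind of data a box plot summarises: the median gives the predicted points, the box and whiskers give the spread.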

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG on each particular match day and ExpG at the end of the season.
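The filtering and regression steps could be sketched as follows. The shot record layout and field names are hypothetical, and the blending formula (weight the current number by R², last season’s by 1 minus R²) is an assumption about what "regressed towards last season’s numbers" means, not a confirmed description of the model.

```python
def is_eligible(shot):
    """Apply the four filters from the text to one shot record (a dict)."""
    even_numbers = (
        shot["players_for"] == shot["players_against"]
        and shot["players_for"] in (10, 11)
    )
    return (
        even_numbers
        and not shot["blocked"]
        and not shot["penalty"]
        and not shot["rebound"]
    )


def filtered_expg(shots):
    """Sum the ExpG of all shots that survive the filters."""
    return sum(shot["expg"] for shot in shots if is_eligible(shot))


def regress_to_last_season(current, last_season, r2):
    """Assumed blend: weight the current number by r2, last season's by 1 - r2."""
    return r2 * current + (1 - r2) * last_season


# Illustrative use: with an R² of 0.72 after seven matches, a current
# ExpG-ratio of 0.60 and last season's 0.50 blend to roughly 0.57.
print(round(regress_to_last_season(0.60, 0.50, 0.72), 3))
```

Since the R² rises as the season progresses, the weight on last season’s numbers shrinks with every match day under this scheme.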

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread for each particular team. Enjoy!

[Figure: Box plot of the projected league table, English Premier League 2014-15, 14 October 2014]

6 thoughts on “An early look at performances in the Premier League”

  1. Anthony Guinnessy

    I love this kind of analysis so thanks for sharing it.

    Is there any way of allowing for the difficulty of the fixtures teams have faced so far? What I mean is: does your ExpG allow for the standard of opposition faced?

    Also, as with Liverpool, whose lack of European football last season helped them hugely in the run-in, can we expect the same phenomenon at Utd this season?

  2. 11tegen11 Post author

    Both your points are related in a way.

    First, the negative effect of competing in Europe on the domestic campaign.
    Anyone with a sense of how football works will confirm that more matches, and more distractions will have a potential negative impact. But it’s hard to quantify that.
    Studies have been performed to look at post EL/CL matches, but that ignores the broader effect that injuries may have, that players may be rested before EL/CL matches, etc.

    Then, you’d also have to compare outcomes after EL/CL matches to a certain benchmark. That’s where your strength of schedule (SoS) remark comes into play.
    If I corrected a team’s ExpG for the ExpG conceded by their opposition, that would affect other teams’ ExpG numbers, and create an endless cycle of adjustments to ExpG numbers.
    I haven’t yet found a way of dealing with SoS-corrected ExpG that I’m fully satisfied with.
    If anyone has suggestions here, I’m listening!

    1. dinobaggio

      Thanks for the reply. Unfortunately I don’t have the time to figure out how you could adapt your data to allow for schedule, so I guess we will just have to accept that the numbers will become more accurate over time, as the SoS evens out for all teams the more fixtures they play.

  3. Jaafa

    Could you apply this statistical analysis to the corresponding week from last year’s Premier League and see how the results compare to the final standings?

    1. 11tegen11 Post author

      Of course I could. And maybe I will do that.

      One caveat up front: the final league table is just one of many possible outcomes when teams of a certain strength have met. If we programmed the model to generate a forecast as close to the final league table as possible, using last season’s data, we wouldn’t improve this season’s predictions. That would be overfitting, leading to worse predictions for the future.

      So yes, it’s definitely possible, but in modelling terms, the final league table and the holy grail of true team strength are two different concepts. I’m trying to get ExpG-ratio as close as possible to that ‘true team strength’ rather than to final league tables.

      Hope you get the point. It’s a complicated matter, and probably better explained before by other people. I remember the parts on overfitting as some of the better parts of Nate Silver’s book The Signal and the Noise, so that’s a nice read on it too.

  4. Pingback: Advanced Statistics: Is Chelsea’s Form Sustainable? - Chelsea Index
