Heracles and the failed art of shot blocking

One team in particular has been playing really out of sorts in the Eredivisie so far. Poor little Heracles have opened the season with a 1-0-8 record, and therefore occupy the bottom spot of the table with a meagre three points from nine matches played. Still, in analytics terms, something makes them very interesting, and I believe now is the time to share this observation, so that we can follow it over the coming months.

If you’re not a die-hard Eredivisie fan, you may not know all that much about Heracles, so let me tell you something about them. They are a genuinely small team with strong local support, but as a well-run business they’ve been a stable Eredivisie side for ten years now. With your classical education background, you’ve probably already noted their cool name, referring to a divine hero in ancient Greek mythology. Our hero was noted for his extreme strength and courage, something that can hardly be more out of place in reference to the performance of Heracles the football team this season.

As a first exploration of that disastrous 1-0-8 season opening, I’d probably look at some shot numbers, expressed per match.

Shots for        11.6
Shots against    11.9
TSR              0.494

Well, that’s weird. Apparently Heracles have a near balance in shots created and conceded, yet noted that 1-0-8 record. Bad luck, or something to do with shot quality?
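For readers who want to check the TSR figure, here is a minimal sketch. It assumes the usual definition of TSR (total shots ratio) as a team's share of all shots in its matches; the function name is my own and not taken from any library.

```python
def total_shots_ratio(shots_for: float, shots_against: float) -> float:
    """Share of all shots in a team's matches that the team itself took."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Heracles' per-match averages quoted in the table above.
heracles_tsr = total_shots_ratio(11.6, 11.9)
print(f"TSR: {heracles_tsr:.3f}")
```

A value of 0.500 means a team takes exactly as many shots as it concedes, which is why Heracles look like a mid-table side on this metric alone.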

ExpG per shot for        0.100
ExpG per shot against    0.125

Mmm, that’s an ugly picture. Heracles create shots that go in at a 1-in-10 rate, and concede shots that usually convert at a 1-in-8 rate. For years they have been fooling TSR with this behaviour, leading to overestimation of their strength in a metric that values each shot equally. This means that simply combining shot numbers and shot quality should provide part of the answer already.

ExpG created     1.16
ExpG conceded    1.48
ExpG-ratio       0.439

So, this metric should do it. But wait: 0.439 isn’t good at all, yet it is far from in line with that 1-0-8 record. Usually, 0.439 teams record around 1.2 points per game, so something like a 3-2-4 record would be a more fitting reward for their play in terms of ExpG-ratio.
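The two calculations behind this paragraph can be sketched as follows. The ExpG-ratio is taken as ExpG created divided by total ExpG in the match, and records are read as wins-draws-losses with three points for a win. Both function names are illustrative.

```python
def expg_ratio(expg_for: float, expg_against: float) -> float:
    """Team's share of the total expected goals in its matches."""
    return expg_for / (expg_for + expg_against)


def points_per_game(wins: int, draws: int, losses: int) -> float:
    """Points per game for a wins-draws-losses record (3 points per win)."""
    games = wins + draws + losses
    return (3 * wins + draws) / games


heracles_ratio = expg_ratio(1.16, 1.48)   # per-match ExpG from the table above
actual_ppg = points_per_game(1, 0, 8)     # the real 1-0-8 start
fitting_ppg = points_per_game(3, 2, 4)    # the record suggested as a fairer reward

print(f"ExpG-ratio {heracles_ratio:.3f}, actual PPG {actual_ppg:.2f}, "
      f"fitting PPG {fitting_ppg:.2f}")
```

The gap between roughly 0.33 and 1.22 points per game is what the rest of the piece tries to explain.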

The answer, unsurprisingly for readers who remembered the title of this piece, lies in shot blocking.

Shots blocked offense                  27.6%
Shots blocked defence                  10.2%
ExpG of unblocked shots created        0.91
ExpG of unblocked shots conceded       1.36
ExpG-ratio unblocked                   0.401

An average 2014-15 Eredivisie team blocks around 19.2% of shots. Heracles’ offense sees around 50% more of its shots being blocked by opposing defenders. In return, Heracles’ defence blocks shots at a rate that’s around 50% lower than their rival teams do. Their unblocked ExpG-ratio is a poor 0.401, which, combined with some tough luck, goes a long way towards explaining the horrendous season so far.
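As a rough check of those "around 50%" figures, the sketch below compares Heracles' block rates with the league average and recomputes the unblocked ExpG-ratio. The helper name is hypothetical; the inputs are the numbers quoted above.

```python
LEAGUE_BLOCK_RATE = 0.192  # average 2014-15 Eredivisie block rate quoted above


def relative_difference(rate: float, baseline: float) -> float:
    """How far a rate sits above (+) or below (-) a baseline, as a fraction."""
    return rate / baseline - 1.0


# Share of Heracles' own shots that opponents block, versus the league average.
offense_rel = relative_difference(0.276, LEAGUE_BLOCK_RATE)
# Share of opponents' shots that Heracles block, versus the league average.
defence_rel = relative_difference(0.102, LEAGUE_BLOCK_RATE)
# ExpG-ratio using only unblocked shots (0.91 created, 1.36 conceded).
unblocked_ratio = 0.91 / (0.91 + 1.36)

print(f"offense {offense_rel:+.1%}, defence {defence_rel:+.1%}, "
      f"unblocked ExpG-ratio {unblocked_ratio:.3f}")
```

Strictly speaking the differences come out somewhat under 50% in both directions, which is why "around 50%" is the right way to phrase it.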

Make of it what you will. Heracles might be very poorly organised from a tactical standpoint, producing below-average quality shots that are 50% more likely to get blocked, while the reverse is true for their defence. Those are some painful numbers that illustrate aspects that TSR won’t grasp.

But it’s a bit more nuanced than that. Using TSR to explain what has happened is always going to lose to measures like ExpG-R that take in more variables to correct for relevant details of performance, like shot quality, and in the case of unblocked ExpG-R also block rates. But do those aspects carry over from historic data to future performance, or are they mere variations that tend to even themselves out over time?

For comparison of the repeatability of TSR and ExpG-R, I’d refer you to earlier work on this site.

For the art of shot blocking, I haven’t shown data before. Here’s a simple scatter plot of the percentage of blocked shots that teams have recorded in two consecutive seasons. The dataset here is EPL, La Liga, Bundesliga, Serie A, Ligue 1, MLS and Brazil, 2012-13 and 2013-14.

[Figures: Block rate - defensive - all shots; Block rate - offensive - all shots]

A few things are of note there.

  1. The relation between the rate of shots blocked in consecutive seasons is not particularly strong, so most of it is probably variance.
  2. Teams don’t record a shot blocking rate below 16%. This means Heracles are either going to set a world record of poor shot blocking, or they’ll get picked up by regression any time soon.
  3. The defensive aspect of shot blocking is a tiny bit more repeatable than the offensive side, i.e. avoiding your shot getting blocked.
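The repeatability test behind these observations boils down to the R² between a team's block rate in one season and the next. A minimal sketch in plain Python follows; the block rates in the example are made up for illustration and are not the data behind the plots.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov * cov / (var_x * var_y)


# Hypothetical block rates for the same five teams in consecutive seasons.
season_1 = [0.18, 0.20, 0.22, 0.24, 0.26]
season_2 = [0.21, 0.19, 0.23, 0.20, 0.24]

print(f"R^2 between seasons: {r_squared(season_1, season_2):.2f}")
```

An R² near 1 would mean block rates carry over almost perfectly; a value near 0 means last season's rate tells you next to nothing about this season's.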

The problem with this raw analysis of blocks is that not all shots are blocked at the same rate. Here are the block rates for different types of play in the Eredivisie (2013-14 and 2014-15 data).

Direct FK             24.7%
Open play shots       23.5%
First time attempt    22.6%
Rebounds              22.3%
Indirect FK           18.9%
Corners               17.6%
Open play headers     6.5%

The distribution of shots from various situations may be different from team to team, thereby producing bias in the block rates. Some teams may concede more headers than others, which would make them look like poor shot blockers, since headers are rarely blocked.
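To illustrate that bias, here is a sketch that computes the block rate a perfectly average defence would record given only the mix of shots it faces. It uses the Eredivisie rates listed above and assumes, for simplicity, that the categories are mutually exclusive. The two shot mixes are hypothetical.

```python
# Eredivisie block rates per type of play, as listed above.
BLOCK_RATE_BY_TYPE = {
    "direct_fk": 0.247,
    "open_play_shot": 0.235,
    "first_time_attempt": 0.226,
    "rebound": 0.223,
    "indirect_fk": 0.189,
    "corner": 0.176,
    "open_play_header": 0.065,
}


def expected_block_rate(shot_counts: dict) -> float:
    """Block rate an average team would show for this mix of shot types."""
    total = sum(shot_counts.values())
    if total == 0:
        raise ValueError("no shots in the mix")
    expected_blocks = sum(
        BLOCK_RATE_BY_TYPE[shot_type] * count
        for shot_type, count in shot_counts.items()
    )
    return expected_blocks / total


# Two made-up defences that block every shot type at exactly the league rate.
few_headers = {"open_play_shot": 80, "open_play_header": 20}
many_headers = {"open_play_shot": 60, "open_play_header": 40}

print(f"few headers:  {expected_block_rate(few_headers):.1%}")
print(f"many headers: {expected_block_rate(many_headers):.1%}")
```

Both defences are equally good at blocking in this toy example, yet the one facing more headers ends up with a clearly lower overall block rate.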

To cut a long story short, of all different types of play mentioned above, only open play shots show any degree of repeatability in terms of the block rate. For all other types of play the correlation from season to season is virtually non-existent.

[Figures: Block rate - defensive - open play shots only; Block rate - offensive - open play shots only]

Interestingly, the R² values decrease sharply when the repeatability test for block rates is run per type of play, leaving only open play shots to show any degree of repeatability, however small.

The fact that the overall analysis shows a stronger correlation than the subgroup analysis suggests that a big part of the repeatability of block rates of overall shots should in fact be bias introduced by the fact that teams show different distributions in types of play, rather than in actual block rate.

 

In the end

What can we, and our poor Heracles, take from this in-depth analysis of block rates? Simply put, they should be ignored when trying to predict future performances. The repeatability of block rates is very poor in general, and even poorer when performing subgroup analysis for different types of play. If your team suffers from very poor block rates, like Heracles, chances are this will regress. Whether this means that block rates represent luck that evens itself out, or that managers identify such issues and fix them rather efficiently, is impossible to tell at this moment.

For Heracles, this brings a touch of optimism, as their horrible block rates, both offensively and defensively, are expected to regress over the season. This should bring their actual outcome closer to their ExpG-ratio of 0.439, for a 1.2 points-per-game pace. This still means they are expected to add just 30 points (25 games to be played * 1.2 PPG) to their current total of 3 points. Relegation territory, but they should be within touch of the pack, rather than trailing miserably as they do now. We will be watching closely!
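The projection in that last paragraph is simple enough to write down as a sketch: current points plus the expected points-per-game pace over the remaining matches. The function name is illustrative.

```python
def projected_final_points(current_points: float, games_remaining: int,
                           expected_ppg: float) -> float:
    """Current total plus the expected haul from the remaining matches."""
    return current_points + games_remaining * expected_ppg


# Heracles: 3 points so far, 25 matches to play, 1.2 PPG implied by ExpG-ratio.
heracles_projection = projected_final_points(3, 25, 1.2)
print(f"Projected final total: {heracles_projection:.0f} points")
```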

An early look at performances in La Liga

For the fourth and final part of our miniseries, attention shifts to Spain. Will anyone be able to put up resistance to the picture perfect season opening by Barcelona? Is Valencia really this season’s surprise package? And will Sociedad recover from their disastrous opening?

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best performance teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

[Figure: Good-Lucky Matrix, La Liga 2014-15, 16 October 2014]

Yes, it’s Barcelona, a bit of nothingness, more nothing, and then the rest. An out of this world ExpG-ratio of 0.799 combined with an extreme PDO wave of over 1175 resulted in six wins and a draw, no goals conceded, and La Liga’s title all but clinched. The PDO will resolve, points will be dropped, but hey, no one really looks like catching Barcelona here.

A close bunch of five teams competes for the honours of second place, as it seems. The expected names of Real, Sevilla and Atletico are there, but Celta seem to do very well in just their second season after promotion, as well as those other boys from Barcelona, Espanyol.

In analytics terms, Valencia are an outlier of note. Their PDO has been even higher than Barcelona’s, riding them to 2nd place in the table. Point is, an ExpG-ratio of 0.493 is not going to take them far once this PDO wave runs out of steam. Obviously, the 17 points from seven matches will boost their final standings, but one wouldn’t really expect them to threaten the top three.

Orange is trouble in the Good-Lucky matrix, so Granada, Córdoba and Levante catch the negative light here. Bilbao will improve, once their PDO pulls towards the red line, but are still way below mid-table.

 

Points per Game

If your performance is as elite as Barcelona’s, you won’t drop many points. Their 0.799 ExpG-ratio simply means they are on average four times more likely to score than their opponents. Hard to see them losing more than a handful over the season then.
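The "four times" figure follows from the definition of the ratio: if a team holds a share r of the total ExpG, it expects r / (1 - r) times as many goals as its opponent. A minimal sketch, with a function name of my own choosing:

```python
def scoring_multiple(expg_ratio: float) -> float:
    """Expected goals for, expressed as a multiple of expected goals against."""
    if not 0 < expg_ratio < 1:
        raise ValueError("ExpG-ratio must lie strictly between 0 and 1")
    return expg_ratio / (1 - expg_ratio)


barcelona_multiple = scoring_multiple(0.799)
print(f"Barcelona expect {barcelona_multiple:.2f}x their opponents' goals")
```

A perfectly balanced team at 0.500 comes out at a multiple of 1.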

As expected, Valencia are flying high, but don’t have the performance levels to back it up, and the same goes for Granada, though at another level. Espanyol will pull up over the coming weeks, as will Sociedad, and Deportivo to an extent.

[Figure: ExpG-r vs PPG by team, La Liga 2014-15, 16 October 2014]

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG values on each particular match day and ExpG values at the end of the season.
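One way to read that regression step is as a weighted blend, where the weight on the current season grows with the R² for that match day. The sketch below uses R² directly as the weight, which is my assumption and not necessarily how the model does it. The 0.72 is the seven-match R² quoted elsewhere in this series; the team's two ExpG-ratios are made up.

```python
def regress_to_prior(current: float, prior: float, r_squared: float) -> float:
    """Blend this season's value with last season's, weighted by R^2.

    Assumption: R^2 is used directly as the weight on the current season.
    """
    if not 0 <= r_squared <= 1:
        raise ValueError("R^2 must lie between 0 and 1")
    return r_squared * current + (1 - r_squared) * prior


# Hypothetical team: ExpG-ratio 0.60 after seven matches, 0.50 last season.
blended = regress_to_prior(current=0.60, prior=0.50, r_squared=0.72)
print(f"Regressed ExpG-ratio: {blended:.3f}")
```

Early in the season the prior pulls harder; as R² approaches 1 later on, the blend converges on the current numbers.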

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread and most likely number of points for each particular team. Enjoy!

[Figure: Boxplot of projected league table, La Liga 2014-15, 16 October 2014]

An early look at performances in the Serie A

In the third part of our miniseries, the focus shifts to Italy for the Serie A. After failing to drop a single point at home last season, Juventus currently haven’t dropped a single point in any of their six matches. Like in the Bundesliga, the title run might not be that contested, but behind the boys from Turin, all sorts of excitement and unpredictability arises.

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best performance teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

[Figure: Good-Lucky Matrix, Serie A 2014-15, 14 October 2014]

Like Bayern in Germany and Chelsea in England, Juventus dominate the Good-Lucky graph, but mostly so on the ExpG axis, which is a good thing for them. With an ExpG-ratio of 0.767 they show unprecedented dominance.

Behind them is an interesting group of eight blue-ish teams, who seem clearly separated from the rest of the bunch. I write interesting, because apart from Roma and Sampdoria, this bunch has been more or less unfavoured in PDO. The best bet is for PDO issues to resolve, and that means we should expect the likes of Napoli (7th in the table) and Lazio (8th) to put up a good chase of 2nd placed Roma and 3rd placed Sampdoria, whose outcome seems partly PDO fuelled.

Although Milan (5th in the table) would be deemed in less deep trouble than their rivals Inter (10th), it seems to be a matter of time before the ‘nerazzurri’ catch up with the ‘rossoneri’.

Down the ExpG axis it’s Chievo who find themselves in most trouble, with early season surprise package Udinese (4th in the table) the most likely candidates for a winter depression. Sassuolo, Parma and Palermo illustrate the fact that PDO rules early in the season, as these bottom three in PDO terms are also bottom three in the table with 3 points from 6 matches.

 

Points per Game

In the Serie A, just like in the Bundesliga, the connection between performance and points is quite direct. Holding a perfect 3 points per game spot high on top are Juventus, while this graph illustrates future drops and rises.

It’s quite easy to spot what’s going to happen at Udinese and Verona, who hold more points than their performance so far justifies. The reverse is true for Cagliari, and in decreasing order Parma, Palermo and Sassuolo.

[Figure: ExpG-r vs PPG by team, Serie A 2014-15, 14 October 2014]

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG values on each particular match day and ExpG values at the end of the season.

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread and most likely number of points for each particular team. Enjoy!

[Figure: Boxplot of projected league table, Serie A 2014-15, 14 October 2014]

An early look at performances in the Bundesliga

In the second part of our miniseries, the focus shifts from the English Premier League to the Bundesliga. Bayern Munich have hardly been contested for the past two seasons, and already hold a ten point lead over Klopp’s Dortmund, but how do things look beneath the surface of the league table? Will Hamburger SV and Werder recoup their disastrous season starts? And Paderborn hipsters, anyone?

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best performance teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

[Figure: Good-Lucky Matrix, Bundesliga 2014-15, 14 October 2014]

Dominating this matrix in both performance and efficiency, no surprise, are indeed the boys from München. Although they may regress a little bit in PDO, their solid 0.703 ExpG-ratio is indisputable league-winning form.

Best of the rest are Leverkusen and Dortmund, although neither has been able to show that result-wise, due to extreme PDO depressions. This will revert, but in order to keep pace with leaders Bayern, they will need an extreme PDO dip for their rivals to coincide with unlikely PDO waves of their own. Not gonna happen.

Good performances are noted by Wolfsburg, Gladbach and Freiburg. Yes, 15th ranked Freiburg that is. They won’t likely make a run for the CL spots, but with any good PDO wave through their season, they may just nick one of the EL spots, if their current performance holds up. More on that later.

Disappointments are mid-table Schalke, who don’t look like moving up soon and Hoffenheim, whose early season run seems fuelled by an efficiency that can’t hold. Hamburg and Werder may be the bottom teams in the table, but are not in serious relegation form in ExpG terms.

That orange zone of relegation form holds two teams that have been bailed out by an early season PDO wave (Köln and most notably Paderborn) and 16th ranked Stuttgart.

 

Points per Game

The Bundesliga already displays quite a direct connection between performance and outcome, indicated by the steep regression line. As to be expected given their underlying performance, Bayern has already distanced itself from the pack, with a nice trailing group that will compete for 2nd to 7th place, as it seems. The model prefers Leverkusen for now, but by a very small margin, and it’s still early days.

[Figure: ExpG-r vs PPG by team, Bundesliga 2014-15, 14 October 2014]

Werder, Hamburg and Freiburg all hold quite low positions, and it’s easy to see them catching up as more matches are played and teams tend to move towards the red line. The interesting case study here is Freiburg, whose ExpG-ratio is displayed as an amazing 0.554! Yet their prediction below is a sober bottom spot with just around 31 points. How come?

In the predictions, teams are evaluated according to their non-blocked non-penalty non-rebound shots. Freiburg have one of the highest percentages of their own shots being blocked (30%) and one of the lowest percentages of shots blocked by their own defense (18%). That is not a good thing, and it seriously hurts their prediction. Furthermore, they have already been awarded 3 penalties, which means they’d need some 15 penalties to keep this pace until the end of the season. Not gonna happen, and the model knows that.
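The penalty pace is a simple extrapolation. The sketch below assumes seven matches played at the time of writing (the series mentions seven or eight rounds for most leagues) and a 34-match Bundesliga season; the function name is illustrative.

```python
def full_season_pace(count_so_far: int, matches_played: int,
                     season_length: int) -> float:
    """Extrapolate an event count to a full season at the current rate."""
    return count_so_far / matches_played * season_length


# Freiburg: 3 penalties awarded. Matches played (7) is an assumption.
freiburg_penalty_pace = full_season_pace(3, 7, 34)
print(f"Full-season penalty pace: {freiburg_penalty_pace:.1f}")
```

That lands in the neighbourhood of the "some 15 penalties" mentioned above.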

 

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG values on each particular match day and ExpG values at the end of the season.

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread and most likely number of points for each particular team. Enjoy!

[Figure: Boxplot of projected league table, Bundesliga 2014-15, 14 October 2014]

An early look at performances in the Premier League

With the international break almost over – thank goodness, fans of Dutch national football – this is part one of an intended small miniseries looking at how teams in the top leagues of the world have started the season. I plan to look at several leagues this week by using the same format each time, starting with the English Premier League.

Most leagues have some seven or eight match rounds played, so we may expect key performance indicators like Expected Goals (ExpG) to have settled a fair bit. Generally, the R-square for ExpG’s after seven matches to ExpG’s after a full season tends to be around 0.72, so the relation is quite strong. This means it does make sense to look at current ExpG’s and try to spot patterns, as well as make some ExpG based predictions.

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best performance teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

[Figure: Good-Lucky Matrix, English Premier League 2014-15, 14 October 2014]

The eye-catcher in this chart is Chelsea’s pink dot that illustrates their supremacy on both axes. Unfortunately for Chelsea, their dominance in efficiency won’t hold, but their ExpG-ratio of 0.721 will still separate them from the rest.

That rest is led by Arsenal, with an excellent performance, but low general efficiency. We could expect the latter to regress a fair bit, but whether they can keep on performing with injuries to key players hurting them remains to be seen.

The next group could be termed the top-4 chasers in terms of performances. Interestingly the efficiency is spread very wide in this bunch, with Southampton riding the PDO wave a bit, and Newcastle feeling hard done by. Surprise teams among this bunch are WBA and new-style West Ham, who might be candidates for a bit of regression in ExpG, based on historical performances.

United (4th in the league table) and Everton (17th) wouldn’t have thought to find themselves in the middle of the pack, and despite their widely different league positions, they occupy similar positions in terms of both performance and efficiency. More on that in the graph below.

QPR should catch up with the lower mid-table bunch with regression of their PDO, as should Burnley, although a short performance dip may easily throw them in the orange zone. A dense cluster of three teams should be alarmed by the fact that their season is fuelled by efficiency rather than performances. Don’t be surprised once Swansea (5th), Leicester (12th) and Hull (11th) start their drift down the table.

The final words are for Villa, whose return of no goals and no points from the last three matches initiated an unavoidable drop after their unsustainable season start.

 

Points per Game

The next graph is a slight variation on the previous one. The horizontal axis still presents the teams’ performance (ExpG-ratio), while the vertical axis now presents the outcome (points per game). Since this graph partly presents the same information, I’m going through it a bit quicker.

[Figure: ExpG-r vs PPG by team, English Premier League 2014-15, 14 October 2014]

Chelsea leads by a wider gap in points than performances, confirming our conclusion above. In the rest of the table, performances and points are not so much in line, yet. The blue colours that signify good performance are spread from top to bottom. The orange zone, for troublesome performances, holds a mid-table position in the table, for now.

Were this my first assessment, I’d have cast severe doubts about ExpG-ratio as a metric. But it has proven its status before, and I’d pick ExpG over PPG any time.

 

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG values on each particular match day and ExpG values at the end of the season.
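To give a feel for what simulating the remaining season could look like, here is a deliberately simple Monte Carlo sketch. It is not the model used for these predictions: the Poisson goal distribution, the way attack and defence numbers are averaged, the lack of home advantage and all team names and ExpG values are my own assumptions for illustration.

```python
import random
from math import exp
from statistics import mean, median


def poisson_sample(rng: random.Random, rate: float) -> int:
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = exp(-rate)
    goals = 0
    product = rng.random()
    while product > limit:
        goals += 1
        product *= rng.random()
    return goals


def simulate_season(fixtures, expg_for, expg_against, current_points,
                    n_sims=2000, seed=1):
    """Return, per team, the list of final point totals over all simulations."""
    rng = random.Random(seed)
    finals = {team: [] for team in current_points}
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away in fixtures:
            # Assumption: a side's scoring rate is the average of its own
            # ExpG created and the opponent's ExpG conceded (per match).
            home_rate = (expg_for[home] + expg_against[away]) / 2
            away_rate = (expg_for[away] + expg_against[home]) / 2
            home_goals = poisson_sample(rng, home_rate)
            away_goals = poisson_sample(rng, away_rate)
            if home_goals > away_goals:
                points[home] += 3
            elif home_goals < away_goals:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        for team, total in points.items():
            finals[team].append(total)
    return finals


# Hypothetical two-team example: ten remaining meetings, made-up ExpG numbers.
expg_for = {"Strong FC": 2.0, "Weak FC": 0.9}
expg_against = {"Strong FC": 0.8, "Weak FC": 1.8}
current_points = {"Strong FC": 0, "Weak FC": 0}
fixtures = [("Strong FC", "Weak FC"), ("Weak FC", "Strong FC")] * 5

finals = simulate_season(fixtures, expg_for, expg_against, current_points)
for team, totals in finals.items():
    print(f"{team}: median {median(totals)}, mean {mean(totals):.1f}, "
          f"range {min(totals)}-{max(totals)}")
```

The per-team lists of simulated totals are exactly the kind of input a box plot needs: the median marks the most typical outcome and the spread shows how much can still change.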

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread for each particular team. Enjoy!

[Figure: Boxplot of projected league table, English Premier League 2014-15, 14 October 2014]

The Good/Lucky Matrix

It’s early October, and the league tables around Europe are starting to shape up. If you want to see how your team’s doing, it is tempting to check the league table, but you may well fool yourself into an opinion by doing so. With just over a handful of games played, league tables tend to lie. So, how can we do better without overcomplicating things?

Here’s where the Good/Lucky Matrix comes in. The brilliant Benjamin Pugsley released this very fitting name for a plot with a straightforward design. The Good/Lucky Matrix depicts exactly the type of information we are looking for, without additional fancy complicated stuff. In fact, it is a sublime graphical representation of the concepts that have shaped football analytics here and elsewhere over the past years, shot ratio and PDO, separating skill and luck.

[Figure: Good-Lucky Matrix, Eredivisie 2014-15, 5 October 2014]

The Good/Lucky Matrix consists of two simple, yet crucial elements: ExpG ratio and PDO.

 

Good

The horizontal bar presents how good teams have performed to date. Ben prefers the ‘Shots on Target Ratio’(SoTR), but to best evaluate team performance, I prefer the Expected Goals Ratio. The method behind my ExpG formula is explained here. In return for the added complexity that ExpG has over a simple shots count like SoTR, it adds more detailed shot information and a better appreciation of shot quality. It’s a matter of taste, but if you have an ExpG at hand, then why not use it?

 

Lucky

The vertical bar represents an acronym called ‘PDO’. Most readers will probably be familiar with PDO, but for those who are not, it’s a simple addition of a team’s save percentage and scoring percentage. The league average PDO will always be 1000, since one team’s goal is another team’s goal conceded.
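As a minimal sketch of that addition: the version below assumes both percentages are taken over shots on target and that the sum is scaled by 1000, which is the common convention but not spelled out here. The two-team league and its numbers are made up.

```python
def pdo(goals_for: int, shots_on_target_for: int,
        goals_against: int, shots_on_target_against: int) -> float:
    """Scoring percentage plus save percentage, scaled so that 1000 is par."""
    scoring_pct = goals_for / shots_on_target_for
    save_pct = 1 - goals_against / shots_on_target_against
    return 1000 * (scoring_pct + save_pct)


# Hypothetical league of two teams that only ever play each other:
# every goal for Team A is a goal against Team B, and vice versa.
team_a = pdo(goals_for=12, shots_on_target_for=40,
             goals_against=8, shots_on_target_against=35)
team_b = pdo(goals_for=8, shots_on_target_for=35,
             goals_against=12, shots_on_target_against=40)
league_average = (team_a + team_b) / 2

print(f"Team A {team_a:.1f}, Team B {team_b:.1f}, average {league_average:.1f}")
```

Team A's surplus above 1000 is exactly Team B's deficit, which is the bookkeeping behind the league average always coming out at par.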

As a rule of thumb, the best teams in a league will have a PDO around 1020, while the worst teams don’t drop below 980 in the long-term. In other words, PDO’s outside that zone indicate under- or over performance that won’t hold up long-term. For practical reasons, we shall call this luck, and for now skip the philosophical debate whether over performance is indeed luck or not.

The red line in the Good-Lucky Matrix indicates a roughly normal PDO for a given performance. In the present Matrix it is in fact the regression line between ExpG ratio and PDO.

 

Same PDO, different luck

Please take a look at the Matrix and locate, from left to right, Go Ahead Eagles (ExpG-R 0.285 ; PDO 987), AZ (ExpG-R 0.527 ; PDO 988) and Feyenoord (ExpG-R 0.752 ; PDO 1001). Here are three teams with vastly different performances: very poor, upper mid-table and elite. A simple look at the PDO would say they are all well within the 980-1020 zone where we would assume they have neither been lucky, nor unlucky.

But, based on the correlation between performance and PDO, I would say that Go Ahead Eagles have been a bit lucky, AZ a slight bit unlucky, and Feyenoord quite unlucky so far.
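To make that reasoning concrete, the sketch below measures each team's PDO against a straight line instead of against a flat 1000. The line used here is hypothetical: it simply connects the rule-of-thumb values of 980 for a poor team and 1020 for a good team, and the ExpG-ratios I attach to "poor" (0.30) and "good" (0.70) are my own picks. The red line in the Matrix is a fitted regression and will differ.

```python
def expected_pdo(expg_ratio: float,
                 low=(0.30, 980.0), high=(0.70, 1020.0)) -> float:
    """PDO on a straight line through two (ExpG-ratio, PDO) anchor points.

    The anchors are hypothetical, for illustration only.
    """
    (x0, y0), (x1, y1) = low, high
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (expg_ratio - x0)


# (ExpG-ratio, PDO) for the three teams named above.
teams = {
    "Go Ahead Eagles": (0.285, 987),
    "AZ": (0.527, 988),
    "Feyenoord": (0.752, 1001),
}

# Positive means PDO above the line (lucky), negative means below (unlucky).
luck = {name: pdo - expected_pdo(ratio) for name, (ratio, pdo) in teams.items()}

for name, value in luck.items():
    print(f"{name}: {value:+.1f} PDO points versus the line")
```

Even with this crude line the ordering matches the reading above: Go Ahead Eagles sit slightly above it, AZ somewhat below, and Feyenoord furthest below.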

 

The extremes

On the extreme sides of the PDO axis are Heracles (unlucky) and PSV (lucky).

Heracles, who have just won their first game this weekend after an 0-for-7 start, were never as bad as their start to this season indicated. Their results seem mainly driven by an extremely low PDO (872) that will soon find its way to a more sustainable zone. Heracles’ ExpG ratio of 0.445 is on the low end of the mid-table bunch of the league, and if their performance stays like it is, it is to be expected that their league table position will reflect that in time.

PSV, who lead the league table with 18 points from 8 matches, should be happy and worried at the same time. Happy that they won over two points per match while two teams with better underlying performance (Feyenoord and Vitesse) trail them by 7 points already. Worried, that their underlying performance does not indicate title winning form, which generally requires an ExpG ratio over 0.650.

 

In the end

The Good/Lucky Matrix, with all credit to Benjamin Pugsley, will make frequent appearances here, if I don’t find the time for extensive pieces, but feel the need for a quick analytical glance. For me, it’s a perfect tool to grasp the actual state of teams.

Why I don’t board the PSV bandwagon just yet

At first glance, things are looking all rosy in Eindhoven. Going into the season, PSV were hoping to challenge Ajax for the title, and four matches into the season it’s been all good news for PSV. They beat Ajax away and hold a six-point lead already. Still, I would be hesitant to board the PSV bandwagon just yet.

Yes, PSV won four out of four to equal their best league start in 11 years.

Yes, PSV managed to hang on to arguably two of the league’s best players in both Memphis Depay and Georginio Wijnaldum.

Yes, PSV strengthened their squad by re-signing central defender Karim Rekik on a one-year loan from Manchester City, and experienced Mexican international Andres Guardado.

There must be some very compelling arguments not to board the PSV bandwagon right now and start declaring them red hot favourites for the 2014/15 Eredivisie title.

 

Scoreboard journalism

Co Adriaanse

Well, the main argument is called scoreboard journalism. Back in 2003, this term was coined by then AZ manager and now prominent TV pundit, Co Adriaanse. He pointed out that, although his team had just lost 5-1 to Roda JC, the play had been quite good, and the pundits judged outcome over process.

In reverse, the same holds true for PSV so far this season. With twelve points from four matches, the outcome has been perfect, yet the process is worrying to say the least. It’s probably easier to fit narratives to PSV’s perfect start, than it is to dive into the underlying numbers and write about the process at hand.

And even if you are smart, but you see your job as filling newspaper space or talk show time, talking PSV up now ensures new stories to write once the current bubble inevitably bursts. There will probably be a player missing through injury, or post-Europa League matches, maybe even early kick-off times to blame. There will be new narratives to fit, new stories to write, everybody happy.

But here at 11tegen11 we don’t have to worry about narratives, and we’re free to take a dive into our beloved stats for a more nuanced opinion.

 

Shots

PSV has played four matches, scored 14 goals, and conceded three. In those four matches, PSV didn’t win the shot count once. Not in their season opener at promoted side Willem II, not away at Ajax when they put a dent in their rival’s early season, not last week beating Vitesse 2-0 at home, and not even in their 6-1 thumping of NAC Breda.

In each of those matches, PSV produced fewer shots than their opponents. Now, some people would be convinced that this is a good thing. ‘Winning the matches where you play poorly is a sign of champions.’ There’s a lot to say about that statement, but losing the shot count four out of four times is always a bad sign. Shot counts are very well correlated with end of season points, even this early in the season.

 

ExpG

Other people would argue that not all shots are equal, and that’s a good point.

PSV has produced shots of higher quality than the shots they have conceded. The average PSV shot this season has an ExpG of 0.131 (5th in the league), while the average shot PSV conceded has an ExpG of 0.094 (3rd in the league). This reflects their present philosophy to try and contain their opponents, and take advantage of quick counter attacks.

Despite a negative shot count, their ExpG count is positive. In four matches, PSV produced 8.8 ExpG and conceded 6.7. Reality check: scoring 14 from 8.8 ExpG won’t last, and neither will conceding just 3 from 6.7 ExpG.

PSV’s ExpG ratio of 0.549 (8.8 / (8.8 + 6.7)) is okay at best, but for a serious title challenge a ratio of 0.625 is a minimum.

 

Depay

Finally, people will argue that PSV have Memphis Depay. He has scored five goals already. His finishing alone can help PSV overcome opponents even without producing more shots, or generating more ExpG. Well, it’s definitely true that the eye-test suggests that Depay is the most skilful finisher in the league. Still, scoring more goals than ExpG suggests is hardly a basis for future success. And, for what it’s worth, Depay scored 3.3 goals fewer last year than his ExpG suggested. In all likelihood this won’t carry over, but so much for that supposedly superior finishing skill.

 

In the end

Still standing on board that PSV bandwagon? You may be correct, and I may be wrong. PSV may improve as the season goes on. Football is unpredictable in exact terms. But broadly speaking, PSV will either need to improve big time in their underlying play, or the wheels will quickly start to come off, and you may need to look for another bandwagon before the next international break.

Hint: it may well be red and black, as the discrepancy between Feyenoord’s outcome and process goes exactly the other way.

Turn on the scouting radar! 

We live in fortunate times. As football fans we’ve got all sorts of information about our stars just a mouse click away. Any moment in any day can be filled with watching football, reading about football, or checking football stats.

 

Panini

How different things were when I fell in love with football. In the summer of ’86, when I was nearly eight years old, I studied players’ clubs, birth dates and positions, simply because that was all my Panini album had to offer. I tricked my parents into letting me watch some first halves, despite kick-off times far beyond my usual bedtime.


What I lacked in information, I compensated in fantasy. I created my own truth about Andoni Zubizarreta, the Spanish goalkeeper with that magnificent surname and that fascinating look in his eyes. And about Diego Armando Maradona, of whom I knew little else than Lanus, 30-10-1960, Napoli.

 

Overload

Some thirty years later, things are so very different. My constant hunger that made even the most basic stats taste good, has been traded for a stats overload that makes it hard to get a sense of what’s really going on.

In this age of information overload, value lies in cropping data to bite-size proportions, without losing its relevance. In assessing a football player, one might be able to scan through some individual stats like non-penalty goals, key passes, dribbles and dispossessions. But after three players, I’m kind of full.

Comparing a league full of players just by looking at stats is an impossible task for the human mind. However, comparing different shapes is a task we are – by evolution – much better at. So, back in January, when I saw Ted Knutson’s magnificent work on player radars, the fun in individual player stats was back, immediately!

Looking at shapes, one can easily get a feel for strikers that offer great link-up play, or midfielders that offer little else besides defensive protection. The radars offer a fantastic link between player traits and cold stats.

 

Upgrade

Just like the ExpG model, the player radars on 11tegen11 have had a huge summer upgrade. I’ve decided to apply nearly the same format that Ted uses on StatsBomb. The reason behind this is quite simple: I think radars have a huge potential in opening up the stats world to a big audience. When each analyst uses their own radar version, wider adoption will be slowed down a lot. We shouldn’t niggle about subtle differences when it’s clearly better to just step over those details and show the world what we can do.

Like Ted described in one of his introductory pieces on StatsBomb, there are different templates for different positions.

–       AM/FWD for strikers, wide attackers in front threes, and the three-man band in a 4-2-3-1.

–       CM/DM for central and defensive midfielders

–       Fullback for eehhh… fullbacks.

Goalkeepers and central defenders don’t have templates yet, since we don’t exactly know what stats to judge them by.

The outside boundaries of the chart represent the 95th percentiles to prevent players like Messi and Ronaldo from dwarfing the rest. The inside boundaries, likewise, represent the 5th percentiles. Negative axes, like fouls, are inverted so that bigger coloured areas are always indicative of better performances. As a reference database, to compile the axis limits, I’ve used the 2012/13 and 2013/14 seasons of the top-5 leagues (EPL, Bundesliga, La Liga, Serie A and Ligue 1).
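
As a rough illustration of how such axis limits could work, here is a minimal Python sketch. It is not the actual script behind the radars: the function names, the linear interpolation between ranks, and the sample reference values are all assumptions made for the example.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) of values, interpolating linearly."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def axis_score(value, reference, lower_is_better=False):
    """Scale one stat to 0..1 between the 5th and 95th reference percentiles."""
    low = percentile(reference, 5)
    high = percentile(reference, 95)
    if high == low:
        return 0.0
    score = (value - low) / (high - low)
    score = max(0.0, min(1.0, score))  # clip outliers to the chart boundaries
    # Invert negative axes (fouls, for instance) so that bigger is always better.
    return 1.0 - score if lower_is_better else score


# Hypothetical reference values for one stat (illustrative only, not real data).
reference_values = [0.2, 0.5, 0.8, 1.0, 1.1, 1.3, 1.6, 2.0, 2.4, 4.5]

outlier = axis_score(100.0, reference_values)          # clipped to the outer edge
few_fouls = axis_score(0.0, reference_values, lower_is_better=True)
```

Clipping at the 5th and 95th percentiles is what keeps one extreme player from stretching the axis for everybody else.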

 

Radar

For a nice example of the CM/DM template, meet midfielder Kamohelo Mokotjo, recently transferred to Twente, but leading PEC Zwolle to last season’s Cup victory.

Radar chart - @mixedknuts version - Kamohelo Mokotjo - Eredivisie 2013-14

Mokotjo tore the Eredivisie apart last season. Without reaching the 95th percentile mark in any category, he scored high on nearly every axis, while playing almost three quarters of all possible minutes. If we were to scout for spectacular stats in certain categories, chances are we’d miss Mokotjo. There’s not one thing he does so well that he reaches the outer boundary that is the 95th percentile. It’s the combination of doing everything very well that makes him a fantastic player.

 

Scouting

It’s pretty obvious from this radar to see that Mokotjo had a magnificent season, but wouldn’t it be great if we could somehow quantify player radars?

Well, the good news is, we can.

I’ve computed the surface of Mokotjo’s radar and compared it to the same reference database that I used to set the axis limits. The result is scaled to a value of 0 to 5 stars.

To find a central midfielder with a radar surface this large is very rare, so Mokotjo is awarded the full 5 out of 5 gold stars. It’s like Football Manager’s player ranking system brought to real life. Small caveat: it’s easier to score high stats in the Eredivisie than in the EPL. One day, when we’ve learned how stats translate between leagues, we may know how to adjust for league differences.
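
For those who like to see the idea in code, here is a hedged Python sketch of a radar surface and a star rating. The polygon area formula is standard geometry for equally spaced axes. The mapping to stars (share of reference radars at or below this surface, in half-star steps) is an assumption for illustration, since the exact scaling is not spelled out here, and both function names are made up.

```python
import math


def radar_surface(scores):
    """Area of the radar polygon for axis scores in 0..1, axes equally spaced."""
    n = len(scores)
    if n < 3:
        raise ValueError("a radar needs at least three axes")
    # Each pair of neighbouring axes spans a triangle of area 0.5 * a * b * sin(angle).
    wedge = math.sin(2 * math.pi / n) / 2
    return wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))


def stars(surface, reference_surfaces):
    """Map a surface to 0..5 stars by its rank in the reference, in half-star steps."""
    if not reference_surfaces:
        raise ValueError("reference_surfaces must not be empty")
    at_or_below = sum(s <= surface for s in reference_surfaces)
    share = at_or_below / len(reference_surfaces)
    return round(share * 10) / 2


full_square = radar_surface([1.0, 1.0, 1.0, 1.0])
```

One design caveat: the surface of a radar depends on the order of the axes, because each wedge multiplies two neighbouring scores. A player with his high scores next to each other gets a larger surface than one with the same scores spread around the chart.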

 

Potential

Knowing how good a player’s season was is one thing, it’s another thing to know something about potential. This time, look at 17-year old new Ajax signing Richairo Zivkovic.

Radar chart - @mixedknuts version - Richairo Zivkovic - Eredivisie 2013-14

Playing for Groningen, Zivkovic had a hugely impressive debut season, for a 17 year old. His performance in front of goal was elite, his performance in terms of passing, dribbling and defensive contribution was, well, nearly absent.

In terms of radar surface, Zivkovic earned 3.5 stars, and should he add more passing to his game, or more dribbling, or some defensive work, he should rise in terms of stars. Still, for a 17-year old, this was an elite season. Now, how to put this ‘for-a-17-year-old’ thing in our stars ranking?

Well, quite simple actually, by comparing a player to his peers, rather than to the full reference database.

The silver stars compare a player not to all other players, but only to players of the same age, or younger. There are just a handful of 17 year olds in the database, so I’ve set the lower limit to 18. When comparing Zivkovic to other forwards aged 18 years or younger, he turned in an elite performance, earning him 5 out of 5 silver stars. So, there we’ve got Football Manager’s second star rating: potential!
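
The peer comparison can be sketched in a few lines of Python. This is an illustration under assumptions, not the real script: the reference list is invented, it is assumed to be filtered to one position template already, and the star mapping (share of the pool at or below the player, in half-star steps) is a guess at a reasonable scaling.

```python
def star_rating(surface, pool):
    """0..5 stars from the share of the pool at or below this surface."""
    share = sum(s <= surface for s in pool) / len(pool)
    return round(share * 10) / 2


def peer_surfaces(reference, age, min_age=18):
    """Surfaces of reference players of the same age or younger.

    Ages below min_age are treated as min_age, because the database holds
    too few younger players to compare against.
    """
    limit = max(age, min_age)
    return [surface for ref_age, surface in reference if ref_age <= limit]


# Hypothetical (age, radar surface) pairs for one position template.
reference = [(18, 0.20), (18, 0.35), (21, 0.50), (24, 0.70), (27, 0.90), (30, 0.60)]

player_age, player_surface = 17, 0.40
gold = star_rating(player_surface, [s for _, s in reference])
silver = star_rating(player_surface, peer_surfaces(reference, player_age))
```

With these made-up numbers the same season is modest against the full pool and elite against the 18-and-under pool, which is exactly the gap the silver stars are meant to show.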

 

In the end

I’ve only just finished scripting these new player radars, but I still find myself playing around with them. Finally, we’ve got a tool that makes individual player statistics fun.

We’ll use them a lot this season, both on this site and on Twitter. The radars now allow for true player scouting, both in terms of actual quality, and in terms of future potential.

Expected Goals 2.0 – Some light in the black box

If football analytics was a Hollywood movie, Expected Goals would definitely be the poster boy. The influx of attention for football analytics during the recent World Cup meant a lot of attention for the concept of Expected Goals, or ExpG as it’s mostly referred to. With that attention came two very important questions that I’ll try to address in this post. What is ExpG? And how do you compute it?

 

What is ExpG?

Expected Goals is assigning each goal scoring attempt a number between 0 and 1, to represent the chance that this goal scoring attempt results in an actual goal.

I use a model that I have revised completely over the summer, so this makes for a perfect time to explain the full workings of it. Expected Goals 2.0, here we go…

 

Modelling

Suppose I tell you that a football match has just finished and I ask you to estimate the number of goals for each team. You know nothing. Not the teams, not the occasion, not the shot numbers, and nothing that happened on the pitch.

You’d probably say both teams have scored around 1.4 goals, since 2.8 is a good estimate for the average number of goals per football match. Since you have absolutely no information about the match at hand, estimating this average of 1.4 goals per team should lead to the smallest difference between your estimate and the actual goals by each team.

In building a model, the difference between your estimate and the actual outcome is called the error, and you should be aiming to keep the error as small as possible.
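
A small Python sketch makes the point concrete. It assumes the error is measured as the sum of squared differences, which is one common choice and not something stated above, and it uses invented goal counts chosen so that the average is 1.4.

```python
def squared_error(guess, outcomes):
    """Total squared difference between one fixed guess and every outcome."""
    return sum((guess - goals) ** 2 for goals in outcomes)


# Hypothetical goals scored by ten teams (illustrative only, mean is 1.4).
goals_scored = [0, 0, 1, 1, 1, 1, 2, 2, 3, 3]

mean_goals = sum(goals_scored) / len(goals_scored)

# Try a handful of blind guesses and keep the one with the smallest error.
candidates = [0.0, 0.5, 1.0, 1.4, 2.0, 2.5]
best_guess = min(candidates, key=lambda guess: squared_error(guess, goals_scored))
```

Among the candidates, the average comes out as the best blind guess. Any other fixed number adds error in proportion to the square of its distance from the mean.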

(don’t look down at the .gif yet)

 

Shots

Now, I tell you that the match at hand had 10 shots by team A and 14 shots by team B. Would this change your estimate of 1.4 goals for each team?

Since we know that on average 1 in 9, or 11% of shots results in a goal, it would make most sense to estimate 1.1 (10 * 0.11) goals for team A and 1.54 (14 * 0.11) goals for team B.
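
The same arithmetic in a short Python sketch, with the Total Shots Ratio added for comparison. The function names are made up for the example. The 0.11 conversion rate is the league-wide average quoted above.

```python
AVERAGE_CONVERSION = 0.11  # roughly 1 goal per 9 shots


def expected_goals_from_shots(shots, conversion=AVERAGE_CONVERSION):
    """Most basic ExpG model: every shot is worth the average conversion rate."""
    return shots * conversion


def total_shots_ratio(shots_for, shots_against):
    """Share of all shots in a match that were taken by this team."""
    return shots_for / (shots_for + shots_against)


team_a = expected_goals_from_shots(10)   # about 1.1
team_b = expected_goals_from_shots(14)   # about 1.54
tsr_a = total_shots_ratio(10, 14)
```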

This is your most basic expected goals model at work. In fact, it is what we’ve been doing for years, with Total Shots Ratio. The total number of shots is a nice, but far from perfect, indication of the number of goals you can expect.

 

Attempts

Let’s add some more information to our model, and for the sake of readability of this piece, I’ll give you all visual information on a single goal scoring attempt that we’ll use as an example of the current ExpG model that I use on 11tegen11.

sneijder

Here’s what the ExpG model sees.

  1. The match situation is open play

The model discriminates between seven match situations: open play, corners, direct free kicks, indirect free kicks, penalties, rebounds and first time attempts.

  2. A non-league match

This fragment, in case you hadn’t noticed, originates from the Spain vs. Netherlands match at the past World Cup. For each league, different conversion rates are computed for each match situation.

  3. Game State

The score line during this attempt is 0-0, so the odds of scoring are slightly reduced. Shots at even game state are converted a bit less often than shots at GS +1, or even GS -1.

  4. Shot location

The angle to the goal is 22 degrees and the distance is almost 15. Note the absence of units for distance: I don’t compute yards or meters, just an abstract number based on coordinates. In terms of modelling, it’s all about the relative difference between different goal scoring attempts, and not about getting the distance correct in absolute terms.

To compute the angle to the goal, I compute angles to both goal posts and take the difference between those two numbers. The number you get represents the view a player has on the goal. It represents how much of a 360 degree circle around the player is represented by the goal. For more lateral positions and more distance from the goal, the number goes down. I prefer this method over a simple angle to the middle of the goal, since it works better for close ranges, where most shots are taken.

  5. Shot type

This is a shot, rather than a header. Given the location, this makes a huge difference in terms of ExpG.

  6. Through ball

The shot has been assisted with a through ball. This is a big plus for ExpG, since through balls generally reduce the number of defenders able to contest or block the shot.

  7. Cross

The shot has not been assisted with a cross. Crosses are bad. They have a negative influence on ExpG. It’s easy to get loads of crosses in, so in terms of trying to score goals they may be good for some teams at some times, but it’s harder to score when the goal scoring attempt comes off a cross than when the same goal scoring attempt does not come off a cross.

  8. Touches

The attacking team has taken three touches. More touches reduce ExpG, since (generally) defenders have more time to get in position to defend.

  9. Vertical speed

In the build-up of play, the attacking team has moved the ball forward at 2.87 per sec. Note the absence of units for distance, since this is again an abstract number based on coordinates. More important point: quicker vertical movement leads to higher ExpG.
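
The angle described under shot location can be sketched in Python as follows. The coordinate system is an assumption for the example: the goal posts sit on the line x = 0 at y = -4 and y = +4 in abstract units, and the function name is made up. Only the method (angle to each post, then the difference) comes from the description above.

```python
import math

# Hypothetical post positions in abstract pitch coordinates.
POST_LEFT = (0.0, -4.0)
POST_RIGHT = (0.0, 4.0)


def goal_view_angle(x, y, post_left=POST_LEFT, post_right=POST_RIGHT):
    """Angle in degrees between the lines from the shot location to both posts."""
    to_left = math.atan2(post_left[1] - y, post_left[0] - x)
    to_right = math.atan2(post_right[1] - y, post_right[0] - x)
    diff = abs(to_left - to_right)
    if diff > math.pi:  # take the smaller of the two angles around the circle
        diff = 2 * math.pi - diff
    return math.degrees(diff)


central_close = goal_view_angle(4.0, 0.0)
central_far = goal_view_angle(20.0, 0.0)
lateral_close = goal_view_angle(4.0, 10.0)
```

Both moving away from goal and moving to the side shrink the angle, which is the behaviour described above. A simple angle to the middle of the goal would not tell a central close-range shot apart from a central long-range one.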

 

Regression

None of the above items are used because I personally think they are important for ExpG measurement. They all show up as significant factors in a multivariate regression analysis that I’ve run on some 160,000 goal scoring attempts in various match situations and various leagues. Just like we tried to minimize the error in our initial two estimations in the early stages of this article, a complex regression model tries to minimize those errors for large numbers of shots and large numbers of potentially important factors for ExpG.

In the end, for open play shots, the above mentioned factors prove to be important. For different match situations, different factors are important. You can imagine that vertical speed is not important to score from corners, or that for indirect free kicks the number of touches is not important (the defense is set to defend anyway). The joy of a multivariate regression model is that it’s not up to you to decide which factors to use (and then having to defend your choice on blogs and twitter), it’s the model that advises you which factors to use and how to weigh them.
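
To give a feel for what fitting a separate model per match situation looks like, here is a toy Python sketch. Several things are assumptions rather than a description of the actual model: it uses logistic regression fitted by plain gradient descent, the attempts are invented, distances are divided by ten to keep the numbers small, and there is no significance testing of factors at all.

```python
import math


def predict(weights, row):
    """Estimated scoring chance for one attempt (intercept is weights[0])."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, steps=5000, rate=0.1):
    """Fit a logistic regression by gradient descent on the average log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for row, goal in zip(rows, outcomes):
            error = predict(weights, row) - goal
            gradient[0] += error
            for j, value in enumerate(row):
                gradient[j + 1] += error * value
        weights = [w - rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Invented attempts per match situation: (feature rows, 1 = goal / 0 = no goal).
# Open play uses [scaled distance, through ball]; corners use [scaled distance].
attempts = {
    "open_play": (
        [[0.5, 0], [0.8, 1], [1.0, 0], [1.2, 1], [1.5, 0],
         [1.8, 0], [2.0, 1], [2.5, 0], [3.0, 0]],
        [1, 1, 0, 1, 1, 0, 0, 0, 0],
    ),
    "corner": (
        [[0.4], [0.6], [0.8], [1.0], [1.2], [1.5]],
        [1, 0, 1, 0, 0, 0],
    ),
}

# One model per match situation, each with its own set of factors.
models = {name: fit_logistic(rows, goals) for name, (rows, goals) in attempts.items()}

close_range = predict(models["open_play"], [0.5, 0])
long_range = predict(models["open_play"], [3.0, 0])
```

Even on this tiny data set the fitted open play model rates the close-range attempt above the long-range one. With real data and many more factors, the same machinery is what weighs each factor against the others.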

In the future, we may discover new items to measure. If the multivariate regression model then suggests them to be of significant influence on ExpG in certain match situations, they will be added for those match situations. The model is a living thing. If I can improve it, I will.

 

Defensive pressure

The most frequently heard comment on any ExpG model is probably the fact that defensive pressure is not incorporated. That’s both true, and not true, depending on how you define defensive pressure.

Since all data is based on ‘on-ball events’, we don’t have any direct information on the position of defenders and goalkeepers. In isolated cases, this can be quite frustrating. Sometimes a goalkeeper is stranded way out of position, and your model ends up underestimating the ExpG of that goal scoring attempt.

While the model may not have direct information on defender and goalkeeper positioning, it does have a lot of indirect information on it. Game State, vertical speed, crosses, through balls and number of touches all carry some information about the amount of defensive pressure that is present for a goal scoring attempt. Obviously, direct information would be preferable, but even with this indirect information, for 99% of attempts we get a good sense of defensive pressure.

 

In the end

With this piece, I’ve opened up about as much as I can on the workings of my ExpG model. There is no single formula that I can give. It’s not as simple as ‘shots from this zone get 0.12, headers from that zone get 0.07’.

Each goal scoring attempt is judged on the basis of its relevant contextual information. The result is the best estimate I can create for each goal scoring attempt. Using the best contextual information can teach you so much about football. Let’s have a lot of fun with it this coming season!

Why do we write?

We are busy people. Most of us are in our twenties or thirties, have demanding day jobs, and a partner or family we love to attend to. And, just for fun, a few years back we opened a blog and wrote something about football and numbers. We liked it, so we wrote some more, and kept on doing so. We styled ourselves something of an online community of football analytics bloggers, and by now we’ve been around for years.

But something is changing. Most established football analytics blogs have seen a severe drop in articles over the past year or so, and 11tegen11 is no exception to that trend. We are busy with our jobs, lives and families. Football writing can wait a moment, and another moment, and another moment. To the point where I started believing my own lie that I couldn’t find the time for writing recently.

 

Busy

Life is no busier now than it was when I started writing, back in the summer of 2010, and blaming time constraints is just the easy way out of a question that deserves an honest answer. A recent piece by @JFFutbol poses the question sharply: “is football blogging dead, dying, or simply changing?”

Author Johnathan Fadugba comes up with three major reasons for the decline in blogging: time constraints, ‘it isn’t going anywhere’, and ‘it isn’t fun anymore’. None of these apply to 11tegen11, since time constraints are no different now or back then, over the years we’re absolutely going somewhere, and I definitely enjoy writing blog pieces. Yet I do have the feeling that my blogging activity is painfully slow recently. So, here’s a personal story about the 11tegen11 blog, and how it has developed over the years.

 

Tactics

In the summer of 2010, 11tegen11 started out as a tactics blog with a focus on Dutch football. My aim at 11tegen11 has always been to run an independent, personal blog that provides well-constructed opinions on anything Dutch football related. The use of numbers and analytics was a logical path to take. I figured I’d use numbers and analysis to form an opinion, write about it and be different from just a random guy with an opinion. My writing focuses more on the journey (analytics) than the destination (conclusion).

The biggest problem, back in 2010, was the general lack of access to data. My writing mainly concerned tactical match and team reports. That was hardly data driven at all, but it did help me to get in touch with two data companies: Infostrada Sports and InStat Football. Both of them helped me get access to data I would never have seen otherwise, though, back in 2010, that meant raw shot and possession numbers per match. Which still felt like the bomb, by the way. My football blogging helped me to establish a platform to use this data, which I would not have had if I’d just been the average casual fan.

 

Oh happy days

Exploring this level of data with our growing football analytics community, we dragged the concept of Total Shots Ratio (TSR) as far as we could. We’ve developed predictive models based on TSR, used it to evaluate manager performances, and successfully identified under- and overachievers at several stages of the season.

Databases were simple two dimensional spreadsheets, calculations were done within seconds, and the rest of the evening remained for writing. For most of 2011 we had a lot of fun with simple concepts like TSR, which proved a decent performance analysis tool.

 

Data

In 2012, things started changing. Websites like Squawka and WhoScored filled our desire for more and better data. Both sites bring a wealth of OPTA-fueled data just a few mouse clicks away. Shot charts, minute-by-minute data, individual player actions, you name it.

It wasn’t long before even we, TSR protagonists, had to confess the limits of simply counting each and every goal scoring attempt. It took some time to develop, but the invention of ‘Expected Goals’ (ExpG) was inevitable (as can be seen in this philosophical piece from 2011). With ever more refined models, we assign each goal scoring attempt a number between 0 and 1 to reflect the odds of said chance resulting in a goal. ExpG is definitely the eye catcher of football analytics at present, but the possibilities are endless, both on team and player level.

 

Mainstream

Meanwhile, the activity of our football analytics community did not go unnoticed in mainstream media and from 2012 onwards, a significant number of early blog writers got snapped up by established media sources or data companies.

Personally, early in 2013 I was offered the opportunity to join a small group of pioneers and start writing for the website of Dutch national newspaper De Volkskrant. Recently, I could add a support writing role for digital news medium ‘De Correspondent’, which meant a step up in mainstream media land. The increased attention allowed us to show our work to a bigger crowd of Dutch readers at an established stage, but it also brought along the pressure of deadlines and expectation. All that time, blog writing could wait.

 

Complexity

With the introduction of Squawka and WhoScored in 2012, the amount of publicly available data grew exponentially, and so did the complexity of our analytics. Personally, I used some in-between-jobs time to train myself to use R statistical software to make best use of our new-found wealth, and time investment started shifting from writing to analyzing.

The present ExpG model on 11tegen11 is a self-learning general linear regression stratified for different match situations like open play, corners, free kick, etcetera. The model uses as much contextual information as possible within the limits of on-ball data. Shot location, shot type, assist information, game state, league effects are all used if appropriate for the match situation at hand. A spare hour is easily spent trying to fine tune some aspects of the model, or to fix some complicated large size database issues. Again, blog writing could wait.

On top of that, in the back of our minds, a soft voice kept insisting: “don’t share everything you’re developing now, it might be of competitive advantage”. So far, it’s hard to earn money with football analytics, though that may change in the future. Clubs refrain from massively adopting analysts for various reasons, and the betting industry is pretty hard to catch over longer periods of time. Personally though, this phenomenon has played a role for a while, and it would be unfair to open up in this piece without mentioning this factor.

 

More distractions

Pressed in between work, social life, and new-found deadlines for mainstream work, it was often easier for me to pop out a twitter shout or a short infographic. R is a great piece of software to create scripted infographics, and potential blog pieces ended up half-written by the time events had overtaken them, or never even got further than some pilot data work.

On top of that, blog writing suffered severe competition from the one thing even better than football data. Right, watching football that is. Now that’s where 2010 and 2014 make a huge difference. Nearly every day between August and May holds top level league matches that can be found on TV or streaming on the internet. And for those dull months in between there’s play-offs, World Cups, friendlies, etcetera. Never an evening without football on your flatscreen. And, with the advent of detailed league data worldwide, the number of leagues to indulge in just keeps on growing. If you can watch the Argentine Super Clasico, blog writing can wait.

 

Quality

Back in the TSR days of 2011, writing about football analytics was easy. In counting shots there isn’t much one can do wrong. But things are different now. Complex scripts contain small errors that need tracking and fixing. The free flowing game of football needs complicated analysis to be at least somewhat accurate, and complicated analysis needs a lot of words to be explained.

People want to read about football, not about analytical modelling, and it’s a challenge to walk the tightrope between under- and over-explaining analytical methods. On 11tegen11 at times, I’ve avoided this issue by not writing at all, or, in most cases, by focusing on concepts (like scouting or identifying playing style) rather than teams or players. The concepts often didn’t return. Not because they weren’t interesting, but because self-imposed 1000 or 1500 word limits for team or player articles don’t leave room for explaining the concepts enough.

Perhaps that’s wrong, and I should have just used terms like ‘crosses to through ball ratio’ or ‘ExpG over performance’ regularly so that returning readers would familiarize themselves with it. And readers that shy away from terms like that, well, would that be your audience anyway?

 

In the end

In the four years that 11tegen11 has been around, a lot has changed. We’ve got more detailed data than we can handle, we can see more matches than would actually be healthy, and we’ve kept writing waiting for too long.

Football analytics blogging may well be at a breaking point in its short life. Investing more in deeper and more complicated – yet more accurate – analysis, without explaining to a wider audience, would see us dig a hole for ourselves. It would make our little community inaccessible in a few years time, and that would not help develop this niche that I don’t think should be a niche.

Writing can make watching and analyzing football more fun. If we made up for lost ground and wrote those unpretentious pieces again, like we did a few years ago, we’d be better off in the long run. Not all pieces need to be mouth-watering analysis in eloquently written near poetry. Bring back the raw unedited pieces that football blogging should be all about. Bring back the fun!