
An early look at performances in La Liga

For the fourth and final part of our miniseries, attention shifts to Spain. Will anyone be able to put up resistance to the picture-perfect season opening by Barcelona? Is Valencia really this season’s surprise package? And will Sociedad recover from their disastrous opening?

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best-performing teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

Good - Lucky Matrix La Liga 2014-15 16 October 2014

Yes, it’s Barcelona, a bit of nothingness, more nothing, and then the rest. An out-of-this-world ExpG-ratio of 0.799 combined with an extreme PDO wave of over 1175 resulted in six wins and a draw, no goals conceded, and La Liga’s title all but clinched. The PDO will resolve, points will be dropped, but hey, no one really looks like catching Barcelona here.

A close bunch of five teams competes for the honours of second place, as it seems. The expected names of Real, Sevilla and Atletico are there, but Celta seem to do very well in just their second season after promotion, as well as those other boys from Barcelona, Espanyol.

In analytics terms, Valencia are an outlier of note. Their PDO has been even higher than Barcelona’s, riding them to 2nd place in the table. Point is, an ExpG-ratio of 0.493 is not going to take them far once this PDO wave runs out of steam. Obviously, the 17 points from seven matches will boost their final standings, but one wouldn’t really expect them to threaten the top three.

Orange is trouble in the Good-Lucky matrix, so Granada, Córdoba and Levante catch the negative light here. Bilbao will improve, once their PDO pulls towards the red line, but are still way below mid-table.

 

Points per Game

If your performance is as elite as Barcelona’s, you won’t drop many points. Their 0.799 ExpG-ratio simply means they are on average four times more likely to score than their opponents. Hard to see them losing more than a handful over the season then.
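To make the "four times" claim concrete, here is a minimal sketch. It assumes the ExpG-ratio is a team's share of the total ExpG in its matches (own ExpG divided by own plus opponents' ExpG); the function name is made up for illustration.

```python
def score_odds(expg_ratio):
    """Turn an ExpG-ratio (assumed: share of total ExpG) into odds versus the opponent."""
    if not 0 < expg_ratio < 1:
        raise ValueError("ExpG-ratio must lie strictly between 0 and 1")
    # A share of r for the team leaves 1 - r for the opposition.
    return expg_ratio / (1 - expg_ratio)


barcelona_odds = score_odds(0.799)
print(f"Barcelona: about {barcelona_odds:.1f} to 1")
```

Under that assumption, a ratio of 0.5 comes out at even odds, and anything near 0.8 lands close to four to one.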

As expected, Valencia are flying high, but don’t have the performance levels to back it up, and the same goes for Granada, though at another level. Espanyol will pull up over the coming weeks, as will Sociedad, and Deportivo to an extent.

ExpG-r vs PPG by team La Liga 2014-15 16 October 2014

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG on each particular match day and ExpG at the end of the season.
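For readers who wonder what "simulating the remaining part of the season" might look like, here is a rough sketch. It is not the actual model behind these predictions. It assumes goals follow a Poisson distribution, and it combines each side's ExpG for and against by simple averaging. The four teams and their numbers are invented.

```python
import math
import random
from itertools import permutations
from statistics import mean, quantiles

# Hypothetical per-match ExpG for and against; not real team numbers.
TEAMS = {
    "Alpha": {"for": 2.2, "against": 0.6},
    "Bravo": {"for": 1.5, "against": 1.2},
    "Charlie": {"for": 1.1, "against": 1.5},
    "Delta": {"for": 0.7, "against": 1.9},
}


def poisson(lam, rng):
    """Draw one Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_season(teams, current_points, fixtures, n_sims=2000, seed=1):
    """Play out the remaining fixtures n_sims times; return final points per team."""
    rng = random.Random(seed)
    results = {name: [] for name in teams}
    for _ in range(n_sims):
        points = dict(current_points)
        for home, away in fixtures:
            # Assumption: expected goals are the average of attack and opposing defence.
            lam_home = (teams[home]["for"] + teams[away]["against"]) / 2
            lam_away = (teams[away]["for"] + teams[home]["against"]) / 2
            goals_home, goals_away = poisson(lam_home, rng), poisson(lam_away, rng)
            if goals_home > goals_away:
                points[home] += 3
            elif goals_away > goals_home:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        for name in teams:
            results[name].append(points[name])
    return results


fixtures = list(permutations(TEAMS, 2))  # every home/away pairing still to play
start = {name: 0 for name in TEAMS}      # points already banked (zero in this toy league)
sims = simulate_season(TEAMS, start, fixtures)
for name, pts in sims.items():
    cuts = quantiles(pts, n=20)          # first and last cut: 5th and 95th percentile
    print(name, round(mean(pts), 1), cuts[0], cuts[-1])
```

The spread between the 5th and 95th percentile is the kind of information a box plot of projected points conveys.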

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread and most likely number of points for each particular team. Enjoy!

Boxplot projected league table La Liga 2014-15 16 October 2014

An early look at performances in the Serie A

In the third part of our miniseries, the focus shifts to Italy for the Serie A. After not dropping a single point at home last season, Juventus currently haven’t dropped a single point in any of their six matches. Like in the Bundesliga, the title run might not be that contested, but behind the boys from Turin, all sorts of excitement and unpredictability arise.

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best-performing teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

Good - Lucky Matrix Serie A 2014-15 14 October 2014

Like Bayern in Germany and Chelsea in England, Juventus dominate the Good-Lucky graph, but mostly so on the ExpG axis, which is a good thing for them. With an ExpG-ratio of 0.767 they show unprecedented dominance.

Behind them is an interesting group of eight blue-ish teams, who seem clearly separated from the rest of the bunch. I write interesting, because apart from Roma and Sampdoria, this bunch has been more or less unfavoured in PDO. The best bet is for PDO issues to resolve, and that means we should expect the likes of Napoli (7th in the table) and Lazio (8th) to put up a good chase of 2nd placed Roma and 3rd placed Sampdoria, whose results seem partly PDO fuelled.

Although Milan (5th in the table) would be deemed in less deep trouble than their rivals Inter (10th), it seems to be a matter of time before the ‘nerazzurri’ catch up with the ‘rossoneri’.

Down the ExpG axis it’s Chievo who find themselves in most trouble, with early season surprise package Udinese (4th in the table) the most likely candidates for a winter depression. Sassuolo, Parma and Palermo illustrate the fact that PDO rules early in the season, as these bottom three in PDO terms are also bottom three in the table with 3 points from 6 matches.

 

Points per Game

In the Serie A, just like in the Bundesliga, the connection between performance and points is quite direct. Holding a perfect 3 points per game spot high on top are Juventus, while this graph illustrates future drops and rises.

It’s quite easy to spot what’s going to happen at Udinese and Verona, who hold more points than their performance so far justifies. The reverse is true for Cagliari, and in decreasing order Parma, Palermo and Sassuolo.

ExpG-r vs PPG by team Serie A 2014-15 14 October 2014

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG on each particular match day and ExpG at the end of the season.
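The "regressed towards last season’s numbers" step can be pictured as a weighted blend. The sketch below assumes the R² for the current match day is used directly as the weight on this season's number, which is one plausible reading rather than a confirmed detail of the model. The function name and example values are hypothetical.

```python
def regress_to_last_season(current, last_season, r_squared):
    """Blend this season's ExpG number with last season's.

    Assumption: the match-day R² acts as the weight on the current number,
    so early in the season (low R²) last season dominates.
    """
    if not 0 <= r_squared <= 1:
        raise ValueError("R² must lie between 0 and 1")
    return r_squared * current + (1 - r_squared) * last_season


# Hypothetical example: ExpG-ratio of 0.80 now, 0.60 last season, R² of 0.25.
print(round(regress_to_last_season(0.80, 0.60, 0.25), 3))
```

As the season progresses and R² climbs towards 1, the blend leans more and more on the current season.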

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread and most likely number of points for each particular team. Enjoy!

Boxplot projected league table Serie A 2014-15 14 October 2014

An early look at performances in the Bundesliga

In the second part of our miniseries, the focus shifts from the English Premier League to the Bundesliga. Bayern Munich have hardly been contested for the past two seasons, and already hold a ten point lead over Klopp’s Dortmund, but how do things look beneath the surface of the league table? Will Hamburger SV and Werder recover from their disastrous starts to the season? And Paderborn hipsters, anyone?

 

Good-Lucky

Using the recently explained Good-Lucky matrix, in a format adopted from Benjamin Pugsley, we can easily scan the league for the best-performing teams (horizontal axis) and the most efficient teams (vertical axis). Anyone into football analysis will know that being highly efficient lasts only so long, and PDO levels tend to revert back to normal before you know it. Depending on team quality, normal is a PDO of 980-ish for poor teams and 1020-ish for good teams.

Good - Lucky Matrix Bundesliga 2014-15 14 October 2014

Dominating this matrix in both performance and efficiency, no surprise, are indeed the boys from München. Although they may regress a little bit in PDO, their solid 0.703 ExpG-ratio is indisputable league-winning form.

Best of the rest are Leverkusen and Dortmund, although neither has been able to show that result-wise, due to extreme PDO depressions. This will revert, but in order to keep track of leaders Bayern, they will need an extreme PDO dip for their rivals to coincide with unlikely PDO waves of their own. Not gonna happen.

Good performances are noted by Wolfsburg, Gladbach and Freiburg. Yes, 15th ranked Freiburg that is. They won’t likely make a run for the CL spots, but with any good PDO wave through their season, they may just nick one of the EL spots, if their current performance holds up. More on that later.

Disappointments are mid-table Schalke, who don’t look like moving up soon and Hoffenheim, whose early season run seems fuelled by an efficiency that can’t hold. Hamburg and Werder may be the bottom teams in the table, but are not in serious relegation form in ExpG terms.

That orange zone of relegation form holds two teams that have been bailed out by an early season PDO wave – Köln and most notably Paderborn – and 16th ranked Stuttgart.

 

Points per Game

The Bundesliga already displays quite a direct connection between performance and outcome, indicated by the steep regression line. As to be expected given their underlying performance, Bayern have already distanced themselves from the pack, with a nice trailing group that will compete for 2nd to 7th place, as it seems. The model prefers Leverkusen for now, but with a very small margin, and it’s still early days.

ExpG-r vs PPG by team Bundesliga 2014-15 14 October 2014

Werder, Hamburg and Freiburg all hold quite low positions, and it’s easy to see them catching up as more matches are played and teams tend to move towards the red line. The interesting case study here is Freiburg, whose ExpG-ratio is displayed as an amazing 0.554! Yet their prediction below is a sober bottom spot with just around 31 points. How come?

In the predictions, teams are evaluated according to their non-blocked non-penalty non-rebound shots. Freiburg see one of the highest percentages of their own shots blocked (30%) and have one of the lowest percentages of opponents’ shots blocked by their own defense (18%). That is not a good thing, and it seriously hurts their prediction. Furthermore, they have already been awarded 3 penalties, which means they’d need some 15 penalties to keep this pace until the end of the season. Not gonna happen, and the model knows that.
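The "some 15 penalties" figure is a simple pace extrapolation. The sketch below assumes Freiburg had played 7 of the 34 Bundesliga matches at the time of writing; the match count is my assumption, and the function name is made up.

```python
def season_pace(count_so_far, matches_played, season_matches=34):
    """Extrapolate a running count to a full season at the current rate."""
    return count_so_far / matches_played * season_matches


# Assumption: 3 penalties in 7 matches played so far.
print(round(season_pace(3, 7), 1))
```

That works out to between 14 and 15 penalties over a full season.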

 

Predictions

Here’s the ‘sticking my neck out’ part of this mini-series. Using ExpG as a basis, a pretty straightforward model can simulate the remaining part of the season and come to predictions for the final league table. I figured it would be more fun sharing these from time to time, for various leagues, and see what we can learn along the way towards the end of the season.

For this model I’ve limited ExpG to 11v11 or 10v10 situations, filtered out blocked shots (since shot blocking is a skill), filtered out penalties (since they are distributed pretty randomly and skew the numbers a fair bit) and filtered out rebounds. Furthermore, I’ve regressed the ExpG towards last season’s numbers, based on the R² between ExpG on each particular match day and ExpG at the end of the season.

Without further ado, here’s the graph of predicted points, along with a box plot showing the spread and most likely number of points for each particular team. Enjoy!

Boxplot projected league table Bundesliga 2014-15 14 October 2014

The Good/Lucky Matrix

It’s early October, and the league tables around Europe are starting to shape up. If you want to see how your team’s doing, it is tempting to check the league table, but you may well fool yourself into an opinion by doing so. With just over a handful of games played, league tables tend to lie. So, how can we do better without overcomplicating things?

Here’s where the Good/Lucky Matrix comes in. The brilliant Benjamin Pugsley coined this very fitting name for a plot with a straightforward design. The Good/Lucky Matrix depicts exactly the type of information we are looking for, without additional fancy complicated stuff. In fact, it is a sublime graphical representation of the concepts that have shaped football analytics here and elsewhere over the past years, shot ratio and PDO, separating skill and luck.

Good - Lucky Matrix Eredivisie 2014-15 05 October 2014

The Good/Lucky Matrix consists of two simple, yet crucial elements: ExpG ratio and PDO.

 

Good

The horizontal axis shows how well teams have performed to date. Ben prefers the ‘Shots on Target Ratio’ (SoTR), but to best evaluate team performance, I prefer the Expected Goals Ratio. The method behind my ExpG formula is explained here. In return for the added complexity that ExpG has over a simple shots count like SoTR, it adds more detailed shot information and a better appreciation of shot quality. It’s a matter of taste, but if you have an ExpG at hand, then why not use it?

 

Lucky

The vertical axis represents a metric called ‘PDO’. Most readers will probably be familiar with PDO, but for those who are not, it’s a simple addition of a team’s save percentage and scoring percentage, expressed on a scale where 100% equals 1000. The league average PDO will always be 1000, since one team’s goal is another team’s goal conceded.
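As a quick illustration of that definition, here is a minimal sketch. It assumes both percentages are computed on shots on target, which is the common convention in football but is not spelled out here. The numbers are invented.

```python
def pdo(goals_for, shots_on_target_for, goals_against, shots_on_target_against):
    """PDO as scoring percentage plus save percentage, on a 1000-point scale."""
    scoring_pct = goals_for / shots_on_target_for
    save_pct = 1 - goals_against / shots_on_target_against
    return 1000 * (scoring_pct + save_pct)


# Hypothetical two-team league: every shot for one side is a shot against the other.
team_a = pdo(goals_for=9, shots_on_target_for=30, goals_against=6, shots_on_target_against=24)
team_b = pdo(goals_for=6, shots_on_target_for=24, goals_against=9, shots_on_target_against=30)
print(round(team_a), round(team_b), round((team_a + team_b) / 2))
```

In this mirrored example one team's surplus is exactly the other's deficit, which is why the two values average out at 1000.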

As a rule of thumb, the best teams in a league will have a PDO around 1020, while the worst teams don’t drop below 980 in the long-term. In other words, PDO’s outside that zone indicate under- or overperformance that won’t hold up long-term. For practical reasons, we shall call this luck, and for now skip the philosophical debate whether overperformance is indeed luck or not.

The red line in the Good-Lucky Matrix indicates a roughly normal PDO for a given performance. In the present Matrix it is in fact the regression line between ExpG ratio and PDO.

 

Same PDO, different luck

Please take a look at the Matrix and locate, from left to right, Go Ahead Eagles (ExpG-R 0.285 ; PDO 987), AZ (ExpG-R 0.527 ; PDO 988) and Feyenoord (ExpG-R 0.752 ; PDO 1001). Here are three teams with vastly different performances: very poor, upper mid-table and elite. A simple look at the PDO would say they are all well within the 980-1020 zone where we would assume they have neither been lucky, nor unlucky.

But, based on the correlation between performance and PDO, I would say that Go Ahead Eagles have been a bit lucky, AZ a slight bit unlucky, and Feyenoord quite unlucky so far.
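One way to put a number on "lucky relative to the red line" is to take the gap between a team's PDO and the PDO the line expects for its ExpG ratio. The actual regression line is not given in the text, so the sketch below uses a stand-in line that runs from 980 at an ExpG ratio of 0.30 to 1020 at 0.70. Both anchor points are assumptions chosen only to match the rule of thumb.

```python
def expected_pdo(expg_ratio, low=(0.30, 980.0), high=(0.70, 1020.0)):
    """PDO on a straight line through two anchor points (hypothetical stand-in line)."""
    slope = (high[1] - low[1]) / (high[0] - low[0])
    return low[1] + slope * (expg_ratio - low[0])


def luck(expg_ratio, actual_pdo):
    """Positive means above the line (lucky), negative means below it (unlucky)."""
    return actual_pdo - expected_pdo(expg_ratio)


teams = {
    "Go Ahead Eagles": (0.285, 987),
    "AZ": (0.527, 988),
    "Feyenoord": (0.752, 1001),
}
for name, (ratio, team_pdo) in teams.items():
    print(f"{name}: {luck(ratio, team_pdo):+.1f}")
```

With this stand-in line the ordering matches the reading above: Go Ahead Eagles come out above the line, AZ below it, and Feyenoord furthest below.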

 

The extremes

On the extreme sides of the PDO axis are Heracles (unlucky) and PSV (lucky).

Heracles, who have just won their first game this weekend after a 0-for-7 start, were never as bad as their start to this season indicated. Their results seem mainly driven by an extremely low PDO (872) that will soon find its way to a more sustainable zone. Heracles’ ExpG ratio of 0.445 is on the low end of the mid-table bunch of the league, and if their performance stays like it is, it is to be expected that their league table position will reflect that in time.

PSV, who lead the league table with 18 points from 8 matches, should be happy and worried at the same time. Happy that they won over two points per match while two teams with better underlying performance (Feyenoord and Vitesse) trail them by 7 points already. Worried that their underlying performance does not indicate title-winning form, which generally requires an ExpG ratio over 0.650.

 

In the end

The Good/Lucky Matrix, with all credit to Benjamin Pugsley, will make frequent appearances here, if I don’t find the time for extensive pieces, but feel the need for a quick analytical glance. For me, it’s a perfect tool to grasp the actual state of teams.

Turn on the scouting radar! 

We live in fortunate times. As football fans we’ve got all sorts of information about our stars at just a mouse click away. Any moment in any day can be filled with watching football, reading about football, or checking football stats.

 

Panini

How different things were when I fell in love with football. In the summer of ’86, when I was nearly eight years old, I studied players’ clubs, birth dates and positions, simply because that was all my Panini album had to offer. I tricked my parents into letting me watch some first halves, despite kick-off times far beyond my usual bedtime.


What I lacked in information, I compensated for with fantasy. I created my own truth about Andoni Zubizarreta, the Spanish goalkeeper with that magnificent surname and that fascinating look in his eyes. And about Diego Armando Maradona, of whom I knew little else than Lanus, 30-10-1960, Napoli.

 

Overload

Some thirty years later, things are so very different. My constant hunger that made even the most basic stats taste good, has been traded for a stats overload that makes it hard to get a sense of what’s really going on.

In this age of information overload, value lies in cropping data to bite-size proportions, without losing its relevance. In assessing a football player, one might be able to scan through some individual stats like non penalty goals, key passes, dribbles and dispossessions. But after three players, I’m kind of full.

Comparing a league full of players just by looking at stats is an impossible task for the human mind. However, comparing different shapes is a task we are – by evolution – much better at. So, back in January, when I saw Ted Knutson’s magnificent work on player radars, the fun in individual player stats was back, immediately!

Looking at shapes, one can easily get a feel for strikers that offer great link-up play, or midfielders that offer little else besides defensive protection. The radars offer a fantastic link between player traits and cold stats.

 

Upgrade

Just like the ExpG model, the player radars on 11tegen11 have had a huge summer upgrade. I’ve decided to apply nearly the same format that Ted uses on StatsBomb. The reason behind this is quite simple: I think radars have a huge potential in opening up the stats world to a big audience. When each analyst uses their own radar version, wider adoption will be slowed down a lot. We shouldn’t niggle about subtle differences when it’s clearly better to just step over those details and show the world what we can do.

Like Ted described in one of his introductory pieces on StatsBomb, there are different templates for different positions.

– AM/FWD for strikers, wide attackers in front threes, and the three-man band in a 4-2-3-1.

– CM/DM for central and defensive midfielders.

– Fullback for eehhh… fullbacks.

Goalkeepers and central defenders don’t have templates yet, since we don’t exactly know what stats to judge them by.

The outside boundaries of the chart represent the 95th percentiles to prevent players like Messi and Ronaldo from dwarfing the rest. The inside boundaries, likewise, represent the 5th percentiles. Negative axes, like fouls, are inverted so that bigger coloured areas are always indicative of better performances. As a reference database, to compile the axis limits, I’ve used the 2012/13 and 2013/14 seasons of the top-5 leagues (EPL, Bundesliga, La Liga, Serie A and Ligue 1).
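The axis limits described above can be sketched in a few lines. This is an illustration under assumptions, not the actual radar script: the function names are made up, the reference values are dummy data, and the percentile method is Python's default one.

```python
from statistics import quantiles


def axis_limits(reference_values):
    """Return the 5th and 95th percentile of a stat across the reference database."""
    cuts = quantiles(reference_values, n=20)  # 19 cut points in 5% steps
    return cuts[0], cuts[-1]


def scale_to_axis(value, low, high, invert=False):
    """Map a stat onto 0..1 along a radar axis, clipping at the boundaries."""
    share = (value - low) / (high - low)
    share = min(1.0, max(0.0, share))
    # Negative stats such as fouls are flipped so that bigger always means better.
    return 1.0 - share if invert else share


reference = list(range(1, 101))               # dummy reference values
low, high = axis_limits(reference)
print(scale_to_axis(150, low, high))          # an outlier is capped at the outer boundary
print(scale_to_axis(150, low, high, invert=True))
```

Clipping is what keeps a single extreme player from stretching the axis for everyone else.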

 

Radar

For a nice example of the CM/DM template, meet midfielder Kamohelo Mokotjo, recently transferred to Twente after leading PEC Zwolle to last season’s Cup victory.

Radar chart - @mixedknuts version - Kamohelo Mokotjo - Eredivisie 2013-14

Mokotjo tore the Eredivisie apart last season. Without reaching the 95% mark in any category, he scored high on nearly every axis, while playing almost three quarters of all possible minutes. If we were to scout for spectacular stats in certain categories, chances are we’d miss Mokotjo. There’s not one thing he does so well that he reaches the outer boundary that is the 95th percentile. It’s the combination of doing everything very well that makes him a fantastic player.

 

Scouting

It’s pretty obvious from this radar that Mokotjo had a magnificent season, but wouldn’t it be great if we could somehow quantify player radars?

Well, the good news is, we can.

I’ve computed the surface area of Mokotjo’s radar and compared it to the same reference database that I used to set the axis limits. The result is scaled from 0 to 5 stars.
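Here is a small sketch of how a radar surface and a star rating could be computed. It assumes the axes are equally spaced and already scaled to 0..1, and it assumes the stars are a percentile rank against the reference radars rounded to half stars. The exact scaling used for the gold stars is not documented here, so treat the mapping as a guess.

```python
import math


def radar_area(radii):
    """Area of the polygon spanned by equally spaced radar axes."""
    n = len(radii)
    if n < 3:
        raise ValueError("a radar needs at least three axes")
    # Neighbouring axes span a triangle of area 0.5 * r1 * r2 * sin(angle between them).
    wedge = 0.5 * math.sin(2 * math.pi / n)
    return wedge * sum(radii[i] * radii[(i + 1) % n] for i in range(n))


def stars(area, reference_areas):
    """Hypothetical mapping: share of reference radars not larger, in half-star steps."""
    share = sum(1 for ref in reference_areas if ref <= area) / len(reference_areas)
    return round(share * 10) / 2


reference = [0.4, 0.7, 0.9, 1.2, 1.6]         # dummy reference surfaces
player = radar_area([0.9, 0.8, 0.85, 0.9, 0.8, 0.75, 0.9, 0.85])
print(round(player, 2), stars(player, reference))
```

One design note: the area depends on the order of the axes, since each term multiplies neighbouring axes, so radars are only comparable within the same template.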

To find a central midfielder with a radar surface this large is very rare, so Mokotjo is awarded the full 5 out of 5 gold stars. It’s like Football Manager’s player ranking system brought to real life. Small caveat: it’s easier to score high stats in the Eredivisie than in the EPL. One day, when we’ve learned how stats translate between leagues, we may know how to adjust for league differences.

 

Potential

Knowing how good a player’s season was is one thing; knowing something about his potential is another. This time, look at 17-year-old new Ajax signing Richairo Zivkovic.

Radar chart - @mixedknuts version - Richairo Zivkovic - Eredivisie 2013-14

Playing for Groningen, Zivkovic had a hugely impressive debut season, for a 17 year old. His performance in front of goal was elite, his performance in terms of passing, dribbling and defensive contribution was, well, nearly absent.

In terms of radar surface, Zivkovic earned 3.5 stars, and should he add more passing to his game, or more dribbling, or some defensive work, he should rise in terms of stars. Still, for a 17-year-old, this was an elite season. Now, how to put this ‘for-a-17-year-old’ thing in our stars ranking?

Well, quite simple actually, by comparing a player to his peers, rather than to the full reference database.

The silver stars compare a player not to all other players, but only to players of the same age, or younger. There are just a handful of 17 year olds in the database, so I’ve set the lower limit to 18. When comparing Zivkovic to other forwards aged 18 years or younger, he turned in an elite performance, earning him 5 out of 5 silver stars. So, there we’ve got Football Manager’s second star rating: potential!

 

In the end

I’ve only just finished scripting these new player radars, but I still find myself playing around with them. Finally, we’ve got a tool that makes individual player statistics fun.

We’ll use them a lot this season, both on this site and on Twitter. The radars now allow for true player scouting, both in terms of actual quality, and in terms of future potential.

Expected Goals 2.0 – Some light in the black box

If football analytics was a Hollywood movie, Expected Goals would definitely be the poster boy. The influx of attention for football analytics during the recent World Cup meant a lot of attention for the concept of Expected Goals, or ExpG as it’s mostly referred to. With that attention came two very important questions that I’ll try to address in this post. What is ExpG? And how do you compute it?

 

What is ExpG?

Expected Goals assigns each goal scoring attempt a number between 0 and 1, representing the chance that the attempt results in an actual goal.

I use a model that I have revised completely over the summer, so this makes for a perfect time to explain the full workings of it. Expected Goals 2.0, here we go…

 

Modelling

Suppose I tell you that a football match has just finished and I ask you to estimate the number of goals for each team. You know nothing. Not the teams, not the occasion, not the shot numbers, and nothing that happened on the pitch.

You’d probably say both teams have scored around 1.4 goals, since 2.8 is a good estimate for the average number of goals per football match. Since you have absolutely no information about the match at hand, estimating this average of 1.4 goals per team should lead to the smallest difference between your estimate and the actual goals by each team.

In building a model, the difference between your estimate and the actual outcome is called the error, and you should be aiming to keep the error as small as possible.
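To see why the average is the safest blind guess, here is a tiny check. It assumes the error is measured as the squared difference between guess and outcome, and it uses ten made-up goal tallies that happen to average 1.4.

```python
from statistics import mean

# Invented goal tallies for ten team performances (average: 1.4).
goals = [0, 1, 1, 2, 3, 1, 2, 0, 4, 0]


def mean_squared_error(guess, outcomes):
    """Average squared gap between one fixed guess and the actual outcomes."""
    return mean((guess - outcome) ** 2 for outcome in outcomes)


candidates = [step / 10 for step in range(0, 41)]   # guesses from 0.0 to 4.0
best = min(candidates, key=lambda guess: mean_squared_error(guess, goals))
print(best, mean(goals))
```

Any guess other than the average ends up with a larger error over the whole set of matches.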

(don’t look down at the .gif yet)

 

Shots

Now, I tell you that the match at hand had 10 shots by team A and 14 shots by team B. Would this change your estimate of 1.4 goals for each team?

Since we know that on average 1 in 9, or 11% of shots results in a goal, it would make most sense to estimate 1.1 (10 * 0.11) goals for team A and 1.54 (14 * 0.11) goals for team B.

This is your most basic expected goals model at work. In fact, it is what we’ve been doing for years, with Total Shots Ratio. The total number of shots is a nice, but far from perfect, indication of the number of goals you can expect.
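Written out, this shots-only model is a one-liner. The conversion rate of 0.11 is the rounded 1-in-9 figure from the text; the function name is made up.

```python
CONVERSION_RATE = 0.11  # roughly 1 in 9 shots ends up as a goal


def expected_goals_from_shots(shots, conversion_rate=CONVERSION_RATE):
    """Most basic ExpG model: every shot is worth the same."""
    return shots * conversion_rate


for team, shots in {"Team A": 10, "Team B": 14}.items():
    print(team, round(expected_goals_from_shots(shots), 2))
```

Everything that follows in this piece is about replacing that single flat rate with a value tailored to each attempt.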

 

Attempts

Let’s add some more information to our model, and for the sake of readability of this piece, I’ll give you all visual information on a single goal scoring attempt that we’ll use as an example of the current ExpG model that I use on 11tegen11.

sneijder

Here’s what the ExpG model sees.

  1. The match situation is open play

The model discriminates between seven match situations: open play, corners, direct free kicks, indirect free kicks, penalties, rebounds and first time attempts.

  2. A non-league match

This fragment, in case you hadn’t noticed, originates from the Spain vs. Netherlands match at the past World Cup. For each league, different conversion rates are computed for each match situation.

  3. Game State

The score line during this attempt is 0-0, so the odds of scoring are slightly reduced. Shots at even game state are converted a bit less than shots at GS +1, or even GS -1.

  4. Shot location

The angle to the goal is 22 degrees and the distance is almost 15. Note the absence of units for distance: I don’t compute yards or meters, just an abstract number based on coordinates. In terms of modelling, it’s all about the relative difference between different goal scoring attempts, and not about getting the distance correct in absolute terms.

To compute the angle to the goal, I compute angles to both goal posts and take the difference between those two numbers. The number you get represents the view a player has on the goal. It represents how much of a 360 degree circle around the player is taken up by the goal. For more lateral positions and more distance from the goal, the number goes down. I prefer this method over a simple angle to the middle of the goal, since it works better for close ranges, where most shots are taken.

  5. Shot type

This is a shot, rather than a header. Given the location, this makes a huge difference in terms of ExpG.

  6. Through ball

The shot has been assisted with a through ball. This is a big plus for ExpG, since through balls generally reduce the number of defenders able to contest or block the shot.

  7. Cross

The shot has not been assisted with a cross. Crosses are bad. They have a negative influence on ExpG. It’s easy to get loads of crosses in, so in terms of trying to score goals they may be good for some teams at some times, but it’s harder to score when the goal scoring attempt comes off a cross than when the same goal scoring attempt does not come off a cross.

  8. Touches

The attacking team has taken three touches. More touches taken reduces ExpG, since (generally) defenders have more time to get in position to defend.

  9. Vertical speed

In the build-up of play, the attacking team has moved the ball forward at 2.87 per sec. Note the absence of units for distance, since this is again an abstract number based on coordinates. More important point: quicker vertical movement leads to higher ExpG.
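The view-of-goal angle from the shot location item in the list above can be sketched as follows. The coordinate system is an assumption for illustration: the goal line sits at x = 0 with the posts at y = ±3.66, and the shooter stands at some x > 0. The model itself uses abstract coordinates, so only the relative differences matter.

```python
import math

POST_Y = 3.66  # assumed half-width of the goal in this toy coordinate system


def view_of_goal(shot_x, shot_y, post_y=POST_Y):
    """Angle in degrees between the lines from the shooter to both goal posts."""
    to_left_post = math.atan2(post_y - shot_y, 0.0 - shot_x)
    to_right_post = math.atan2(-post_y - shot_y, 0.0 - shot_x)
    angle = abs(to_left_post - to_right_post)
    if angle > math.pi:             # atan2 wraps around at 180 degrees
        angle = 2 * math.pi - angle
    return math.degrees(angle)


central = view_of_goal(11, 0)       # straight in front of goal
lateral = view_of_goal(11, 12)      # same depth, far out wide
distant = view_of_goal(25, 0)       # central, but further out
print(round(central, 1), round(lateral, 1), round(distant, 1))
```

Both moving wide and moving further out shrink the angle, which is the behaviour the description asks for.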

 

Regression

None of the above items are used because I personally think they are important for ExpG measurement. They all show up as significant factors in a multivariate regression analysis that I’ve run on some 160,000 goal scoring attempts in various match situations and various leagues. Just like we tried to minimize the error in our initial two estimations in the early stages of this article, a complex regression model tries to minimize those errors for large numbers of shots and large numbers of potentially important factors for ExpG.
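To give a feel for how a fitted regression turns those factors into a single number, here is a sketch of the prediction step. It assumes a logistic form, which is a common choice for yes/no outcomes but is not confirmed as the form used here. Every coefficient below is invented purely for illustration. Only the feature values (22 degrees, distance 15, three touches, vertical speed 2.87, a through ball and no cross) come from the example attempt.

```python
import math

# Invented coefficients; a real model would estimate these from shot data.
INTERCEPT = -2.0
COEFFICIENTS = {
    "view_angle": 0.03,
    "distance": -0.08,
    "is_header": -0.9,
    "through_ball": 0.7,
    "cross": -0.5,
    "touches": -0.05,
    "vertical_speed": 0.1,
}


def expg(features, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Weighted sum of the factors, squashed to a probability between 0 and 1."""
    score = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))


attempt = {
    "view_angle": 22, "distance": 15, "is_header": 0,
    "through_ball": 1, "cross": 0, "touches": 3, "vertical_speed": 2.87,
}
same_attempt_from_cross = {**attempt, "through_ball": 0, "cross": 1}
print(round(expg(attempt), 3), round(expg(same_attempt_from_cross), 3))
```

The point of the comparison is the direction of the change: with these made-up weights, swapping the through ball for a cross lowers the estimate for an otherwise identical attempt.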

In the end, for open play shots, the above mentioned factors prove to be important. For different match situations, different factors are important. You can imagine that vertical speed is not important to score from corners, or that for indirect free kicks the number of touches is not important (the defense is set to defend anyway). The joy of a multivariate regression model is that it’s not up to you to decide which factors to use (and then having to defend your choice on blogs and twitter), it’s the model that advises you which factors to use and how to weigh them.

In the future, we may discover new items to measure. If the multivariate regression model then suggests them to be of significant influence on ExpG in certain match situations, they will be added for those match situations. The model is a living thing. If I can improve it, I will.

 

Defensive pressure

The most frequently heard comment on any ExpG model is probably the fact that defensive pressure is not incorporated. That’s both true, and not true, depending on how you define defensive pressure.

Since all data is based on ‘on-ball events’, we don’t have any direct information on the position of defenders and goalkeepers. In isolated cases, this can be quite frustrating. Sometimes a goalkeeper is stranded way out of position, and your model ends up underestimating the ExpG of that goal scoring attempt.

While the model may not have direct information on defender and goalkeeper positioning, it does have a lot of indirect information on it. Game State, vertical speed, crosses, through balls and number of touches all carry some information about the amount of defensive pressure that is present for a goal scoring attempt. Obviously, direct information would be preferable, but even with this indirect information, for 99% of attempts we get a good sense of defensive pressure.

 

In the end

With this piece, I’ve opened up about as much as I can on the workings of my ExpG model. There is no single formula that I can give. It’s not as simple as ‘shots from this zone get 0.12, headers from that zone get 0.07’.

Each goal scoring attempt is judged on the basis of its relevant contextual information. The result is the best estimate I can create for each goal scoring attempt. Using the best contextual information can teach you so much about football. Let’s have a lot of fun with it this coming season!

Why do we write?

We are busy people. Most of us are in our twenties or thirties, have demanding day jobs, and a partner or family we love to attend to. And, just for fun, a few years back we opened a blog and wrote something about football and numbers. We liked it, so we wrote some more, and kept on doing so. We became something of an online community of football analytics bloggers, and by now we’ve been around for years.

But something is changing. Most established football analytics blogs have experienced a severe drop in articles over the past year or so, and 11tegen11 is no exception to that trend. We are busy with our jobs, lives and families. Football writing can wait a moment, and another moment, and another moment. To the point where I started believing my own lie that I couldn’t find the time for writing recently.

 

Busy

Life is no busier now than it was when I started writing, back in the summer of 2010, and blaming time constraints is just the easy way out of a question that deserves an honest answer. A recent piece by @JFFutbol poses the question sharply: “is football blogging dead, dying, or simply changing?”

Author Johnathan Fadugba comes up with three major reasons for the decline in blogging: time constraints, ‘it isn’t going anywhere’, and ‘it isn’t fun anymore’. None of these apply to 11tegen11, since time constraints are no different now or back then, over the years we’re absolutely going somewhere, and I definitely enjoy writing blog pieces. Yet I do have the feeling that my blogging activity is painfully slow recently. So, here’s a personal story about the 11tegen11 blog, and how it has developed over the years.

 

Tactics

In the summer of 2010, 11tegen11 started out as a tactics blog with a focus on Dutch football. My aim at 11tegen11 has always been to be an independent, personal blog that provides well-constructed opinions on anything Dutch football related. The use of numbers and analytics was a logical path to take. I figured I’d use numbers and analysis to form an opinion, write about it and be different from just a random guy with an opinion. My writing focuses more on the journey (analytics) than the destination (conclusion).

The biggest problem, back in 2010, was the general lack of access to data. My writing mainly concerned tactical match and team reports. That was hardly data driven at all, but it did help me to get in touch with two data companies: Infostrada Sports and InStat Football. Both of them helped me get access to data I would never have seen otherwise, though, back in 2010, that meant raw shot and possession numbers per match. Which still felt like the bomb, by the way. My football blogging helped me to establish a platform to use this data, which I would not have had if I’d just been the average casual fan.

 

Oh happy days

Exploring this level of data with our growing football analytics community, we dragged the concept of Total Shots Ratio (TSR) as far as we could. We’ve developed predictive models based on TSR, used it to evaluate manager performances, and successfully identified under- and overachievers at several stages of the season.

Databases were simple two dimensional spreadsheets, calculations were done within seconds, and the rest of the evening remained for writing. For most of 2011 we had a lot of fun with simple concepts like TSR, which proved a decent performance analysis tool.

 

Data

In 2012, things started changing. Websites like Squawka and WhoScored filled our desire for more and better data. Both sites bring a wealth of OPTA-fueled data just a few mouse clicks away. Shot charts, minute-by-minute data, individual player actions, you name it.

It wasn’t long before even we, TSR protagonists, had to confess the limits of simply counting each and every goal scoring attempt. It took some time to develop, but the invention of ‘Expected Goals’ (ExpG) was inevitable (as can be seen in this philosophical piece from 2011). With ever more refined models, we assign each goal scoring attempt a number between 0 and 1 to reflect the odds of said chance resulting in a goal. ExpG is definitely the eye catcher of football analytics at present, but the possibilities are endless, both on team and player level.

 

Mainstream

Meanwhile, the activity of our football analytics community did not go unnoticed in mainstream media and from 2012 onwards, a significant number of early blog writers got snapped up by established media sources or data companies.

Personally, early in 2013 I was offered the opportunity to join a small group of pioneers and start writing for the website of Dutch national newspaper De Volkskrant. Recently, I could add a support writing role for digital news medium ‘De Correspondent’, which meant a step up in mainstream media land. The increased attention allowed us to show our work to a bigger crowd of Dutch readers at an established stage, but it also brought along the pressure of deadlines and expectation. All that time, blog writing could wait.

 

Complexity

With the introduction of Squawka and WhoScored in 2012, the amount of publicly available data grew exponentially, and so did the complexity of our analytics. Personally, I used some in-between-jobs time to train myself to use R statistical software to make best use of our new found wealth, and time investment started shifting from writing to analyzing.

The present ExpG model on 11tegen11 is a self-learning general linear regression stratified for different match situations like open play, corners, free kick, etcetera. The model uses as much contextual information as possible within the limits of on-ball data. Shot location, shot type, assist information, game state, league effects are all used if appropriate for the match situation at hand. A spare hour is easily spent trying to fine tune some aspects of the model, or to fix some complicated large size database issues. Again, blog writing could wait.

On top of that, in the back of our minds, a soft voice kept insisting: “don’t share everything you’re developing now, it might be of competitive advantage”. So far, it’s hard to earn money with football analytics, though that may change in the future. Clubs refrain from massively adopting analysts for various reasons, and the betting industry is pretty hard to catch over longer periods of time. Personally though, this phenomenon has played a role for a while, and it would be unfair to open up in this piece without mentioning this factor.

 

More distractions

Pressed in between work, social life, and new-found deadlines for mainstream work, it was often easier for me to pop out a twitter shout or a short infographic. R is a great piece of software to create scripted infographics, and potential blog pieces ended up half-written before actuality had caught up with them, or never even got further than some pilot data work.

On top of that, blog writing suffered severe competition from the one thing even better than football data. Right, watching football that is. Now that’s where 2010 and 2014 make a huge difference. Nearly every day between August and May holds top level league matches that can be found on TV or streaming on the internet. And for those dull months in between there are play-offs, World Cups, friendlies, etcetera. Never an evening without football on your flatscreen. And, with the advent of detailed league data worldwide, the number of leagues to indulge in just keeps on growing. If you can watch the Argentine Super Clasico, blog writing can wait.

 

Quality

Back in the TSR days of 2011, writing about football analytics was easy. In counting shots there isn’t much one can do wrong. But things are different now. Complex scripts contain small errors that need tracking and fixing. The free flowing game of football needs complicated analysis to be at least somewhat accurate, and complicated analysis needs a lot of words to be explained.

People want to read about football, not about analytical modelling, and it’s a challenge to walk the tightrope between under- and over-explaining analytical methods. On 11tegen11 at times, I’ve avoided this issue by not writing at all, or, in most cases, by focusing on concepts (like scouting or identifying playing style) rather than teams or players. The concepts often didn’t return. Not because they weren’t interesting, but because self-imposed 1000 or 1500 word limits for team or player articles don’t leave enough room for explaining the concepts.

Perhaps that’s wrong, and I should have just used terms like ‘crosses to through ball ratio’ or ‘ExpG over performance’ regularly so that returning readers would familiarize themselves with it. And readers that shy away from terms like that, well, would that be your audience anyway?

 

In the end

In the four years that 11tegen11 has been around, a lot has changed. We’ve got more detailed data than we can handle, we can see more matches than would actually be healthy, and we’ve kept writing waiting for too long.

Football analytics blogging may well be at a breaking point in its short life. Investing more in deeper and more complicated – yet more accurate – analysis, without explaining to a wider audience, would see us dig a hole for ourselves. It would make our little community inaccessible in a few years’ time, and that would not help develop this niche that I don’t think should be a niche.

Writing can make watching and analyzing football more fun. If we made up for lost ground and wrote those unpretentious pieces again, as we did a few years ago, we’d be better off in the long run. Not all pieces need to be mouth-watering analysis in eloquently written near poetry. Bring back the raw unedited pieces that football blogging should be all about. Bring back the fun!

Shooting style at a glance

The 2014 World Cup has been an amazing experience. It will enter history as the World Cup where Brazil collapsed in front of their home crowd, where the world fell in love with a fresh and talented Colombian side, and where three-at-the-back defenses proved that they’re back from the dead. But it was also the World Cup where the world at large tasted the use of stats in football, and seemed to like it.

 

Stat love

Over the past years, the small community of stat loving football bloggers has been cooking some nice concepts that proved tasteful to some, and at least digestible to most fans. The concept of Expected Goals is the best example, and it is now more accepted than ever. Intuitively, separating poor from good quality chances makes a lot of sense, and ExpG allows us to communicate much better than simple shot counts.

This post will aim to do just that: communicate different aspects of shooting behavior. In one plot, I hope to separate quality shooters from quantity shooters, involved shooters from uninvolved shooters, and efficient from inefficient shooters. That’s quite a lot, and it runs the risk that every data visualization carries: showing too much in one picture. Still, on this one, I’m convinced, dear reader, that you can do it.

 

The plot

So, here’s the plot I was talking about… And before going into further details, I should point you to Stephen McCarthy’s inspirational work on data visuals, which has obviously formed the inspiration for this design.

Shooting style players Eredivisie 2013-14

Nice colors, right? For a full size version, click on it.

This plot combines four elements that constitute a player’s finishing. The horizontal axis is simply the number of shots per 90 minutes played, and the vertical axis is the total amount of Expected Goals per 90 minutes. Both dotted lines represent the two standard deviations mark.

Of course, for all information in this chart, penalties are excluded. Oh, and only players playing over 30% of minutes available are included to prevent the per 90’s from being skewed.

 

Rainbow

The nice rainbow of colors represents the average ExpG per shot, ranging from very poor (red), through average (green), to excellent (purple / pink). Since ExpG per shot is the same as dividing the vertical axis by the horizontal axis, the colors are nicely arranged in the chart. Poor shot quality will prevent a player from building up ExpG, so red and orange dots will fly at the bottom of the balloons, while high shot quality helps build up ExpG quickly, and leads to the pink/purple/blue dots flying on top.

The fourth parameter is the size of the dots, where bigger dots represent more goals scored. Players with bigger dots than those around them, like Alfred Finnbogason, have converted at a more efficient (and probably unsustainable) rate than others. Conversely, players with relatively small dots, like Mulenga, Havenaar and Depay, have converted inefficiently, which, by the same line of thought, is expected not to carry over to future performances.
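For readers who want to rebuild the four parameters from raw season totals, here is a minimal Python sketch. The `PlayerSeason` fields, the `shooting_profile` helper, the 34-match season length and the example player are all assumptions made up for illustration. This is not the code behind the chart.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    minutes: int      # minutes played
    shots: int        # non-penalty shots
    exp_goals: float  # non-penalty ExpG summed over all shots
    goals: int        # non-penalty goals


def shooting_profile(p, minutes_available, min_share=0.30):
    """Return the four plot parameters, or None when the player
    did not play more than min_share of the available minutes."""
    if p.minutes <= min_share * minutes_available or p.shots == 0:
        return None
    nineties = p.minutes / 90
    return {
        "shots_per90": p.shots / nineties,       # horizontal axis
        "expg_per90": p.exp_goals / nineties,    # vertical axis
        "expg_per_shot": p.exp_goals / p.shots,  # colour
        "goals": p.goals,                        # dot size
    }


# Assumption: a 34-match league season; the player is invented.
minutes_available = 34 * 90
player = PlayerSeason("Example Striker", minutes=2700, shots=90,
                      exp_goals=13.5, goals=15)
profile = shooting_profile(player, minutes_available)
print(profile)
```

Because ExpG per shot equals ExpG per 90 divided by shots per 90, the colour is fully determined by a dot's position, which is why the rainbow lines up so neatly in the chart.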

 

Player styles

Memphis Depay has been the absolute shot monster of the 2013/14 Eredivisie, but with his limited shot quality, he remains quite a distance behind the most dangerous strikers of the league: Graziano Pellè and Jacob Mulenga. New Ajax signing Richairo Zivkovic already completes the top three of most dangerous strikers at 17 years of age, with both a high shots count and high shot quality.

Hakim Ziyech and Oussama Tannane have a shooting discipline problem. Both rank in the top six in shot frequency, but also in the bottom of the league with respect to shot quality.

 

In the end

This chart conveys a lot of information at a single glance, and provides even more for those patient enough to spend some more time on it. In the near future, you will find similar graphs on my twitter timeline, which I’m using more and more to pop out visuals, when I can’t find time for a full blog post or when I don’t want to repeat myself just with updated numbers. If you’re interested in this blog, you may want to pick up the visuals there too.

 

Once more, this post is inspired by the great visuals of Stephen McCarthy. Follow him!

Dreaming of competitive football

Imagine a world where football teams are truly competitive, where teams can’t buy their way out of trouble, and where it’s not the usual suspects competing for trophies year after year…

Dreams

This article envisions such a world. With most major competitions having come to an end, and the World Cup still a month away, this is my moment to dream about my ideal football world.

I don’t expect this dream to become reality at all. In fact, I don’t think any single aspect of it is even on the brink of making it to FIFA’s regulatory committees. But don’t let that stop me from inviting you to my dream world.

Here we go…

The goal in this dream world is to have football matches that are as exciting as possible. Excitement is enhanced by competitiveness and transparency, so our world should distribute players as evenly as possible across teams and make clear how it does so, rather than have rich teams plucking talents from poor teams, virtually at will, with finances largely obscured.

 

Salary cap

First and foremost, in our world, football really needs a salary cap. Limit the amount of money teams can spend on player salaries to a certain fixed amount and teams will need to tinker with the balance of their first team squad. Everyone who has ever tried their hand at fantasy football management knows how challenging it can be to try and outsmart your rivals in trying to cram as much talent as possible into a tight budget.

As a consequence, Messi and Ronaldo won’t see out their football lives surrounded by the best of the best. On the contrary, you can see superstars being picked up by teams of a lower standard, because those teams are the only ones able to fit their massive salaries in. Imagine a Messi-fueled Valladolid taking on Real Madrid minus a handful of their super stars…

As a consequence, Chelsea won’t load up on all offensive midfield talent of the entire world, only to farm them out and decide the fate of many more players than their first squad can potentially harbour. Choices will need to be made, which makes for interesting debate.

In our world, salaries are open, so that fans are free to discuss the merits of squad composition. How fun would it be to speculate how best to deal with the amount of money coming free next summer with players X and Y leaving, knowing which players could roughly be attracted for which sums of money.

 

Youth

Another aspect where our world differs from reality concerns youth talent. No longer do clubs train their own future players. In fact, in reality clubs hardly train their own future first teamers anyway, with most players dropping out, or ending up with other clubs.

In our world, players play for youth teams until the season they will turn 19, rather than moving across the planet as teenagers. These youth teams are completely independent institutions, unconnected to professional football clubs, but rather focussing completely on making the best of the potential talent in their ranks. Youth teams compete in a competition of their own so that fans will be aware of the next generation soon available for their clubs to recruit. Youth teams are financed with collective support by all professional teams of the nation.

 

Draft

Recruitment follows a draft, which ensures that poor teams get the best young talent on the market, to balance the teams as much as possible going into the next season. As an added benefit, this ensures that young talent will get maximum exposure and playing time, as poor teams will generally slot these talents right in, rather than wasting them in loans and on benches as is so common nowadays.

To get higher in the drafts, youngsters will need to showcase their talent in the youth league, which will trigger great debates among fans, scouts and other people trying to rank football players.

Oh, and the first two years these talents will stay with their draft team on a fixed and moderate income, before being open to move in the market and negotiate their own salary.

Imagine RKC battling for survival with Memphis Depay flying on the left wing, or Norwich injected with the virtues of Adnan Januzaj. I see nothing but advantages!

 

Creativity

Financial resources will always be different among teams, and now that this does not translate into a bigger wage budget, rich teams will need to be smarter than poor teams. Hire smart scouts, develop the best scouting techniques, hey maybe even make use of the best analytical tools out there! Creativity all around, only not in avoiding financial fair play this time…

 

In the end

Yes, I’ve been watching quite some basketball lately. Well spotted!

Most, if not all, of these dreams are reality in basketball, which goes to show that (A) somewhere on this planet it’s possible to regulate stuff like this, and (B) it works in enhancing competitiveness!

And yes, I know FIFA won’t implement any of this, but don’t let that stop us from thinking how we could improve our beautiful game. In the words of Henry Ford… If I would have asked people what they wanted, they would have said: “Faster horses”.

Sometimes we just have to think out of the box, and dream of our ideal football world.

This is mine, what is yours?

How to scout a striker?

Scouting strikers should not be that hard, right? Their prime responsibility is putting the ball in the back of the net, and goals are one of the few elements of football where traditional fans and nerdy analysts agree. A goal is a goal, counting goals cannot go wrong. Strikers who score a lot of goals are better than strikers who score fewer goals. Or not?

In our previous piece on scouting offensive talent, we’ve distinguished two elements that constitute a good striker.

  1. The striker has to get into good scoring positions, and accumulate good shots. This is best measured as Expected Goals (ExpG) per 90 minutes, with exclusion of penalties.
  2. The striker has to convert these chances into goals. This can be measured by comparing ExpG and actual non penalty goals.

The previous post on strikers illustrated how we can measure those two elements and judge strikers separately on both of these qualities. Today we will take it a step further and see what scouting implications come from it. We will show that sometimes it is better to buy a lower scoring striker, and which high scoring strikers to avoid. But first, I want you to meet someone.
 

Meet our striker!

He plays in a big league, for a good team, where he has taken 160 non penalty shots in the past season. On average, each shot was good for 0.152 ExpG, so over all shots together we could have expected 24.4 goals from him.

The thing is, our striker is pretty good, so instead of 24.4, he scored 43 non penalty goals for an over performance of 18.6 goals. We can stick an ugly acronym to it and say his non penalty goals above replacement (NPGAR) is 18.6.

NPGAR = Non Penalty Goals – Expected Non Penalty Goals

You’ve probably guessed by now that our striker is Lionel Messi. This season, Messi still plays for Barcelona, where he has taken 75 non penalty shots to date. On average the quality of the chances was comparable to last season, with an ExpG per shot of 0.149. Overall, we should expect 11.1 goals.

The thing is, Messi is suddenly not so excellent at finishing, and he has come up with 9 non penalty goals instead of 11. His NPGAR is now -2.14, which indicates that the average player, not even the average striker, would have scored two more goals with the type and number of shots that Messi has taken this season.
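As a quick check on the arithmetic, the NPGAR definition can be written as a one-line function. This is only a sketch: the function name `npgar` is my own, the goal and ExpG totals come from the text above, and the 11.14 ExpG for the current season is an assumption back-calculated from the stated NPGAR of -2.14 (the text rounds it to 11.1).

```python
def npgar(non_penalty_goals, expected_non_penalty_goals):
    """Non penalty goals above replacement: actual minus expected."""
    return non_penalty_goals - expected_non_penalty_goals


# 2012/13: 43 non penalty goals from shots worth 24.4 ExpG.
last_season = npgar(43, 24.4)

# Current season: 9 non penalty goals; 11.14 ExpG is back-calculated
# from the quoted NPGAR of -2.14 (an assumption, see above).
this_season = npgar(9, 11.14)

print(round(last_season, 1))  # 18.6
print(round(this_season, 2))  # -2.14
```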

 

Analysis

A story about Messi is not analysis, it’s anecdote. And anecdotal evidence is no evidence. We could ‘prove’ that finishing does stick with a player by simply picking someone else that happened to follow an excellent finishing season with another excellent finishing season and fire that point home.

It makes more sense to repeat this work for all 479 players of the top-5 leagues who took at least 10 non penalty shots in the baseline 2012/13 season. We take separate looks at the creation of goal scoring chances (ExpG per 90) and at the conversion of chances into goals (Goals minus ExpG). Both parameters will be compared over one season and the next.
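A season-to-season comparison like this boils down to a correlation between two lists of per-player values. Below is a minimal sketch using a hand-rolled Pearson correlation. The four-player lists are invented purely to exercise the function and say nothing about the real 479-player data set. Whether the original analysis used Pearson's r or another measure is not stated, so treat that choice as an assumption too.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy values: one entry per player, same player order in both lists.
expg90_season1 = [0.10, 0.25, 0.40, 0.55]
expg90_season2 = [0.12, 0.22, 0.43, 0.50]

npgar_season1 = [3.0, -1.0, 1.0, -3.0]
npgar_season2 = [-1.0, -3.0, 3.0, 1.0]

r_expg = pearson(expg90_season1, expg90_season2)
r_npgar = pearson(npgar_season1, npgar_season2)
print(round(r_expg, 2), round(r_npgar, 2))
```

A value near 1 means players keep their ranking from one season to the next, while a value near 0 means the first season tells you nothing about the second.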

 

ExpG per 90

In the first graph we will look at the repeatability of non penalty Expected Goals per 90 minutes (ExpG NP per90). The horizontal axis shows ExpG NP per 90 for the first season, and the vertical axis shows the same for the next season.

ExpG90 correlation

Excellent! It turns out that players with a high ExpG per 90 in one season are also the players with a high ExpG per 90 in the next season. This is not too surprising, as several factors influencing ExpG per 90 will remain constant over time. Strikers will still be playing as strikers, and most players playing for top teams will still be playing for top teams. More work is needed here, but we’ll leave that for another post, as there is a far more interesting graph coming up.

 

NPGAR

The next graph shows the repeatability of non penalty goals above replacement (NPGAR). This represents the conversion of goal scoring chances into actual goals.

NPGAR correlation

It turns out that if you correct for the quality of goal scoring attempts, there is absolutely no connection between conversion in one season and the next. A high or low NPGAR in one season has zero relation with NPGAR in the next season.

Messi is the dot in the lower right hand corner, who had an unworldly 2012/13 season, with an NPGAR of +18.6, followed by the current season of -2.1.

 

Scouting

This is a shocking conclusion with huge implications for striker scouting. If a striker bases his goal scoring mainly on conversion, he has a good chance of failing in the next season. If a striker bases his goal scoring mainly on good underlying ExpG numbers, he has a good chance of sustaining his level of scoring.

Buying strikers who score their goals due to a high NPGAR is something you should always avoid.

We all know these famous examples of one season wonders, who got transferred for big money, only to disappoint at their new clubs. Usually, loads of soft factors like the higher level of competition, language issues, or playing style are used to explain the disappointing results, while the only thing going on is regression of NPGAR.

Regression does not always occur though, and you can see in the scatter plot that some players do indeed follow a season of high NPGAR with another season with high NPGAR. But just as many players do not, and just as many players with high NPGAR in the second season come off seasons with low NPGAR.

 

Finnbogason

We should use NPGAR as a red flag in striker scouting. A player like Alfred Finnbogason, currently the Eredivisie top scorer with 21 goals in 20 matches, is a nice example. We can put up several red flags.

First, 8 of his 21 goals are penalties. Second, his NPGAR is +2.68, indicating that he is nearly three non penalty goals above expectations. There is no ground at all to assume that he, or any other player, will outperform the ExpG model next year. All in all, Finnbogason’s non penalty ExpG per 90 is 0.51, which is still a good number, but nowhere near the present perception of a striker who scores 1.05 goals per 90.

For next season, 0.51 goals per 90 seems a reasonable estimate. The problem is, next season Finnbogason will not be playing at Heerenveen, as he will make the step up to a bigger league, where he won’t contribute the same number as in the Eredivisie. His true level should then be estimated somewhat lower than 0.51 goals per 90 minutes, and we will all start wondering what is going on with all these high scoring strikers who just don’t cut it outside the Eredivisie.

 

Exceptions

Inevitably, though, there will be players who seem to disprove the workings of NPGAR. We can assume that half of all players will have a positive NPGAR and half will have a negative NPGAR. A season later, one quarter of players will have two consecutive positive NPGAR seasons. One eighth will have three consecutive seasons where they outperform ExpG, and so on.

In this study among players from top-5 leagues with at least 10 shots, we find 479 players. With such a big group of players, there will inevitably be some players who consistently outperform ExpG to produce season after season of positive NPGAR. This is a misleading situation, as these players will be credited with finishing skills that are basically the product of an unrepeatable effort.
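To put numbers on that halving argument, here is a small sketch. It assumes, as the reasoning above does, that each season is an independent coin flip with a 50% chance of a positive NPGAR. The function name `expected_streak_players` is made up for this example.

```python
def expected_streak_players(n_players, seasons, p_positive=0.5):
    """Expected number of players with a positive NPGAR in every one of
    `seasons` consecutive seasons, if each season is an independent
    coin flip with probability p_positive."""
    return n_players * p_positive ** seasons


# 479 players, as in this study.
for seasons in range(1, 6):
    print(seasons, expected_streak_players(479, seasons))
```

Under these assumptions, roughly 60 of the 479 players would string together three positive seasons and about 15 would manage five, without any finishing skill being involved.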

 

In the end

The message in striker scouting is quite clear. Familiarize yourself with the terms ExpG and NPGAR, and the mistake of buying a striker who flops is generally avoidable. Stay away from strikers with high NPGAR and aim for those with high ExpG numbers, as the latter group will cut it next season, while the first group has every chance of falling back.

Probably, a negative NPGAR in a player with good underlying ExpG numbers is a sign of a bargain buy. The world will see a striker struggling to convert, and it takes some balls to buy him, but the numbers indicate that a return to scoring form is right around the corner.