Shooting style at a glance

The 2014 World Cup has been an amazing experience. It will enter history as the World Cup where Brazil collapsed in front of their home crowd, where the world fell in love with a fresh and talented Colombian side, and where three-at-the-back defenses proved that they’re back from the dead. But it was also the World Cup where the world at large tasted the use of stats in football, and seemed to like it.

 

Stat love

Over the past years, the small community of stat-loving football bloggers has been cooking up some nice concepts that proved tasty to some, and at least digestible to most fans. The concept of Expected Goals (ExpG) is the best example, and it is now more accepted than ever. Intuitively, separating poor from good quality chances makes a lot of sense, and ExpG allows us to communicate much better than simple shot counts.

This post will aim to do just that: communicate different aspects of shooting behavior. In one plot, I hope to separate quality shooters from quantity shooters, involved shooters from uninvolved shooters, and efficient from inefficient shooters. That’s quite a lot, and it runs the risk that every data visualization carries: showing too much in one picture. Still, on this one, I’m convinced, dear reader, that you can do it.

 

The plot

So, here’s the plot I was talking about… And before going into further details, I should point you to Stephen McCarthy’s work on data visuals, which has obviously formed the inspiration for this design.

Shooting style players Eredivisie 2013-14

Nice colors, right? For a full size version, click on it.

This plot combines four elements that constitute a player’s finishing. The horizontal axis is simply the number of shots per 90 minutes played, and the vertical axis is the total amount of Expected Goals per 90 minutes. Both dotted lines represent the two standard deviations mark.

Of course, for all information in this chart, penalties are excluded. Oh, and only players who played over 30% of the minutes available are included, to prevent the per-90 numbers from being skewed.

 

Rainbow

The nice rainbow of colors represents the average ExpG per shot, ranging from very poor (red), through average (green), to excellent (purple / pink). Since ExpG per shot is the same as dividing the vertical axis by the horizontal axis, the colors are nicely arranged in the chart. Poor shot quality will prevent a player from building up ExpG, so red and orange dots will fly at the bottom of the balloons, while high shot quality helps build up ExpG quickly, and leads to the pink/purple/blue dots flying on top.

The fourth parameter is the size of the dots, where bigger dots represent more goals scored. Players with bigger dots than those around them, like Alfred Finnbogason, have converted at a more efficient (and probably unsustainable) rate than others. Conversely, players with relatively small dots, like Mulenga, Havenaar and Depay, have converted inefficiently, which, by the same line of thought, is not expected to carry over to future performances.
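
For those who like to see the arithmetic, here is a minimal sketch of how the four plotted values could be derived for one player. The class name, field names and sample numbers are made up for illustration, and the 34-match season length used for the minutes available is my assumption, not something taken from the chart data.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    minutes: int   # minutes played
    shots: int     # non-penalty shots
    expg: float    # summed non-penalty Expected Goals
    goals: int     # non-penalty goals


def per90(total, minutes):
    """Scale a season total to a per-90-minutes rate."""
    return total * 90 / minutes


def chart_row(player, minutes_available, min_share=0.30):
    """Return the four plotted values, or None below the minutes threshold."""
    if player.minutes <= min_share * minutes_available:
        return None
    return {
        "shots_per90": per90(player.shots, player.minutes),   # horizontal axis
        "expg_per90": per90(player.expg, player.minutes),     # vertical axis
        "expg_per_shot": player.expg / player.shots if player.shots else 0.0,  # colour
        "goals": player.goals,                                # dot size
    }


# Hypothetical numbers, assuming a 34-match season of 90 minutes each.
MINUTES_AVAILABLE = 34 * 90
regular = PlayerSeason("Example striker", 2700, 120, 15.0, 18)
fringe = PlayerSeason("Example substitute", 500, 30, 4.0, 6)

row = chart_row(regular, MINUTES_AVAILABLE)
print(row)
print(chart_row(fringe, MINUTES_AVAILABLE))
```

Note that the colour value is just the vertical value divided by the horizontal one, which is why the colours line up so neatly in the chart.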

 

Player styles

Memphis Depay has been the absolute shot monster of the 2013/14 Eredivisie, but with his limited shot quality, he remains quite a distance behind the most dangerous strikers of the league: Graziano Pellè and Jacob Mulenga. New Ajax signing Richairo Zivkovic already completes the top three of most dangerous strikers at 17 years of age, with both a high shot count and high shot quality.

Hakim Ziyech and Oussama Tannane have a shooting discipline problem. Both rank in the top six in shot frequency, but also in the bottom of the league with respect to shot quality.

 

In the end

This chart conveys a lot of information at a single glance, and provides even more for those patient enough to spend some more time on it. In the near future, you will find similar graphs on my twitter timeline, which I’m using more and more to pop out visuals, when I can’t find time for a full blog post or when I don’t want to repeat myself just with updated numbers. If you’re interested in this blog, you may want to pick up the visuals there too.

 

Once more, this post is inspired by the great visuals of Stephen McCarthy. Follow him!

Has Holland expelled its obsession with possession football?

The Dutch national team crushed reigning World Champions Spain in a display of brilliance that was as sensational as it was unexpected. With a convincing counter attacking tactic, ‘Oranje’ ran out 5-1 winners over a demolished Spain side. Is counter attacking football the new tiki-taka?

Current national manager Louis van Gaal made his breakthrough at top level management with the Ajax side of the mid nineties. With a system based on optimal ball circulation and wide winger offense, he managed to win the Champions League. But, like good managers should, Van Gaal always takes the actual circumstances on board in his choices. At mid nineties Ajax, possession based circulation football may have been the best choice. In different circumstances, Van Gaal makes different choices.

 

Counter attacks

In this World Cup, Holland shines in quick counter attacks, breaking into space immediately upon winning possession of the ball. This form of offense allows the qualities of the best players, Robin van Persie and Arjen Robben, to shine to full effect.

With three, rather than two central defenders it seems at first glance that Holland chooses a more defensive concept, but the reverse has proven to be true. The extra central defender allows both full-backs to push forward in support of the offense. Daley Blind’s two assists against Spain are an excellent example here.

Passing network Netherlands - Spain 1 - 5 Netherlands

The above diagram shows the average position where the Dutch starting XI passed the ball from. The concept of three primary defensive players (2, 3 and 4) is clearly shown, as well as the fact that when in possession, the full-backs (5 and 7) are true wide wingers.

 

Notational clichés

All too often, formational debates are reduced to an exchange of notational clichés. The 4-2-3-1, or the 4-3-3, does not exist, and all teams apply different interpretations and different tactical preferences. And more importantly, modern teams line up vastly differently when in or out of possession. In possession, we see the Dutch as a 3-4-1-2, while out of possession they take a 5-3-2 shape.

If we reduce the description of the Dutch formation to 5-3-2, as is most commonly done in the media currently, we miss out on the whole point of the full-backs being wingers and Sneijder linking up with the offensive duo, i.e. the whole point of the 3-4-1-2. If we prefer to call them 3-4-1-2, as would be fitting with their in possession style, we should call every four-man defense a two-man defense, as full-backs generally push up on the wings. Over the next days I will discuss a few more of these diagrams to show that most 4-2-3-1’s are in fact 2-4-3-1’s in possession.

 

Passing network

The width of the lines represents the number of passes that players have combined for, with a threshold of six. The crucial role of left back Daley Blind (5) in circulating the ball forward is well displayed here. Creative midfielder Wesley Sneijder (10) tends to drift to the left side of the pitch, which makes him easy to find for Blind. The role of the right full-back, Daryl Janmaat (7), is not so much in passing the ball as in providing offensive runs. In possession his position is as offensive as that of the offensive trio of Sneijder (10), Van Persie (9) and Robben (11).

 

Trend

It’s still quite early in the tournament, but Van Gaal’s choice for counter attacking football seems to fit an international trend. Teams that have dominated possession have had a tough time, or even lost their games. Brazil (61% possession) had a lot of trouble creating chances against Croatia, Mexico (62%) created fewer chances than Cameroon, Uruguay (56%) even lost 1-3 to Costa Rica and Spain (64%) were blown away by the Dutch counters. And this all comes at the end of a season where counter attacking teams like Real Madrid and Atlético contested the Champions League final.

 

More possession, more wins?

The relationship between possession and outcome is rather complicated in football. Generally speaking, teams that win more matches have more possession, so the correlation between possession and wins is undeniably present. However, the causal relation between possession and wins is not so straightforward. In other words, does having more possession get your team more wins?

A clear cut answer is not (yet) available, and it seems reasonable that circumstances may dictate which answer to this question is true at which particular moment. Against Spain, the Dutch team made optimal use of the space behind the Spanish defensive line with their lightning quick counter attacks. In the match against Australia this will, in all likelihood, be quite different. In the post-match interview of the Spain match, Van Gaal already hinted at a return of the 4-3-3 system. The media may portray him as dogmatic, but in tactical terms Van Gaal’s pragmatism dominates. And that is a good thing for Dutch football.

How to define attacking style!?

Football analytics at the moment is a bit like a toddler. We think we can do quite a decent job, we’ve started talking quite loud with more variety in our vocabulary, and every now and then we start to make some sense too. Oh, and hey, we make people laugh at us at surprising occasions! Yet, most of the time, in hindsight our actions don’t make the most sense. And what we could do a year from now makes our current level of performance laughable at best.

 

Analysis

Most of my earlier analytics work has been aimed at performance analysis. Which team is better? And later on, which player does better? However attractive this edge of using stats is, in an environment as highly driven by random occurrences as football, this type of analysis approaches its limits quite soon. In plain English: football is quite hard to predict.

 

Just a level below predicting, is describing. And a recent promising development on the describing front has been introduced by fellow blogger and analyst Michael Caley. It may well be the describing part where football analytics could win over more souls to support our belief that numbers can add to a better understanding of the game.

Could you tell me in a few words how your favorite team prefers to attack? Chances are that you’d use words like ‘direct’, ‘patient’, ‘flank play’, ‘through balls’ and ‘crosses’. Now, what Michael has come up with is a simple and easy to use stat to express two key elements of attacking play: pace and style.

 

Pace

Pace is expressed as the number of completed passes per shot taken. Just use raw numbers per team, no complicated formulas. Here’s what we come up with for the most patient teams in Europe’s top-5 leagues plus the Eredivisie.

Passes per shot - top 10 - Multiple Leagues 5 June 2014

Some of the usual suspects, like Swansea, PSG, Arsenal, Bayern and Barcelona, make this top 10, but the most patient team in Europe are Borussia Mönchengladbach with some 37 passes per shot taken. I haven’t seen them play myself this season, but perhaps some Bundesliga fans are willing to comment here.

The other end of the spectrum will reveal teams playing lightning quick football, preferring to shoot rather than pass around.

Passes per shot - bottom 10 - Multiple Leagues 5 June 2014

That’s interesting! The top four teams are all Eredivisie teams, a league known for high scoring and high shot numbers. At some distance from the rest, relegated side N.E.C. are identified as the most direct team in Europe.

Pace is a descriptive thing, not a performance marker though. Other teams from this top 10 (Levante, Augsburg, Heerenveen) have had decent to good seasons with a very direct style of play.
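
The pace number really is as simple as it sounds. A minimal sketch, with invented team names and season totals standing in for the real data:

```python
def passes_per_shot(completed_passes, shots):
    """Pace: completed passes per shot taken (higher means more patient)."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    return completed_passes / shots


# Hypothetical season totals: (completed passes, shots).
teams = {
    "Patient FC": (16000, 430),
    "Direct United": (9000, 520),
}

pace = {name: passes_per_shot(p, s) for name, (p, s) in teams.items()}

# Most patient team first.
ranked = sorted(pace, key=pace.get, reverse=True)
for name in ranked:
    print(f"{name}: {pace[name]:.1f} passes per shot")
```

Because the measure only describes tempo, a high or low value says nothing by itself about how good a team is.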

 

Style

The second aspect I take from Michael is style of attack. Using two contrasting key elements of constructing offensive schemes, crosses and through balls, we can compute a simple ratio that proves to spread out nicely across different teams. Also, it fits well with the style of play we’ve familiarized ourselves with for certain teams. Here’s the top 10 in terms of the ‘crosses to through balls ratio’.

Crosses per through ball - top 10 - Multiple Leagues 5 June 2014

Four French teams in the top 6, but the EPL is also nicely represented. Manchester United’s Moyesball indeed makes the top 10 for crossing heavy offensive schemes, but to my surprise Mourinho’s Chelsea is not far off!

One thing: I’ve stripped out NAC, as they simply won’t play any through balls and their ratio is so off the chart that the other teams are dwarfed by it. In time, a case study of NAC and manager Gudelj should follow.

In the bottom 10 we find the teams that prefer through balls over crosses. It seems a ratio of around 3 is as low as it gets, and with around 4 you’re still very much a through ball oriented team.

Crosses per through ball - bottom 10 - Multiple Leagues 5 June 2014

Barcelona are the masters of avoiding crosses and poking central passes into the box. But would you have guessed Newcastle are so through ball heavy? And look at Heerenveen, showing up as a very direct team just above, and avoiding crosses at the same time!

 

Pace and Style

Things get even more interesting when we combine both of these metrics in one chart. Teams should broadly fall into one of four categories.

- Patient and central: Barcelona, Mönchengladbach, Roma, PSG, Swansea, Arsenal, Bayern, Toulouse and Ajax

- Patient and wide: Nice, Rennes, Manchester United and Bordeaux

- Direct and wide: Bologna, Sochaux, Lazio and Saint Etienne

- Direct and central: Heerenveen, Newcastle, Real Madrid, Sevilla and Dortmund
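
The four categories above can be sketched in a few lines. Splitting at the median of each metric is purely my assumption for this illustration; the teams listed above were simply read off the corners of the chart. Team names and numbers in the snippet are invented.

```python
from statistics import median


def crosses_per_through_ball(crosses, through_balls):
    """Style: crosses per through ball (higher means a wider, cross-heavy attack)."""
    if through_balls == 0:
        return float("inf")  # a team without any through balls goes off the chart
    return crosses / through_balls


def classify(teams):
    """teams maps name -> (completed passes, shots, crosses, through balls)."""
    pace = {n: p / s for n, (p, s, c, t) in teams.items()}
    style = {n: crosses_per_through_ball(c, t) for n, (p, s, c, t) in teams.items()}
    pace_cut = median(pace.values())    # assumed cut-off, not from the post
    style_cut = median(style.values())  # assumed cut-off, not from the post
    labels = {}
    for name in teams:
        tempo = "patient" if pace[name] > pace_cut else "direct"
        width = "wide" if style[name] > style_cut else "central"
        labels[name] = f"{tempo} and {width}"
    return labels


# Hypothetical season totals.
teams = {
    "Team A": (16000, 430, 500, 160),
    "Team B": (15000, 450, 900, 40),
    "Team C": (9000, 520, 950, 45),
    "Team D": (9500, 500, 450, 140),
}

labels = classify(teams)
for name, label in labels.items():
    print(name, "->", label)
```

With real data a fixed cut-off, or simply eyeballing the scatter plot, may well work better than a median split.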

 

In the end

There’s no single preferred mode of attack, and patient is not necessarily better or worse than direct. Also, central doesn’t beat wide. There are multiple ways to construct good offense, and the players at hand, the philosophy of the club and the level of execution of the style are perhaps much more important.

But these concepts hand us a tool to describe pace and style, to follow trends within clubs and managerial careers. All of that with a simple tool, brought to you by the bright mind of Michael Caley.

To close off this post, here is a mega chart picturing all teams from the top 5 leagues plus the Eredivisie. Do click on it for the full, downloadable version, and you’ll see that the names above are all taken from the four corners of this chart.

Directness and Team Style - Offense version - multiple leagues

Dreaming of competitive football

Imagine a world where football teams are truly competitive, where teams can’t buy their way out of trouble, and where it’s not the usual suspects competing for trophies year after year…

Dreams

This article envisions such a world. With most major competitions having come to an end, and the World Cup still a month away, this is my moment to dream about my ideal football world.

I don’t expect this dream to become reality at all. In fact, I don’t think any single aspect of it is even on the brink of making it to FIFA’s regulatory committees. But don’t let that stop me from inviting you to my dream world.

Here we go…

The goal in this dream world is to have football matches that are as exciting as possible. Excitement is enhanced by competitiveness and transparency, so our world should distribute players as evenly as possible across teams and make clear how it does so, rather than have rich teams plucking talents from poor teams, virtually at will, with finances largely obscured.

 

Salary cap

First and foremost, in our world, football really needs a salary cap. Limit the amount of money teams can spend on player salaries to a certain fixed amount and teams will need to tinker with the balance of their first team squad. Everyone who has ever laid his hands on fantasy football management knows how challenging it can be to try and outsmart your rivals in trying to cram as much talent as possible into a tight budget.

As a consequence, Messi and Ronaldo won’t see out their football lives surrounded by the best of the best. On the contrary, you can see superstars being picked up by teams of a lower standard, because those teams are the only ones able to fit their massive salaries in. Imagine a Messi-fueled Valladolid taking on Real Madrid minus a handful of their super stars…

As a consequence, Chelsea won’t load up on all offensive midfield talent of the entire world, only to farm them out and decide the fate of many more players than their first squad can potentially harbour. Choices will need to be made, which makes for interesting debate.

In our world, salaries are open, so that fans are free to discuss the merits of squad composition. How fun would it be to speculate how best to deal with the amount of money coming free next summer with players X and Y leaving, knowing which players could roughly be attracted for which sums of money.

 

Youth

Another aspect where our world differs from reality concerns youth talent. No longer do clubs train their own future players. In fact, in reality clubs hardly train their own future first teamers anyway, with most players dropping out, or ending up with other clubs.

In our world, players play for youth teams until the season they will turn 19, rather than moving across the planet as teenagers. These youth teams are completely independent institutions, unconnected to professional football clubs, but rather focussing completely on making the best of the potential talent in their ranks. Youth teams compete in a competition of their own so that fans will be aware of the next generation soon available for their clubs to recruit. Youth teams are financed with collective support by all professional teams of the nation.

 

Draft

Recruitment follows a draft, which ensures that poor teams get the best young talent on the market, to balance the teams as much as possible going into the next season. As an added benefit, this ensures that young talent will get maximum exposure and playing time, as poor teams will generally slot these talents right in, rather than wasting them in loans and on benches as is so common nowadays.

To get higher in the drafts, youngsters will need to showcase their talent in the youth league, which will trigger great debates among fans, scouts and other people trying to rank football players.

Oh, and the first two years these talents will stay with their draft team on a fixed and moderate income, before being open to move in the market and negotiate their own salary.

Imagine RKC battling for survival with Memphis Depay flying on the left wing, or Norwich injected with the virtues of Adnan Januzaj. I see nothing but advantages!

 

Creativity

Financial resources will always be different among teams, and now that this does not translate into a bigger wage budget, rich teams will need to be smarter than poor teams. Hire smart scouts, develop the best scouting techniques, hey maybe even make use of the best analytical tools out there! Creativity all around, only not in avoiding financial fair play this time…

 

In the end

Yes, I’ve been watching quite some basketball lately. Well spotted!

Most, if not all of these dreams are reality in basketball, which goes to show that (A) somewhere on this planet it’s possible to regulate stuff like this, and (B) it works in enhancing competitiveness!

And yes, I know FIFA won’t implement any of this, but don’t let that stop us from thinking how we could improve our beautiful game. In the words of Henry Ford… “If I had asked people what they wanted, they would have said: faster horses.”

Sometimes we just have to think out of the box, and dream of our ideal football world.

This is mine, what is yours?

Radar Love – Capturing Players in a Single Picture

Comparing football clubs is one thing, comparing football players is yet another. It lies at the heart of many pub debates, where passionate fans try to convince each other that their beloved star is better, often to settle the subtle disagreement by concluding that the players are different. And indeed, different positions, skills and tactical roles make it hard to rank individual players. The first step should be to picture them correctly, and that is where this post will step in.

A lot has been written recently about football analytics and the use of numbers in the beautiful game in general. Some claim it’s an enrichment, some claim it ruins the magic of the game. I don’t see it as such a clear separation. Whether we want it or not, stats are there.

It’s up to each of us to decide how much of it we prefer to add to our football match experience. And if the analytics community sees it as its task to lower the threshold for people to start using stats, it should be making stats more accessible. I’m fairly confident that the addition of radar plots will do just that.

 

Giants

‘Standing on the shoulders of giants’ is an apt way to put what I’m doing right here. The conception of many excellent analytical and visualization ideas lies outside football, and radar plots in sports started with basketball, where they appeared in 2009. A few weeks ago, it was @StatsBomb’s own Ted Knutson who introduced them in football. Unsurprisingly, they were quite well received for the many advantages they have.

I’ve given my own twist to the radar plots and I should perhaps mention that the design of these plots is very much a work in progress. Along the way we may decide some elements are missing and others should better be omitted from the chart. For a start, here it is. Click on it for a full-size version.

Radar chart - Dusan Tadic vs Lucas Piazon Eredivisie 2013-14

Which better players to give the honor of the first radar plot on 11tegen11 than the two most creative attacking midfielders of the Eredivisie, Twente’s Dusan Tadic and Vitesse’s (or actually Chelsea’s) Lucas Piazón.

My version of the radar plot has nine axes and I’ve spent a considerable amount of time thinking about which parameters to include, as well as how to order the axes. All parameters are presented as per 90 minutes. The decision not to present any actual numbers is a conscious one, as I felt it would distract from the goal of the plot, which is to compare players. If you wish to see the underlying numbers, I’m fairly sure you’ll be able to find them within minutes. The scales of the axes represent the minimum and maximum values found in the league.

Let’s go over the axes one by one.

 

Passes

At the top, in the twelve o’clock position, is the number of passes per match. Players who are more involved score higher. I have not yet corrected for total team passes, as I’m unsure whether it provides a true benefit, and what would be the best way to correct for it. Feel free to voice your opinion, as with this concept, there’s no best design yet.

The passing axis is placed in between ‘Incomplete Passes’ and ‘Expected Goals’. The order of the axes is very important, since it determines the surface created for the player. A high score on two, or even three, adjacent axes leads to a significant area within the plot, creating the image of a high quality player. In this case, it’s combining lots of passes with a low incomplete passes count and a high ExpG.
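
To show why the ordering matters so much, here is a small sketch that computes the area of a radar polygon. It assumes every axis has already been scaled to a score between 0 and 1 and that the axes are equally spaced; the two example profiles are invented.

```python
import math


def radar_area(values):
    """Area of the polygon spanned by scores (0..1) on equally spaced axes."""
    n = len(values)
    # Each pair of neighbouring axes spans a triangle of area
    # 0.5 * r1 * r2 * sin(angle between the axes).
    wedge = math.sin(2 * math.pi / n) / 2
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


# Same three top scores, once on neighbouring axes, once spread out.
adjacent = [1, 1, 1, 0, 0, 0, 0, 0, 0]
scattered = [1, 0, 0, 1, 0, 0, 1, 0, 0]

print(radar_area(adjacent))
print(radar_area(scattered))
```

Only products of neighbouring scores contribute, so a player needs strengths on adjacent axes to claim area, which is exactly why the neighbours of each axis are chosen with care.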

Usually, more passes contribute to more incomplete passes, and more passes are the domain of players playing further away from the opposing goal. This should provide a balance that gives our radar plot value.

 

Incomplete passes

We make a counter clockwise trip to the ‘IP’ axis, which stands for ‘Incomplete Passes’. As with all negative traits, this axis is inverted, so that a better performance, in this case fewer incomplete passes, gives a bigger area on the plot.
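
The scaling to the league minimum and maximum, combined with the inversion of negative traits, could look roughly like this. The function name and the example numbers are invented, and returning 0.5 when all players share the same value is an arbitrary choice of mine.

```python
def axis_score(value, league_min, league_max, inverted=False):
    """Scale a per-90 value to 0..1 between the league minimum and maximum."""
    if league_max == league_min:
        return 0.5  # arbitrary: no spread in the league, so sit in the middle
    score = (value - league_min) / (league_max - league_min)
    # Negative traits (incomplete passes, dribbled by, fouls) are flipped,
    # so that less of them gives a bigger area.
    return 1.0 - score if inverted else score


# Hypothetical league range of 5 to 25 incomplete passes per 90.
print(axis_score(5, 5, 25, inverted=True))   # tidiest passer in the league
print(axis_score(25, 5, 25, inverted=True))  # sloppiest passer in the league
print(axis_score(15, 5, 25))                 # a regular, non-inverted axis
```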

Incomplete Passes is flanked by ‘Passes’ and ‘Interceptions’. This should be the area where defensive midfielders excel. Each position in the field should have an area where they can express themselves, otherwise certain positions on the pitch will be underestimated by the plot.

 

Interceptions

Interceptions are presented as ‘per 400 opposition passes’, as I’ve found raw interceptions per 90 minutes to give too much bias towards players on poor and defensive sides. This correction allows for players on ball possession teams to have a fair shot.
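
A minimal sketch of that correction, with invented match numbers. It assumes we know how many passes the opposition played while the player was on the pitch.

```python
def per_400_opposition_passes(events, opposition_passes):
    """Rate of defensive events per 400 opposition passes."""
    if opposition_passes <= 0:
        raise ValueError("opposition_passes must be positive")
    return events * 400 / opposition_passes


# Two hypothetical players, both with 3 interceptions in a match.
defensive_side = per_400_opposition_passes(3, 500)   # opponent sees a lot of the ball
possession_side = per_400_opposition_passes(3, 300)  # opponent sees little of the ball

print(defensive_side, possession_side)
```

The raw counts are identical, but the player on the possession side had fewer opposition passes to intercept, so the corrected rate rewards him for it.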

It’s flanked by ‘Incomplete Passes’ and ‘Dribbled by’. The latter represents how often the player gets dribbled past, which you’d definitely not want for a defensive player. This gives good defensive players a nice area in which to shine: tidy passers with intercepting qualities who stand their ground.

 

Dribbled by

Dribbled by is another inverted axis, as it’s considered better to have less of it. It is flanked by ‘Interceptions’ and ‘Tackles’ and this lower left side is the defensive player’s domain. Expect central defenders and defensive midfielders in this zone.

 

Tackles

This is pretty self explanatory really, other than the fact that, like ‘Interceptions’, I like to express it as per 400 opposition passes. It is flanked by ‘Dribbled by’ and ‘Fouls’, since these are two stats you would not like a defending player to have. The fouls axis should provide another balancing act for players making more tackles.

 

Fouls

Another self explanatory, and inverted, axis. Fewer fouls, bigger area. It is flanked by ‘Tackles’ and ‘Dribbles’, as I felt it makes the best switch to the offensive player’s side of the chart.

 

Dribbles

Dribbles gets its own axis so that offensive players get enough room to shine. Also, I think it’s an under appreciated domain in stat use in general, where players add a dimension of unpredictability to the team. A good dribbler provides a threat that influences the style of defense of the opposition. It is flanked by ‘Fouls’ and ‘Expected Assists’.

Particularly the link with ExpA is a valuable one, since it allows wide players to express themselves in this lower right hand part of the chart.

 

ExpA

On a team basis, this may be one of the most important axes, yet on an individual player basis it should just be one of nine. Expected Assists represents the passes leading to a goal scoring attempt, where each of those attempts is weighted according to the odds of scoring from it.
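
In code, this boils down to summing the ExpG of every attempt a player set up. The record layout below (fields `expg` and `assisted_by`) and the sample shots are hypothetical, just to make the idea concrete.

```python
def expected_assists(shots, passer):
    """Sum the ExpG of all attempts that followed a pass by the given player."""
    return sum(s["expg"] for s in shots if s.get("assisted_by") == passer)


# Hypothetical shot records for one match.
shots = [
    {"shooter": "Striker", "expg": 0.30, "assisted_by": "Winger"},
    {"shooter": "Striker", "expg": 0.05, "assisted_by": "Winger"},
    {"shooter": "Midfielder", "expg": 0.10, "assisted_by": "Striker"},
    {"shooter": "Winger", "expg": 0.08, "assisted_by": None},
]

winger_expa = expected_assists(shots, "Winger")
print(round(winger_expa, 2))
```

Dividing the season total by minutes played and multiplying by 90 then gives the per-90 figure used on the axis.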

ExpA is flanked by ‘Dribbles’ to give attention to players that should be hard to defend against: those players with enough skill to dribble and to deliver the final ball. Also, it is flanked by ‘Expected Goals’, to allow players with multiple offensive dimensions to claim a bigger part of the chart.

 

ExpG

Expected Goals is the final axis of our circle. There’s hardly a need for explaining this term anymore. Suffice it to say it represents all goal scoring chances a player takes, which are weighted according to the odds of scoring from them.

ExpG is flanked by ‘Expected Assists’ and ‘Passes’. The latter connection is very powerful and opposition would never want a goal scorer to see a lot of the ball, so those goal scorers that do just that should be rewarded with a bigger piece of the chart.

 

In the end

This concludes our trip around the chart. In my view, it provides a fair balance between different elements of the game, and the ordering of the axes makes it difficult to claim a lot of ‘area’ without having serious underlying qualities. This balancing act also ensures that I will use the same chart layout for all players, so that learning to use them is as straightforward as possible.

Some of you may notice that traditional metrics like ‘Goals’ and ‘Assists’ are missing. My recent work on the ‘unrepeatability’ of scoring once the quality of the goal scoring attempt has been corrected for, leads me to believe that both ‘Goals’ and ‘Assists’ are inferior to ‘Expected Goals’ and ‘Expected Assists’. Scoring or assisting without the underlying ExpG or ExpA won’t last, so why credit a player for doing it? Or, to use a quote that is mostly linked to Jonathan Wilson, the writer who inspired me to football blogging in the first place, “goals are overrated.”

I’ll leave you with some bonus charts.

Radar chart - Daley Blind vs Felipe Gutierrez Eredivisie 2013-14

The two best defensive midfielders of the Eredivisie! You can see Blind gets the nod in the defensive department of tackles, dribbled by and interceptions. Gutierrez is a bit more tidy in his passing, but that’s probably related to making fewer passes overall. Blind is more of an assisting threat, while Gutierrez gets a tiny advantage in terms of goal scoring.

Radar chart - Jeffrey Bruma vs Joel Veltman Eredivisie 2013-14

Two young Dutch center backs. Ajax’ Joël Veltman does better on nearly every single axis compared to PSV’s Jeffrey Bruma. It’s Veltman’s passing accuracy that could be improved on.

Radar chart - Memphis Depay vs Viktor Fischer Eredivisie 2013-14

Another Ajax v PSV meeting, with players playing in the same left wing position, but in very different interpretations of that role. Depay is much better in assisting and scoring, whereas Fischer gets the nod in dribbling, passing tidiness and interceptions. Both players add a significant amount of ‘area inside the plot’ with their dribbling skill, which is why I put this chart up. I feel it’s important to recognize that element of the game.

Radar chart - Graziano Pelle vs Luc Castaignos Eredivisie 2013-14

Two players who are more similar than I would have expected. Both Pellè and Castaignos do little else but contribute to ExpG and ExpA, where the Feyenoord striker puts in an unreal amount of goal scoring threat. He touches the border of the chart, so no player beats him in the ExpG category.

How to scout a striker?

Scouting strikers should not be that hard, right? Their prime responsibility is putting the ball in the back of the net, and goals are one of the few elements of football where traditional fans and nerdy analysts agree. A goal is a goal, counting goals cannot go wrong. Strikers who score a lot of goals are better than strikers who score fewer goals. Or not?

In our previous piece on scouting offensive talent, we’ve distinguished two elements that constitute a good striker.

  1. The striker has to get into good scoring positions, and accumulate good shots. This is best measured as Expected Goals (ExpG) per 90 minutes, with exclusion of penalties.
  2. The striker has to convert these chances into goals. This can be measured by comparing ExpG and actual non penalty goals.

The previous post on strikers illustrated how we can measure those two elements and judge strikers separately on both of these qualities. Today we will take it a step further and see what scouting implications come from it. We will show that sometimes it is better to buy a lower scoring striker, and which high scoring strikers to avoid. But first, I want you to meet someone.
 

Meet our striker!

He plays in a big league, for a good team, where he has taken 160 non penalty shots in the past season. On average, each shot was good for 0.152 ExpG, so over all shots together we could have expected 24.4 goals from him.

The thing is, our striker is pretty good, so instead of 24.4, he scored 43 non penalty goals for an over performance of 18.6 goals. We can stick an ugly acronym to it and say his non penalty goals above replacement (NPGAR) is 18.6.

NPGAR = Non Penalty Goals – Expected Non Penalty Goals

You’ve probably guessed by now that our striker is Lionel Messi. This season, Messi still plays for Barcelona, where he has taken 75 non penalty shots to date. On average the quality of the chances was comparable to last season, with an ExpG per shot of 0.149. Overall, we should expect 11.1 goals.

The thing is, Messi is suddenly not so excellent at finishing, and he has come up with 9 non penalty goals instead of 11. His NPGAR is now -2.14, which indicates that the average player, not even the average striker, would have scored two more goals with the type and number of shots that Messi has taken this season.
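
The two Messi seasons make for a quick worked example of the formula. The sketch below plugs in the rounded totals quoted above (24.4 and 11.1 expected goals), so the results match the quoted NPGAR values to one decimal only.

```python
def npgar(non_penalty_goals, expected_non_penalty_goals):
    """NPGAR = Non Penalty Goals - Expected Non Penalty Goals."""
    return non_penalty_goals - expected_non_penalty_goals


last_season = npgar(43, 24.4)  # 43 goals from chances worth 24.4 ExpG
this_season = npgar(9, 11.1)   # 9 goals from chances worth about 11.1 ExpG

print(f"{last_season:+.1f}")
print(f"{this_season:+.1f}")
```

A positive value means more goals than the chances were worth, a negative value means fewer.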

 

Analysis

A story about Messi is not analysis, it’s anecdote. And anecdotal evidence is no evidence. We could ‘prove’ that finishing does stick with a player by simply picking someone else that happened to follow an excellent finishing season with another excellent finishing season and fire that point home.

It makes more sense to repeat this work for all 479 players of the top-5 leagues who took at least 10 non penalty shots in the baseline 2012/13 season. We take separate looks at the creation of goal scoring chances (ExpG per 90) and at the conversion of chances into goals (Goals minus ExpG). Both parameters will be compared over one season and the next.
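The season-to-season comparison can be sketched in a few lines of Python. This is an illustration only: it assumes repeatability is summarized as the squared Pearson correlation (R-squared) between a metric in one season and the next, and the function names and the dictionary layout (player name mapped to metric value) are hypothetical choices, not the actual workflow behind the plots.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def repeatability(season_one, season_two):
    """R-squared of a metric between two seasons.

    Both arguments map player name -> metric value. Only players
    that appear in both seasons are compared.
    """
    players = sorted(set(season_one) & set(season_two))
    r = pearson_r([season_one[p] for p in players],
                  [season_two[p] for p in players])
    return r * r
```

A value near 1 means the metric carries over from one season to the next, while a value near 0 means last season tells you next to nothing about the coming one.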

 

ExpG per 90

In the first graph we will look at the repeatability of non penalty Expected Goals per 90 minutes (ExpG NP per90). The horizontal axis shows ExpG NP per 90 for the first season, and the vertical axis shows the same for the next season.

ExpG90 correlation

Excellent! It turns out that players with a high ExpG per 90 in one season are also the players with a high ExpG per 90 in the next season. This is not too surprising, as several factors influencing ExpG per 90 will remain constant over time. Strikers will still be playing as strikers, and most players playing for top teams will still be playing for top teams. More work is needed here, but we’ll leave that for another post, as there is a far more interesting graph coming up.

 

NPGAR

The next graph shows the repeatability of non penalty goals above replacement (NPGAR). This represents the conversion of goal scoring chances into actual goals.

NPGAR correlation

It turns out that if you correct for the quality of goal scoring attempts, there is absolutely no connection between conversion in one season and the next. A high or low NPGAR in one season has zero relation with NPGAR in the next season.

Messi is the dot in the lower right hand corner, who had an unworldly 2012/13 season, with an NPGAR of +18.6, followed by the current season of -2.1.

 

Scouting

This is a shocking conclusion with huge implications for striker scouting. If a striker bases his goal scoring mainly on conversion, he has a good chance of failing in the next season. If a striker bases his goal scoring mainly on good underlying ExpG numbers, he has a good chance of sustaining his level of scoring.

Buying strikers who score their goals due to a high NPGAR is something you should always avoid.

We all know these famous examples of one season wonders, who got transferred for big money, only to disappoint at their new clubs. Usually, loads of soft factors like the higher level of competition, language issues, or playing style are used to explain the disappointing results, while the only thing going on is regression of NPGAR.

Regression does not always occur though, and you can see in the scatter plot that some players do indeed follow a season of high NPGAR with another season with high NPGAR. But just as many players do not, and just as many players with high NPGAR in the second season come off seasons with low NPGAR.

 

Finnbogason

We should use NPGAR as a red flag in striker scouting. A player like Alfred Finnbogason, currently the Eredivisie top scorer with 21 goals in 20 matches, is a nice example. We can put up several red flags.

First, 8 of his 21 goals are penalties. Second, his NPGAR is +2.68, indicating that he is nearly three non penalty goals above expectations. There are no grounds at all to assume that he, or any other player, will outperform the ExpG model next year. All in all, Finnbogason’s non penalty ExpG per 90 is 0.51, which is still a good number, but by no means near the present perception of a striker that scores 1.05 goals per 90.

For next season, 0.51 goals per 90 seems a reasonable estimate. The problem is, next season Finnbogason will not be playing at Heerenveen, as he will make the step up to a bigger league, where he won’t contribute the same numbers as in the Eredivisie. His true level should then be estimated somewhat lower than 0.51 goals per 90 minutes, and we will all start wondering what is going on with all these high scoring strikers who just don’t cut it outside the Eredivisie.

 

Exceptions

Inevitably, though, there will be players who seem to disprove the workings of NPGAR. We can assume that half of all players will have a positive NPGAR and half will have a negative NPGAR. A season later, by chance alone, one quarter of players will have two consecutive positive NPGAR seasons. One eighth will have three consecutive seasons where they outperform ExpG, and so on.

In this study among players from top-5 leagues with at least 10 shots, we find 479 players. With such a big group of players, there will inevitably be some players who consistently outperform ExpG to produce season after season of positive NPGAR. This is a misleading situation, as these players will be credited with finishing skills that are basically the product of an unrepeatable effort.
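To put rough numbers on that halving, here is a small Python sketch. It assumes every season is an independent coin flip with a 50% chance of a positive NPGAR, and that all 479 players keep playing, which will not hold in practice. The function name is made up for illustration.

```python
def expected_streak_players(n_players, n_seasons, p_positive=0.5):
    """Expected number of players with n_seasons consecutive positive
    NPGAR seasons if each season were an independent coin flip."""
    return n_players * p_positive ** n_seasons


for seasons in range(1, 6):
    print(seasons, round(expected_streak_players(479, seasons), 1))
```

Under these assumptions about 120 players string together two positive seasons, about 60 manage three, and around 15 still have a perfect record after five seasons, without any finishing skill being involved.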

 

In the end

The message in striker scouting is quite clear. Familiarize yourself with the terms ExpG and NPGAR, and the mistake of buying a striker who then flops is generally avoidable. Stay away from strikers with high NPGAR and aim for those with high ExpG numbers, as the latter group will cut it next season, while the first group has every chance of falling back.

Probably, a negative NPGAR in a player with good underlying ExpG numbers is a sign of a bargain buy. The world will see a striker struggling to convert, and it takes some balls to buy him, but the numbers indicate that a return to scoring form is right around the corner.

Putting Expected Goals to the test

After yesterday’s post where Expected Goals was explained in detail, today’s post will put the metric to the test. How good is Expected Goals? And is it better than Total Shots Rate?

We’ll compare ExpG and TSR at several levels as we go along. The dataset used for the first part of this analysis consists of all 98 teams from the 2013/14 season so far, for top-5 leagues. As usual, data comes from Squawka, my go-to site for OPTA driven football data. All comparisons in this piece are made on team level. We’ll leave the individual player analysis of ExpG for another day.

ExpG is calculated as explained in yesterday’s post, and for comparison with TSR, ExpG ratios (ExpGR) are used. For all behind-the-scenes input in the ExpG formula, no data from the 2013/14 season was used. All regression analysis that was needed to determine how to rate different factors that influence ExpG was carried out on earlier data. The risk of overfitting is therefore minimized.

ExpGR = ExpG for / (ExpG for + ExpG against)

TSR = Shots for / (Shots for + Shots against)
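Both ratios share the same shape, so they can be sketched with one helper in Python. The function names and the example team are hypothetical and only there to show the calculation.

```python
def ratio(for_value, against_value):
    """Share of the total that belongs to the team, on a 0 to 1 scale."""
    total = for_value + against_value
    if total == 0:
        raise ValueError("no shots or ExpG recorded for either side")
    return for_value / total


def tsr(shots_for, shots_against):
    """Total Shots Rate: Shots for / (Shots for + Shots against)."""
    return ratio(shots_for, shots_against)


def expgr(expg_for, expg_against):
    """ExpG ratio: ExpG for / (ExpG for + ExpG against)."""
    return ratio(expg_for, expg_against)


# Hypothetical team that outshoots its opponents, but from poorer positions.
print(round(tsr(300, 250), 3))      # 0.545
print(round(expgr(27.0, 30.0), 3))  # 0.474
```

The made-up team looks above average on TSR and below average on ExpGR, which is exactly the kind of difference the rest of this post is about.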

 

TSR and outcome

First up, the relation between TSR and the outcome in terms of points per game (PPG) and goal difference (GD). Click on the graph if needed, for a larger version.

TSR and outcome

TSR is a very good metric. It correlates nicely with the two most relevant performance indicators, PPG and GD. The R-squared values of 0.55 and 0.58 correspond to correlations (R, the square root of R-squared) of roughly 0.74 and 0.76, which indicates that knowing a team’s TSR provides around 75% of the knowledge needed for perfect knowledge of either PPG or GD. For more, and better, explanations of R-squared and R, check Phil Birnbaum. The man really knows his stuff.

In general, R-squared values are higher when leagues have a clear separation into two groups. EPL typically has values over 0.6, while Ligue 1, where the dots are one bunch, generally scores below 0.4.

 

ExpG and outcome

These two plots show the relation between ExpGR and outcome.

ExpGR and outcome

From face value alone, you can tell that ExpGR has a better correlation with outcome than TSR has. The dots are closer to the red regression line, so the R-squared value is a lot higher. For PPG, the R-squared is 0.73, while for GD it is somewhat higher at 0.79.

This is a magnificent correlation between a metric and outcome, but don’t get carried away yet. We would expect ExpGR to do better here, as it carries more detailed information to rate goal scoring chances. The formula behind it is designed to improve the relation with outcome in terms of PPG and GD. It would be a true shock if ExpG did not do a lot better than TSR here. What’s more important is the second half of this piece, looking at repeatability of the metrics.

 

TSR and repeatability

From here on, a different data set is used, as we’ll now compare the same metric over two consecutive seasons. Data consists of seasons 2012/13 and 2013/14 so far for the top-5 leagues. Obviously, sides relegated after the first season did not produce a second season for comparison, and sides promoted for the second season did not have a first season to compare with. This left 84 teams with consecutive seasons.

TSR repeatability

TSR is pretty repeatable, producing an R-squared of 0.51. This indicates that TSR in the first season is a moderately good predictor of TSR in the second season. Most teams are roughly in the same ballpark, but deviations of 0.100 are far from rare.

 

ExpGR and repeatability

The next plot shows ExpGR in the first and second season.

ExpGR repeatability

ExpGR has an even better repeatability than TSR did. With an R-squared of 0.67, this metric carries a good signal over multiple seasons. Stripping a few outliers, teams generally don’t deviate more than 0.050.

 

In the end

This scatter plot heavy piece shows that ExpGR beats TSR on both counts: it correlates better with outcome and it is more repeatable. To borrow from Nate Silver, ExpGR carries more signal and less noise than TSR.

The first part of this post, relating ExpGR and outcome, shows that in measuring team performance, ExpGR should prevail over TSR. This conclusion was probably known intuitively, but is now illustrated and quantified.

The second part of this post is more revolutionary, as it establishes ExpGR as a more reliable parameter to use for predictions. This means not just fancy number heavy predictive models, but also any easily made claims regarding upcoming matches or final league positions.

TSR still holds the quite relevant advantage that counting shots is a lot easier than building an ExpG model. However, with more and more variations of ExpG models around, these numbers will gradually become easier to obtain over time.

 

 

I feel like I could have put a dozen links to James Grayson’s amazing site in this TSR heavy post, but I’d rather urge you to just go to his site and check it thoroughly. It is good.

What is ExpG?

This post will look at the latest love child of the football analytics community, Expected Goals, commonly referred to as ExpG or xG. I’ve noticed a lot of questions via Twitter recently, regarding this relatively new concept. Spread across multiple posts, the concept is mentioned and has been explained on 11tegen11 before, but I felt the need for a comprehensive explanatory piece on ExpG to explain this important concept, and to use it for future reference.

 

ExpG

ExpG stands for Expected Goals. It measures not how many goals a team has scored, but how many goals an average team would have scored with the amount and quality of shots created.

Each goal scoring attempt is assigned a number based on the chance that this attempt produces a goal. Typical parameters to use are shot location and shot type (shot vs header). Some models, including the one I use on 11tegen11, also use assist information to separate through-balls from crosses.
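In code, the idea boils down to adding up one scoring probability per attempt. The Python sketch below uses made-up probabilities purely for illustration. They are not output of the 11tegen11 model, and the function name is hypothetical.

```python
def expected_goals(shot_probabilities):
    """Sum of per-shot scoring probabilities for a team or player."""
    for p in shot_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return sum(shot_probabilities)


# Hypothetical values: twenty speculative efforts versus four good chances.
long_range_team = [0.03] * 20
few_big_chances = [0.35, 0.30, 0.25, 0.10]

print(round(expected_goals(long_range_team), 2))  # 0.6
print(round(expected_goals(few_big_chances), 2))  # 1.0
```

The team with four shots ends up with more ExpG than the team with twenty, which a plain shot count would never show.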

Teams that produce more ExpG than they concede have the best chances of winning football matches.

 

Total Shots Rate

ExpG has its roots in another key metric in football, Total Shots Rate, or TSR. Before trying to grasp ExpG, it is important to get familiar with shots rates.

Total Shots Rate = Shots For / (Shots For + Shots Against)

This formula provides TSR on a 0 to 1 scale. If a team takes all shots in a match, or a series of matches, TSR will be 1, and the more shots it has to leave to opponents, the lower TSR gets. Taken over all teams in the same league, TSR will average out at, or very close to, 0.500, since each shot for is a shot against for another team.

TSR is pretty simple, yet it is a powerful predictor for future performance of football teams. Ever since its introduction to football, by James Grayson, TSR has dominated the analytics community. James has shown TSR to have the two qualities that are essential for a powerful team ranking tool.

  1. TSR shows a strong correlation with both points per game, and goal difference.
  2. TSR in one time period shows a strong correlation with TSR in the next time period.

If only the first condition is met, the metric would be strong in telling what has happened, but would not translate into the future. Goal keeper saves percentage is a nice example of a stat that helps explain what has happened, but holds no power for matches still to come.

If only the second condition is met, the metric would be strong in translating into the future, but not correlated to performance. Team shirt color is a nice example, where translation into the future is easy, but a relation to performance does not exist.

 

The problem with TSR

The problem with TSR is that it treats all shots as equal, which does not fit the fluid nature of football, where shots are not equal. Shots may come through a crowd of defenders from 40 yards out, or from the penalty spot in optimal circumstances. For TSR, both shots count as one, and both influence TSR equally.

This induces errors and probably also bias.

Errors arise because some shots are worth more than others. Sometimes a team creating 20 shots did a powerful job, but other days the team was just trigger happy and produced weak quality output. It may sound weird, but errors are not too much of a problem in a predictive model.

Bias is much worse.

If all teams produced and conceded an equal case mix of poor and high quality shots, TSR would, despite its errors, be a perfect tool. However, there is plenty of evidence around that this is not the case. Some teams produce high quality shots, like Barcelona, and other teams produce low quality shots, like Laudrup’s Swansea.

 

Shot quality

Shot quality definitely meets condition one. It is related to performance in terms of points per game and goal difference. However, the evidence that it meets condition two is less clear cut. Data to measure shot quality has only been around since the 2012/13 season, so we don’t have high quality season-to-season correlation measurements. In other words, was Swansea’s recent struggle to produce decent shot quality just a flurry that would fix itself, or does it indicate an underlying reason that will cause the team to produce below average quality shots in the near future?

 

In the end

ExpG is hot, and if you’d ask me now, I’d say ExpG is the next big step being taken in football analytics. Intuitively it makes a lot of sense to separate goal scoring attempts by the odds of scoring from them. However, for a new metric to be accepted, a bit more work is needed. ExpG is a lot more complex than just counting shots. To show that this effort is worthwhile, we should first do a better job of illustrating its supremacy over TSR.

Never judge a goal keeper by his saves

Sometimes analysis and football intuition fit nicely together, and in those situations writing analysis pieces is easy. Sometimes they don’t, and writing gets tougher.

I’ve spent most of the past weeks thinking about goal keeper analysis, a topic that seems as simple as it gets but, as we’ll find out in this post, is actually difficult to get your head around and do properly.

GK save

According to the all-knowing Wikipedia, a goal keeper is “a designated player charged with directly preventing the opposing team from scoring by intercepting shots at goal”.

So, what could be more difficult than assessing how many of those shots end up as goals and, voilà, here’s our goal keeper analysis?

Let me start with a poll question. No need to fill out the answer, just take a little bit of time to make up which answer you think is correct.

The best way to identify goal keeping talent is…

  A. Percentage of shots saved
  B. Percentage of shots on target saved
  C. Percentage of shots saved with a correction for shot quality
  D. Other

 

In my personal history in football analysis I’ve gone from A to B, back to A, to C.

At C I’ve spent most of this season, but some background work I’ve done these past weeks has moved me further down, to D.

Yes, in my view, goal keeper analysis cannot reasonably be done on the basis of analyzing saves.

Now, that statement requires a bit of back up, so here we go. In the remaining part of this article we’ll analyze goal keepers in the top-5 leagues (England, Spain, Italy, Germany and France), who have faced at least 100 shots in two consecutive seasons (2012-13 and 2013-14), with the same club. In my view, this is the best sample to use: it prevents keepers switching teams from skewing the sample, and it prevents keepers with low numbers from doing the same.


Percentage of shots saved

We’ll start with raw saves percentage. This is the easiest parameter to collect, and probably the most used tool to evaluate goal keepers. It also ties in nicely with our intuition that good goal keepers stop a higher proportion of shots than bad goal keepers.

GK save percentage 3 February 2014

The horizontal axis shows save percentage in the first year, and the vertical axis shows save percentage in the subsequent year. Remember, these are all goal keepers playing two seasons for the same club.

The connection is not very strong, but it’s not totally absent either. Generally, goal keepers who noted good saves percentages in the first year, noted better saves percentages in the second year, but the spread is huge. This makes it unreliable to estimate the second year’s saves percentages on the basis of the first year’s saves percentages. The repeatability of goal keepers saves percentage is poor. In general, if your stat has a poor repeatability, it’s useful to describe what has happened, but very misleading to assume that things will happen along the same lines in the future.

These numbers correspond with the excellent and far under viewed work by James Grayson, who found a similarly poor relation in a much larger set, matching teams in one season and the next.

 

Percentage of shots on target saved

Let’s move a little step forward and isolate shots on target. Some people advocate using this over raw saves percentage, since goal keepers are hardly responsible for off target shots. In theory, though, keepers may take responsibility for some off target shots. By approaching a striker they could disrupt shot placement, or by reputation alone they could force strikers to try and find more difficult corners of the goal. Just raising a few hypotheses here.

GK save percentage SoT 3 February 2014

Again, first year performance is plotted on the horizontal axis, with second year performance on the vertical axis. The connection is even weaker for saves percentage of on target shots than it is for saves percentage of all shots conceded. Let’s save the debate until after the next plot.

 

Percentage of shots saved with a correction for shot quality

The third analysis uses shot quality. Based on our Expected Goals (ExpG) model, each shot is assigned a chance of ending up in goal, based on shot location, shot type and several other factors. This helps to control for the difficulty goal keepers have to make the save. In theory, this analysis is the best test for shot stopping quality, since it removes the fact that some keepers face tougher shots than others.

Goals conceded above replacement identifies how many goals a keeper conceded above or below the value of Expected Goals per 100 shots faced.
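One way to read that definition is sketched below in Python: take goals conceded minus ExpG faced, and scale it to 100 shots. This reading, the function name and the example keeper are all assumptions made for illustration.

```python
def conceded_above_replacement_per_100(goals_conceded, expg_faced, shots_faced):
    """Goals conceded above (+) or below (-) ExpG, per 100 shots faced."""
    if shots_faced <= 0:
        raise ValueError("shots_faced must be positive")
    return 100 * (goals_conceded - expg_faced) / shots_faced


# Hypothetical keeper: 400 shots faced, worth 44.0 ExpG, 40 goals conceded.
print(conceded_above_replacement_per_100(40, 44.0, 400))  # -1.0
```

A negative number means the keeper conceded fewer goals than the shots he faced would suggest, and a positive number means he conceded more.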

GK CAR 3 February 2014

After correcting for shot quality, all connection between first year performance and second year performance is lost. A goal keeper who over performed in the first year, has an equal chance of over performing in the second year as a goal keeper who under performed in the first year.

The most intriguing part of this rather shocking conclusion is that this knowledge is already out there, yet people continue to analyze goal keepers on the basis of saves. Again, I’m pointing you towards James Grayson, who, with smaller numbers taken from a Paul Riley post, found no correlation between goal keeper saves percentages in one season and the next after correction for shot location.

 

Shot quality

Please allow me to add one more scatter plot. This time, I’ve linked saves percentage and ExpG per shot, to show the strong link between those two.

GK save percentage and ExpG 3 February 2014

No goal keeper who faced shots averaging more than 0.11 ExpG noted a saves percentage over 92%, and no goal keeper who faced shots averaging less than 0.09 ExpG noted a saves percentage below 90%.

 

In the end

Putting all four plots together, this is compelling evidence to ignore each and every analysis using goal keeper saves percentage. The only, weak, link between goal keeper saves percentages (first graph) is driven by the quality of shots allowed. Some teams tend to face higher quality shots than others, therefore some goal keepers tend to have higher saves percentages than others. Nothing more, nothing less. On top of that, there’s going to be a huge amount of variance in performance.

This does not mean that shot stopping is not a skill. It most definitely is. It just indicates that among all factors that dictate a goal keeper’s saves percentage, the spread of skill level in shot stopping among top level goal keepers is very close. Other factors that influence goal keeper saves percentage completely overshadow the effect of skill, most notably shot quality, as indicated by ExpG.

 

Goal keeping talent

GK save 2

So, how to scout for goal keeping talent? Start by ignoring saves percentage and you’ll leave most of the scouting world behind. Scouts will be aiming at goal keepers who’ve had random high saves percentages in some season, but those goal keepers stand an equal chance next season compared to all other goal keepers. Goal keepers who’ve had the bad luck of noting a low saves percentage season will probably be undervalued by the market.

What signs to look for, if not saves percentage? This piece shows compelling evidence against saves percentages, but it does not say that all goal keepers are equal. Far from that. It may well be that better goal keepers face fewer shots or shots with a lower ExpG. Better goal keepers will give up fewer rebound chances and fewer corners, claim more crosses, distribute balls better or sweep up nicely behind the defense.

All this stuff can be counted, but it’ll be hard to separate it from the effort of defenders. We’ll get to that in time. In the meantime, don’t let yourself be fooled by saves percentages.

Predictions for the English Premier League – A midweek title shift

This will be a rather short post where I’ll run the numbers for my league prediction model again. Most of the workings behind the model are explained in detail in the introductory post, back when the model still held Arsenal in marginally higher regard than City. Oh, wait, that was actually only just over two weeks ago.

 

Bias

“How can someone reasonably have thought that Arsenal was going to win the title? I just knew it was always going to be City. Any decent football watcher could see that. All those models are just crap” (anonymous fictional reply)

Eeuhm, no. This is probably the most frustrating part about going public with predictions in football. You will always be wrong at some point. It’s just the unpredictable nature of the sport. And I could take knowledge of the past two weeks out of the data, re-run the model and confirm that, based on all information at that very point, the model rated Arsenal and City very close. I can’t do that with any human mind.

It’s a form of bias that influences our memory, so that we think we’ve always rated City higher than Arsenal. But if results had taken another turn, we may just have focused more on that brilliant Özil stuff and Giroud finally picking up on his finishing. Once again, we would have confirmed what “we [would] have already known for a long time”.

 

Public

This is exactly the reason why I like to go public with these models from time to time. Let me just put the results of the model out there and see what happens. How do the odds shift upon certain events? In hindsight, we can talk openly about when decisive trends were picked up and why certain teams were over or underrated. That way we can learn, I can learn, and next year, the model will have learned. But if you think ‘I knew it all along’, please just put it out there before events take place and we’ll see. The more models and estimations out there, the more we all learn.

 

Predictions

So, with this ramble over and done with, here we go with the predictions for the league table. The format may start to look familiar now. Boxes correspond to a spread of 50% of the outcomes of simulations around the mean, indicated by the thick vertical black line. The other edges mark the 95% interval and dots are true outliers.

The outliers teach us that in extremely unlucky cases a team like Liverpool may even finish below 60 points (they have 46 already, they have Suarez and they have 16 matches left to play), with the same underlying performance they show now. Guess we’ll have a hard time convincing the conservative and trigger happy football world to accept just that, don’t we?
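To give a feel for where such boxes and edges come from, here is a heavily simplified Python sketch. It is not the prediction model itself: it assumes one fixed win and draw probability for every remaining match (the 0.55 and 0.25 below are invented), whereas the real model works from underlying numbers. All function names are hypothetical.

```python
import random


def simulate_final_points(current_points, matches_left, p_win, p_draw,
                          n_sims=10000, seed=1):
    """Simulate the rest of a season many times; return sorted final totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        points = current_points
        for _ in range(matches_left):
            u = rng.random()
            if u < p_win:
                points += 3
            elif u < p_win + p_draw:
                points += 1
        totals.append(points)
    return sorted(totals)


def percentile(sorted_values, q):
    """Simple nearest-rank style percentile, with q between 0 and 100."""
    index = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


# A team on 46 points with 16 matches left, as in the Liverpool example.
totals = simulate_final_points(46, 16, p_win=0.55, p_draw=0.25)
box = (percentile(totals, 25), percentile(totals, 75))          # middle 50%
interval = (percentile(totals, 2.5), percentile(totals, 97.5))  # 95% interval
print(box, interval, totals[0], totals[-1])
```

The lowest and highest simulated totals play the role of the dots in the plot: rare seasons in which nearly everything goes wrong or right, while the underlying performance stays the same.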

Boxplot projected league table English Premier League 2013-14 30 January 2014

Unsurprisingly, City lead the way after their crushing of Tottenham last night. The model has City finishing around 83 to 84 points, with a margin of just over four points to Arsenal. Both Arsenal and Chelsea have cooled off a bit, after their draws. In all likelihood, Liverpool will finish no lower than fourth and the Reds may still hope for more.

 

No battle

Spurs are quite unaffected by the loss, since they had quite a margin to Everton, who lost to Liverpool, and to United, who still had some ground to make up from the start of the season. I’m sorry to disappoint the crowd of football journalists, but the battle for top-4 is just not happening. No team that is presently outside the top 4 holds more than a 10% chance of finishing inside that top 4.

Everton and United are by now quite equal, and both have about a one in five chance of making the Europa League. Newcastle, Southampton, Villa and West Brom should probably already be thinking about next season.

 

Relegation

The relegation battle has seen some interesting developments. The most important match was of course Sunderland’s narrow home win over Stoke, which sees the Black Cats reduce their relegation odds to below 50%. Things look pretty dreadful for Mr. Tan and Mr. Solskjaer, who hold the bottom spot, and the model thinks quite firmly that they will go down.

Fulham’s underlying numbers are quite terrible and this fuels the model to give them a four in five chance of relegation. I’m talking most shots and ExpG conceded and 17th in shots and ExpG for, while most of their better production came in the stints against Palace and Villa when they were already two goals up.

I do realize, however, that both teams have new managers, and it’d be interesting to see if this will correspond to a shift in underlying numbers. Obviously, the model will need a bit of time to pick that up, as it also did with Palace under Pulis. But in all honesty, “the firing of the manager has to be explained in relation to other reasons rather than for the expected improvement in team performance”.

Boxplot projected League positions English Premier League 2013-14 30 January 2014