Why I don’t board the PSV bandwagon just yet

At first glance, things are looking all rosy in Eindhoven. Going into the season, PSV were hoping to challenge Ajax for the title, and four matches in it’s been all good news. They beat Ajax away and already hold a six-point lead. Still, I would be hesitant to board the PSV bandwagon just yet.

Yes, PSV won four out of four to equal their best league start in 11 years.

Yes, PSV managed to hang on to arguably two of the league’s best players in both Memphis Depay and Georginio Wijnaldum.

Yes, PSV strengthened their squad by re-signing central defender Karim Rekik on a one-year loan from Manchester City, and experienced Mexican international Andres Guardado.

There must be some very compelling arguments not to board the PSV bandwagon right now and start declaring them red hot favourites for the 2014/15 Eredivisie title.

 

Scoreboard journalism

Co Adriaanse

Well, the main argument is called scoreboard journalism. Back in 2003, this term was coined by then AZ manager and now prominent TV pundit, Co Adriaanse. He pointed out that, although his team had just lost 5-1 to Roda JC, the play had been quite good, and that the pundits judged outcome over process.

In reverse, the same holds true for PSV so far this season. With twelve points from four matches, the outcome has been perfect, yet the process is worrying to say the least. It’s probably easier to fit narratives to PSV’s perfect start, than it is to dive into the underlying numbers and write about the process at hand.

And even if you are smart, but you see your job as filling newspaper space or talk show time, talking PSV up now ensures new stories to write once the current bubble inevitably bursts. There will probably be a player missing through injury, or post Europa League matches, maybe even early kick-off times to blame. There will be new narratives to fit, new stories to write, everybody happy.

But here at 11tegen11 we don’t have to worry about narratives, and we’re free to take a dive into our beloved stats for a more nuanced opinion.

 

Shots

PSV has played four matches, scored 14 goals, and conceded three. In those four matches, PSV didn’t win the shot count once. Not in their season opener at promoted side Willem II, not away at Ajax when they put a dent in their rival’s early season, not last week beating Vitesse 2-0 at home, and not even in their 6-1 thumping of NAC Breda.

In each of those matches, PSV produced fewer shots than their opponents. Now, some people would be convinced that this is a good thing. ‘Winning the matches where you play poorly is a sign of champions.’ There’s a lot to say about that statement, but losing the shot count four out of four times is always a bad sign. Shot counts are very well correlated with end of season points, even this early in the season.

 

ExpG

Other people would argue that not all shots are equal, and that’s a good point.

PSV has produced shots of higher quality than the shots they have conceded. The average PSV shot this season has an ExpG of 0.131 (5th in the league), while the average shot PSV conceded has an ExpG of 0.094 (3rd in the league). This reflects their present philosophy to try and contain their opponents, and take advantage of quick counter attacks.

Despite a negative shot count, their ExpG count is positive. In four matches, PSV produced 8.8 ExpG and conceded 6.7. Reality check: scoring 14 from 8.8 ExpG won’t last, and neither will conceding just 3 from 6.7 ExpG.

PSV’s ExpG ratio of 0.549 (8.8 / (8.8 + 6.7)) is okay at best, but for a serious title challenge a ratio of 0.625 is a minimum.
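For readers who want to redo the arithmetic, here is a minimal sketch of the ratio: ExpG produced divided by all ExpG in a team's matches. The function name is my own invention. With the rounded totals quoted above the result lands a little above the quoted 0.549, so that figure presumably rests on slightly different inputs; either way it stays well short of 0.625.

```python
def expg_ratio(expg_for: float, expg_against: float) -> float:
    """Share of all ExpG in a team's matches that the team itself produced."""
    total = expg_for + expg_against
    if total == 0:
        raise ValueError("no ExpG recorded")
    return expg_for / total


TITLE_THRESHOLD = 0.625  # minimum ratio for a title challenge, per the text

# Rounded PSV totals after four matches, as quoted in the text.
psv_ratio = expg_ratio(8.8, 6.7)
print(round(psv_ratio, 3))
print("title-challenge level:", psv_ratio >= TITLE_THRESHOLD)
```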

 

Depay

Finally, people will argue that PSV have Memphis Depay. He scored five goals already. His finishing alone can help PSV overcome opponents even without producing more shots, or generating more ExpG. Well, it’s definitely true that the eye-test suggests that Depay is the most skilful finisher in the league. Still, scoring more goals than ExpG suggests is hardly a basis for future success. And, for what it’s worth, Depay scored 3.3 goals fewer than his ExpG of last year suggested. In all likelihood this won’t carry over, but so much for that supposedly superior finishing skill.

 

In the end

Still standing on board that PSV bandwagon? You may be correct, and I may be wrong. PSV may improve as the season goes on. Football is unpredictable in exact terms. But broadly speaking, PSV will either need to improve big time in their underlying play, or the wheels will quickly start to come off, and you may need to look for another bandwagon before the next international break.

Hint: it may well be red and black, as the discrepancy in Feyenoord’s outcome and process goes exactly the other way.

Turn on the scouting radar! 

We live in fortunate times. As football fans we’ve got all sorts of information about our stars just a mouse click away. Any moment in any day can be filled with watching football, reading about football, or checking football stats.

 

Panini

How different things were when I fell in love with football. In the summer of ’86, when I was nearly eight years old, I studied players’ clubs, birth dates and positions, simply because that was all my Panini album had to offer. I tricked my parents into letting me watch some first halves, despite kick-off times far beyond my usual bedtime.


What I lacked in information, I made up for in fantasy. I created my own truth about Andoni Zubizarreta, the Spanish goalkeeper with that magnificent surname and that fascinating look in his eyes. And about Diego Armando Maradona, of whom I knew little more than Lanus, 30-10-1960, Napoli.

 

Overload

Some thirty years later, things are so very different. My constant hunger, which made even the most basic stats taste good, has been traded for a stats overload that makes it hard to get a sense of what’s really going on.

In this age of information overload, value lies in cropping data to bite-size proportions, without losing its relevance. In assessing a football player, one might be able to scan through some individual stats like non-penalty goals, key passes, dribbles and dispossessions. But after three players, I’m kind of full.

Comparing a league full of players just by looking at stats is an impossible task for the human mind. However, comparing different shapes is a task we are – by evolution – much better at. So, back in January, when I saw Ted Knutson’s magnificent work on player radars, the fun in individual player stats was back, immediately!

Looking at shapes, one can easily get a feel for strikers that offer great link-up play, or midfielders that offer little else besides defensive protection. The radars offer a fantastic link between player traits and cold stats.

 

Upgrade

Just like the ExpG model, the player radars on 11tegen11 have had a huge summer upgrade. I’ve decided to apply nearly the same format that Ted uses on StatsBomb. The reason behind this is quite simple: I think radars have a huge potential in opening up the stats world to a big audience. When each analyst uses their own radar versions, wider adoption will be slowed down a lot. We shouldn’t niggle about subtle differences when it’s clearly better to just step over those details and show the world what we can do.

Like Ted described in one of his introductory pieces on StatsBomb, there are different templates for different positions.

–       AM/FWD for strikers, wide attackers in front threes, and the three-man band in a 4-2-3-1.

–       CM/DM for central and defensive midfielders

–       Fullback for eehhh… fullbacks.

Goalkeepers and central defenders don’t have templates yet, since we don’t exactly know what stats to judge them by.

The outside boundaries of the chart represent the 95th percentiles to prevent players like Messi and Ronaldo from dwarfing the rest. The inside boundaries, likewise, represent the 5th percentiles. Negative axes, like fouls, are inverted so that bigger coloured areas are always indicative of better performances. As a reference database, to compile the axis limits, I’ve used the 2012/13 and 2013/14 seasons of the top-5 leagues (EPL, Bundesliga, La Liga, Serie A and Ligue 1).
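To make the axis scaling concrete, here is a small sketch of how a single stat could be mapped onto a radar axis: clip it between the 5th and 95th percentile of the reference data, rescale to 0..1, and flip the result for negative stats such as fouls. The function names and the linear-interpolation percentile are my own assumptions, not necessarily how the actual radars are scripted.

```python
def percentile(values, pct):
    """Linear-interpolated percentile (pct between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty reference data")
    rank = (len(ordered) - 1) * pct / 100.0
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def scale_axis(value, reference, negative=False):
    """Map a stat to 0..1 between the 5th and 95th percentile of the reference data."""
    inner = percentile(reference, 5)
    outer = percentile(reference, 95)
    if outer == inner:
        return 0.5  # degenerate axis: every reference player has the same value
    clipped = min(max(value, inner), outer)
    scaled = (clipped - inner) / (outer - inner)
    # Negative stats (for example fouls) are inverted so bigger is always better.
    return 1.0 - scaled if negative else scaled
```

Clipping is what keeps one extreme outlier from stretching the axis for everybody else.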

 

Radar

For a nice example of the CM/DM template, meet midfielder Kamohelo Mokotjo, recently transferred to Twente, but leading PEC Zwolle to last season’s Cup victory.

Radar chart - @mixedknuts version - Kamohelo Mokotjo - Eredivisie 2013-14

Mokotjo tore the Eredivisie apart last season. Without reaching the 95th percentile mark in any category, he scored high on nearly every axis, while playing almost three quarters of all possible minutes. If we scouted for spectacular stats in certain categories, chances are we’d miss Mokotjo. There’s not one thing he does so well that he reaches the outer boundary that is the 95th percentile. It’s the combination of doing everything very well that makes him a fantastic player.

 

Scouting

It’s pretty obvious from this radar that Mokotjo had a magnificent season, but wouldn’t it be great if we could somehow quantify player radars?

Well, the good news is, we can.

I’ve computed the surface of Mokotjo’s radar and compared it to the same reference database that I used to set the limits of the axes. The resulting value is scaled from 0 to 5 stars.

To find a central midfielder with a radar surface this large is very rare, so Mokotjo is awarded the full 5 out of 5 gold stars. It’s like Football Manager’s player ranking system brought to real life. Small caveat: it’s easier to score high stats in the Eredivisie than in the EPL. One day, when we’ve learned how stats translate between leagues, we may know how to adjust for league differences.
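As a rough sketch of the idea: the surface of a radar with equally spaced axes is the sum of the triangles between neighbouring axes, and the star rating can then be read off from where that surface ranks in the reference database. The exact mapping from rank to stars isn't spelled out here, so the half-star percentile mapping below is purely an assumption of mine, as are the function names.

```python
import math
from bisect import bisect_right


def radar_area(radii):
    """Area of the polygon spanned by equally spaced radar axes (values scaled 0..1)."""
    n = len(radii)
    if n < 3:
        raise ValueError("a radar needs at least three axes")
    # Each pair of neighbouring axes spans a triangle of area 0.5 * r1 * r2 * sin(angle).
    wedge = math.sin(2 * math.pi / n) / 2
    return wedge * sum(radii[i] * radii[(i + 1) % n] for i in range(n))


def stars(area, reference_areas):
    """Assumed mapping: percentile rank in the reference set, in half stars from 0 to 5."""
    ordered = sorted(reference_areas)
    share = bisect_right(ordered, area) / len(ordered)
    return round(share * 10) / 2
```

One thing worth knowing about any surface-based score: the area depends on which axes sit next to each other, so the order of the axes has to be kept fixed across all players being compared.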

 

Potential

Knowing how good a player’s season was is one thing; knowing something about potential is another. This time, look at 17-year-old new Ajax signing Richairo Zivkovic.

Radar chart - @mixedknuts version - Richairo Zivkovic - Eredivisie 2013-14

Playing for Groningen, Zivkovic had a hugely impressive debut season for a 17-year-old. His performance in front of goal was elite; his performance in terms of passing, dribbling and defensive contribution was, well, nearly absent.

In terms of radar surface, Zivkovic earned 3.5 stars, and should he add more passing to his game, or more dribbling, or some defensive work, he should rise in terms of stars. Still, for a 17-year-old, this was an elite season. Now, how to put this ‘for-a-17-year-old’ thing in our stars ranking?

Well, quite simple actually, by comparing a player to his peers, rather than to the full reference database.

The silver stars compare a player not to all other players, but only to players of the same age, or younger. There are just a handful of 17-year-olds in the database, so I’ve set the lower limit to 18. When comparing Zivkovic to other forwards aged 18 years or younger, he turned in an elite performance, earning him 5 out of 5 silver stars. So, there we’ve got Football Manager’s second star rating: potential!

 

In the end

I’ve only just finished scripting these new player radars, and I still find myself playing around with them. Finally, we’ve got a tool that makes individual player statistics fun.

We’ll use them a lot this season, both on this site and on Twitter. The radars now allow for true player scouting, both in terms of actual quality, and in terms of future potential.

Expected Goals 2.0 – Some light in the black box

If football analytics were a Hollywood movie, Expected Goals would definitely be the poster boy. The influx of attention for football analytics during the recent World Cup meant a lot of attention for the concept of Expected Goals, or ExpG as it’s mostly referred to. With that attention came two very important questions that I’ll try to address in this post. What is ExpG? And how do you compute it?

 

What is ExpG?

Expected Goals assigns each goal scoring attempt a number between 0 and 1, to represent the chance that this goal scoring attempt results in an actual goal.

I use a model that I have revised completely over the summer, so this makes for a perfect time to explain the full workings of it. Expected Goals 2.0, here we go…

 

Modelling

Suppose I tell you that a football match has just finished and I ask you to estimate the number of goals for each team. You know nothing. Not the teams, not the occasion, not the shot numbers, and nothing that happened on the pitch.

You’d probably say both teams have scored around 1.4 goals, since 2.8 is a good estimate for the average number of goals per football match. Since you have absolutely no information about the match at hand, estimating this average of 1.4 goals per team should lead to the smallest difference between your estimate and the actual goals by each team.

In building a model, the difference between your estimate and the actual outcome is called the error, and you should be aiming to keep the error as small as possible.
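To see why the average is the sensible blind guess, here is a tiny sketch with made-up goal tallies. I'm assuming the error is measured as the mean squared difference, which is the measure the average happens to minimise; the text itself doesn't name a specific error measure.

```python
def mean_squared_error(actual, estimate):
    """Average squared gap between one fixed estimate and the actual goal tallies."""
    return sum((goals - estimate) ** 2 for goals in actual) / len(actual)


# Made-up goals per team, chosen to average exactly 1.4.
team_goals = [0, 1, 1, 2, 3]
average = sum(team_goals) / len(team_goals)

for guess in (1.0, average, 2.0):
    print(guess, round(mean_squared_error(team_goals, guess), 2))
```

Any guess other than the average gives a larger error on this toy sample.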

(don’t look down at the .gif yet)

 

Shots

Now, I tell you that the match at hand had 10 shots by team A and 14 shots by team B. Would this change your estimate of 1.4 goals for each team?

Since we know that on average 1 in 9, or 11% of shots results in a goal, it would make most sense to estimate 1.1 (10 * 0.11) goals for team A and 1.54 (14 * 0.11) goals for team B.

This is your most basic expected goals model at work. In fact, it is what we’ve been doing for years, with Total Shots Ratio. The total number of shots is a nice, but far from perfect, indication of the number of goals you can expect.
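Written out, that shots-only model is a single multiplication. The sketch below uses the 11% conversion rate quoted above; the function name is mine.

```python
CONVERSION_RATE = 0.11  # roughly 1 in 9 shots ends up as a goal, as quoted in the text


def expected_goals_from_shots(shots, conversion_rate=CONVERSION_RATE):
    """Most basic ExpG estimate: every shot is worth the same average chance."""
    return shots * conversion_rate


for team, shots in (("A", 10), ("B", 14)):
    print(team, round(expected_goals_from_shots(shots), 2))
```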

 

Attempts

Let’s add some more information to our model, and for the sake of readability of this piece, I’ll give you all visual information on a single goal scoring attempt that we’ll use as an example of the current ExpG model that I use on 11tegen11.

sneijder

Here’s what the ExpG model sees.

  1. The match situation is open play

The model discriminates between seven match situations: open play, corners, direct free kicks, indirect free kicks, penalties, rebounds and first time attempts.

  2. A non-league match

This fragment, in case you hadn’t noticed, originates from the Spain vs. Netherlands match at the past World Cup. For each league, different conversion rates are computed for each match situation.

  3. Game State

The score line during this attempt is 0-0, so the odds of scoring are slightly reduced. Shots at even game state are converted a bit less than shots at GS +1, or even GS -1.

  4. Shot location

The angle to the goal is 22 degrees and the distance is almost 15. Note the absence of units for distance: I don’t compute yards or meters, just an abstract number based on coordinates. In terms of modelling, it’s all about the relative difference between different goal scoring attempts, and not about getting the distance correct in absolute terms.

To compute the angle to the goal, I compute angles to both goal posts and take the difference between those two numbers. The number you get represents the view a player has on the goal. It represents how much of a 360 degree circle around the player is taken up by the goal. For more lateral positions and more distance from the goal, the number goes down. I prefer this method over a simple angle to the middle of the goal, since it works better for close ranges, where most shots are taken.
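Here is a small sketch of that goal-view angle. The coordinate system is an assumption of mine: the goal line sits on x = 0, the goal centre at the origin, and the posts 3.66 units either side (half of a 7.32 m goal), whereas the model itself works with abstract coordinate units. The angle between the two lines from the shooter to the posts is the same quantity as the difference between the two post angles.

```python
import math

# Assumed coordinates: goal centre at the origin, posts half a goal width either side.
LEFT_POST = (0.0, -3.66)
RIGHT_POST = (0.0, 3.66)


def goal_view_angle(x, y, left_post=LEFT_POST, right_post=RIGHT_POST):
    """Angle in degrees between the lines from the shooter at (x, y) to both posts."""
    ax, ay = left_post[0] - x, left_post[1] - y
    bx, by = right_post[0] - x, right_post[1] - y
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.degrees(math.atan2(abs(cross), dot))


def goal_distance(x, y):
    """Straight-line distance from the shooter to the centre of the goal."""
    return math.hypot(x, y)


print(round(goal_view_angle(10.0, 0.0), 1))  # central position
print(round(goal_view_angle(10.0, 8.0), 1))  # same depth, but wide of the goal
print(round(goal_view_angle(20.0, 0.0), 1))  # central, but twice as far out
```

Both moving wide and moving further out shrink the angle, which matches the behaviour described above.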

  5. Shot type

This is a shot, rather than a header. Given the location, this makes a huge difference in terms of ExpG.

  6. Through ball

The shot has been assisted with a through ball. This is a big plus for ExpG, since through balls generally reduce the number of defenders able to contest or block the shot.

  7. Cross

The shot has not been assisted with a cross. Crosses are bad. They have a negative influence on ExpG. It’s easy to get loads of crosses in, so in terms of trying to score goals they may be good for some teams at some times, but it’s harder to score when the goal scoring attempt comes off a cross than when the same goal scoring attempt does not come off a cross.

  8. Touches

The attacking team has taken three touches. More touches taken reduces ExpG, since (generally) defenders have more time to get in position to defend.

  9. Vertical speed

In the build-up of play, the attacking team has moved the ball forward at 2.87 per sec. Note the absence of units for distance, since this is again an abstract number based on coordinates. More important point: quicker vertical movement leads to higher ExpG.
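The exact definition of vertical speed isn't given here, so treat the following as one plausible reading rather than the model's actual calculation: forward progress of the ball between the first and last on-ball event of the build-up, divided by the time that took. The event layout and the function name are hypothetical.

```python
def vertical_speed(events):
    """Forward progress per second over a build-up.

    events: chronological list of (seconds, x) tuples, where x grows towards the
    opponent's goal (an assumed coordinate convention, in abstract units).
    """
    if len(events) < 2:
        raise ValueError("need at least two on-ball events")
    (t_start, x_start), (t_end, x_end) = events[0], events[-1]
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("events must span a positive amount of time")
    return (x_end - x_start) / elapsed


# Made-up build-up: three on-ball events over four seconds.
print(vertical_speed([(0.0, 30.0), (2.0, 40.0), (4.0, 50.0)]))
```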

 

Regression

None of the above items are used because I personally think they are important for ExpG measurement. They all show up as significant factors in a multivariate regression analysis that I’ve run on some 160,000 goal scoring attempts in various match situations and various leagues. Just like we tried to minimize the error in our initial two estimations in the early stages of this article, a complex regression model tries to minimize those errors for large numbers of shots and large numbers of potentially important factors for ExpG.

In the end, for open play shots, the above mentioned factors prove to be important. For different match situations, different factors are important. You can imagine that vertical speed is not important to score from corners, or that for indirect free kicks the number of touches is not important (the defense is set to defend anyway). The joy of a multivariate regression model is that it’s not up to you to decide which factors to use (and then having to defend your choice on blogs and twitter), it’s the model that advises you which factors to use and how to weigh them.
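To give a feel for what such a regression does, here is a toy sketch. It is emphatically not the 11tegen11 model: I'm assuming a plain logistic regression fitted by gradient descent, with just two factors (a scaled distance and a through-ball flag) and a dozen made-up shots. The point is only that the fitting procedure, not the analyst, decides the sign and size of each weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, steps=5000, rate=0.1):
    """Gradient descent on the log loss; returns [intercept, weight_1, weight_2, ...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for features, goal in zip(rows, outcomes):
            inputs = [1.0] + list(features)
            pred = sigmoid(sum(w * v for w, v in zip(weights, inputs)))
            for j, v in enumerate(inputs):
                grads[j] += (pred - goal) * v
        weights = [w - rate * g / n for w, g in zip(weights, grads)]
    return weights


def predict(weights, features):
    """Estimated scoring chance (a number between 0 and 1) for one attempt."""
    inputs = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, inputs)))


# Made-up attempts: (scaled distance, assisted by through ball) and goal yes/no.
toy_shots = [
    ((0.2, 1), 1), ((0.2, 0), 1), ((0.3, 1), 1), ((0.3, 0), 0),
    ((0.5, 1), 1), ((0.5, 0), 0), ((0.7, 1), 0), ((0.7, 0), 0),
    ((0.9, 1), 0), ((0.9, 0), 0), ((0.2, 0), 0), ((0.5, 1), 0),
]
rows = [features for features, _ in toy_shots]
outcomes = [goal for _, goal in toy_shots]
weights = fit_logistic(rows, outcomes)

print("close, through ball:", round(predict(weights, (0.2, 1)), 2))
print("far, no through ball:", round(predict(weights, (0.9, 0)), 2))
```

On this toy data the fit ends up with a negative weight for distance and a positive one for through balls, in line with the direction of the effects described in the list above.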

In the future, we may discover new items to measure. If the multivariate regression model then suggests them to be of significant influence on ExpG in certain match situations, they will be added for those match situations. The model is a living thing. If I can improve it, I will.

 

Defensive pressure

The most frequently heard comment on any ExpG model is probably the fact that defensive pressure is not incorporated. That’s both true, and not true, depending on how you define defensive pressure.

Since all data is based on ‘on-ball events’, we don’t have any direct information on the position of defenders and goalkeepers. In isolated cases, this can be quite frustrating. Sometimes a goalkeeper is stranded way out of position, and your model ends up underestimating the ExpG of that goal scoring attempt.

While the model may not have direct information on defender and goalkeeper positioning, it does have a lot of indirect information on it. Game State, vertical speed, crosses, through balls and number of touches all carry some information about the amount of defensive pressure that is present for a goal scoring attempt. Obviously, direct information would be preferable, but even with this indirect information, for 99% of attempts we get a good sense of defensive pressure.

 

In the end

With this piece, I’ve opened up about as much as I can on the workings of my ExpG model. There is no single formula that I can give. It’s not as simple as ‘shots from this zone get 0.12, headers from that zone get 0.07’.

Each goal scoring attempt is judged on the basis of its relevant contextual information. The result is the best estimate I can create for each goal scoring attempt. Using the best contextual information can teach you so much about football, let’s have a lot of fun with it this coming season!

Why do we write?

We are busy people. Most of us are in our twenties or thirties, have demanding day jobs, and a partner or family we love to attend to. And, just for fun, a few years back we opened a blog and wrote something about football and numbers. We liked it, so we wrote some more, and kept on doing so. We styled ourselves something of an online community of football analytics bloggers, and by now we’ve been around for years.

But something is changing. Most established football analytics blogs have experienced a severe drop in articles over the past year or so, and 11tegen11 is no exception to that trend. We are busy with our jobs, lives and families. Football writing can wait a moment, and another moment, and another moment. To the point where I started believing my own lie that I couldn’t find the time for writing recently.

 

Busy

Life is no busier now than it was when I started writing, back in the summer of 2010, and blaming time constraints is just the easy way out of a question that deserves an honest answer. A recent piece by @JFFutbol poses the question sharply: “is football blogging dead, dying, or simply changing?”

Author Johnathan Fadugba comes up with three major reasons for the decline in blogging: time constraints, ‘it isn’t going anywhere’, and ‘it isn’t fun anymore’. None of these apply to 11tegen11, since time constraints are no different now than back then, over the years we’re absolutely going somewhere, and I definitely enjoy writing blog pieces. Yet I do have the feeling that my blogging activity has been painfully slow recently. So, here’s a personal story about the 11tegen11 blog, and how it has developed over the years.

 

Tactics

In the summer of 2010, 11tegen11 started out as a tactics blog with a focus on Dutch football. My aim at 11tegen11 has always been to be an independent, personal blog that provides well-constructed opinions on anything Dutch football related. The use of numbers and analytics was a logical path to take. I figured I’d use numbers and analysis to form an opinion, write about it and be different from just a random guy with an opinion. My writing focuses more on the travel (analytics) than the destination (conclusion).

The biggest problem, back in 2010, was the general lack of access to data. My writing mainly concerned tactical match and team reports. That was hardly data driven at all, but it did help me to get in touch with two data companies: Infostrada Sports and InStat Football. Both of them helped me get access to data I would never have seen otherwise, though, back in 2010, that meant raw shot and possession numbers per match. Which still felt like the bomb, by the way. My football blogging helped me to establish a platform to use this data, which I would not have had if I’d just been the average casual fan.

 

Oh happy days

Exploring this level of data with our growing football analytics community, we dragged the concept of Total Shots Ratio (TSR) as far as we could. We developed predictive models based on TSR, used it to evaluate manager performances, and successfully identified under- and overachievers at several stages of the season.

Databases were simple two dimensional spreadsheets, calculations were done within seconds, and the rest of the evening remained for writing. For most of 2011 we had a lot of fun with simple concepts like TSR, which proved a decent performance analysis tool.

 

Data

In 2012, things started changing. Websites like Squawka and WhoScored filled our desire for more and better data. Both sites bring a wealth of OPTA-fueled data just a few mouse clicks away. Shot charts, minute-by-minute data, individual player actions, you name it.

It wasn’t long before even we, TSR protagonists, had to confess the limits of simply counting each and every goal scoring attempt. It took some time to develop, but the invention of ‘Expected Goals’ (ExpG) was inevitable (as can be seen in this philosophical piece from 2011). With ever refining models, we assign each goal scoring attempt a number between 0 and 1 to reflect the odds of said chance resulting in a goal. ExpG is definitely the eye catcher of football analytics at present, but the possibilities are endless, both on team and player level.

 

Mainstream

Meanwhile, the activity of our football analytics community did not go unnoticed in mainstream media and from 2012 onwards, a significant number of early blog writers got snapped up by established media sources or data companies.

Personally, early in 2013 I was offered the opportunity to join a small group of pioneers and start writing for the website of Dutch national newspaper De Volkskrant. Recently, I could add a support writing role for digital news medium ‘De Correspondent’, which meant a step up in mainstream media land. The increased attention allowed us to show our work to a bigger crowd of Dutch readers at an established stage, but it also brought along the pressure of deadlines and expectation. All that time, blog writing could wait.

 

Complexity

With the introduction of Squawka and WhoScored in 2012, the amount of publicly available data grew exponentially, and so did the complexity of our analytics. Personally, I used some in-between-jobs time to train myself to use R statistical software to make best use of our new-found wealth, and time investment started shifting from writing to analyzing.

The present ExpG model on 11tegen11 is a self-learning general linear regression stratified for different match situations like open play, corners, free kicks, etcetera. The model uses as much contextual information as possible within the limits of on-ball data. Shot location, shot type, assist information, game state and league effects are all used if appropriate for the match situation at hand. A spare hour is easily spent trying to fine tune some aspects of the model, or to fix some complicated large size database issues. Again, blog writing could wait.

On top of that, in the back of our minds, a soft voice kept insisting: “don’t share everything you’re developing now, it might be of competitive advantage”. So far, it’s hard to earn money with football analytics, though that may change in the future. Clubs refrain from massively adopting analysts for various reasons, and the betting industry is pretty hard to catch over longer periods of time. Personally though, this phenomenon has played a role for a while, and it would be unfair to open up in this piece without mentioning this factor.

 

More distractions

Pressed in between work, social life, and new-found deadlines for mainstream work, it was often easier for me to pop out a twitter shout or a short infographic. R is a great piece of software to create scripted infographics, and potential blog pieces ended up half-written before actuality had caught up with them, or never even got further than some pilot data work.

On top of that, blog writing suffered severe competition from the one thing even better than football data. Right, watching football that is. Now that’s where 2010 and 2014 make a huge difference. Nearly every day between August and May holds top level league matches that can be found on TV or streaming on the internet. And for those dull months in between there’s play-offs, World Cups, friendlies, etcetera. Never an evening without football on your flatscreen. And, with the advent of detailed league data worldwide, the number of leagues to indulge in just keeps on growing. If you can watch the Argentine Super Clasico, blog writing can wait.

 

Quality

Back in the TSR days of 2011, writing about football analytics was easy. In counting shots there isn’t much one can do wrong. But things are different now. Complex scripts contain small errors that need tracking and fixing. The free flowing game of football needs complicated analysis to be at least somewhat accurate, and complicated analysis needs a lot of words to be explained.

People want to read about football, not about analytical modelling, and it’s a challenge to walk the tightrope between under- and over-explaining analytical methods. On 11tegen11 at times, I’ve avoided this issue by not writing at all, or, in most cases, by focusing on concepts (like scouting or identifying playing style) rather than teams or players. The concepts often didn’t return. Not because they weren’t interesting, but because self-imposed 1000 or 1500 word limits for team or player articles don’t leave enough room for explaining the concepts.

Perhaps that’s wrong, and I should have just used terms like ‘crosses to through ball ratio’ or ‘ExpG over performance’ regularly so that returning readers would familiarize themselves with it. And readers that shy away from terms like that, well, would that be your audience anyway?

 

In the end

In the four years that 11tegen11 has been around, a lot has changed. We’ve got more detailed data than we can handle, we can see more matches than would actually be healthy, and we’ve kept writing waiting for too long.

Football analytics blogging may well be at a breaking point in its short life. Investing more in deeper and more complicated – yet more accurate – analysis, without explaining to a wider audience, would see us dig a hole for ourselves. It would make our little community inaccessible in a few years’ time, and that would not help develop this niche that I don’t think should be a niche.

Writing can make watching and analyzing football more fun. If we made up for lost ground and wrote those unpretentious pieces like we did a few years ago, we’d be better off in the long run. Not all pieces need to be mouth-watering analysis in eloquently written near poetry. Bring back the raw unedited pieces that football blogging should be all about. Bring back the fun!

Shooting style at a glance

The 2014 World Cup has been an amazing experience. It will enter history as the World Cup where Brazil collapsed in front of their home crowd, where the world fell in love with a fresh and talented Colombian side, and where three-at-the-back defenses proved that they’re back from the dead. But it was also the World Cup where the world at large tasted the use of stats in football, and seemed to like it.

 

Stat love

Over the past years, the small community of stat loving football bloggers has been cooking some nice concepts that proved tasty to some, and at least digestible to most fans. The concept of Expected Goals is the best example, and it is now more accepted than ever. Intuitively, separating poor from good quality chances makes a lot of sense, and ExpG allows us to communicate much better than simple shot counts.

This post will aim to do just that: communicate different aspects of shooting behavior. In one plot, I hope to separate quality shooters from quantity shooters, involved shooters from uninvolved shooters, and efficient from inefficient shooters. That’s quite a lot, and it runs the risk that every data visualization carries: showing too much in one picture. Still, on this one, I’m convinced, dear reader, that you can do it.

 

The plot

So, here’s the plot I was talking about… And before going into further details, I should point you to Stephen McCarthy’s inspirational work on data visuals, which has obviously formed the inspiration for this design.

Shooting style players Eredivisie 2013-14

Nice colors, right? For a full size version, click on it.

This plot combines four elements that constitute a player’s finishing. The horizontal axis is simply the number of shots per 90 minutes played, and the vertical axis is the total amount of Expected Goals per 90 minutes. Both dotted lines represent the two standard deviations mark.

Of course, for all information in this chart, penalties are excluded. Oh, and only players playing over 30% of minutes available are included to prevent the per 90’s from being skewed.

 

Rainbow

The nice rainbow of colors represents the average ExpG per shot, ranging from very poor (red), through average (green), to excellent (purple / pink). Since ExpG per shot is the same as dividing the vertical axis by the horizontal axis, the colors are nicely arranged in the chart. Poor shot quality will prevent a player from building up ExpG, so red and orange dots will fly at the bottom of the balloons, while high shot quality helps build up ExpG quickly, and leads to the pink/purple/blue dots flying on top.
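For anyone wanting to rebuild the three numbers behind each dot, here is a rough sketch: shots per 90, ExpG per 90, and ExpG per shot, with the 30% minutes filter applied first. The function name, the data layout and the example player are made up; 3060 minutes is simply 34 Eredivisie matches of 90 minutes.

```python
def shooting_profile(shots, expg, minutes, minutes_available, min_share=0.30):
    """Per-90 shooting numbers, or None when the player has played too little.

    shots and expg should exclude penalties, in line with the chart.
    """
    if minutes_available <= 0 or minutes / minutes_available <= min_share:
        return None
    return {
        "shots_p90": shots * 90.0 / minutes,   # horizontal axis
        "expg_p90": expg * 90.0 / minutes,     # vertical axis
        "expg_per_shot": expg / shots if shots else 0.0,  # colour of the dot
    }


SEASON_MINUTES = 34 * 90  # full Eredivisie season

# Made-up striker: 60 non-penalty shots worth 9.0 ExpG in 1800 minutes.
profile = shooting_profile(60, 9.0, 1800, SEASON_MINUTES)
print(profile)

# Made-up bit-part player: filtered out for playing under 30% of the minutes.
print(shooting_profile(5, 1.0, 300, SEASON_MINUTES))
```

Since ExpG per shot is just the vertical value divided by the horizontal one, the colour adds no new information to the dot's position; it only makes the quality bands easier to read.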

The fourth parameter is the size of the dots, where bigger dots represent more goals scored. Players with bigger dots than those around them, like Alfred Finnbogason, have converted at a more efficient (and probably unsustainable) rate than others. Conversely, players with relatively small dots, like Mulenga, Havenaar and Depay, have converted inefficiently, which, by the same line of thought, is expected not to carry over to future performances.

 

Player styles

Memphis Depay has been the absolute shot monster of the 2013/14 Eredivisie, but with his limited shot quality, he remains quite a distance behind the most dangerous strikers of the league: Graziano Pellè and Jacob Mulenga. New Ajax signing Richairo Zivkovic already completes the top three of most dangerous strikers at 17 years of age, with both a high shots count and high shot quality.

Hakim Ziyech and Oussama Tannane have a shooting discipline problem. Both rank in the top six in shot frequency, but also in the bottom of the league with respect to shot quality.

 

In the end

This chart conveys a lot of information at a single glance, and provides even more for those patient enough to spend some more time on it. In the near future, you will find similar graphs on my twitter timeline, which I’m using more and more to pop out visuals, when I can’t find time for a full blog post or when I don’t want to repeat myself just with updated numbers. If you’re interested in this blog, you may want to pick up the visuals there too.

 

Once more, this post is inspired by the great visuals of Stephen McCarthy. Follow him!

Has Holland expelled its obsession with possession football?

The Dutch national team crushed reigning World Champions Spain in a display of brilliance as sensational as it was unexpected. With a convincing counter attacking tactic, ‘Oranje’ ran out 5-1 winners over a demolished Spain side. Is counter attacking football the new tiki-taka?

Current national manager Louis van Gaal made his breakthrough at top level management with the Ajax side of the mid nineties. With a system based on optimal ball circulation and wide winger offense, he managed to win the Champions League. But, like good managers should, Van Gaal always takes the actual circumstances on board in his choices. At mid nineties Ajax, possession based circulation football may have been the best choice; in different circumstances, Van Gaal makes different choices.

 

Counter attacks

In this World Cup, Holland excels in quick counter attacks, breaking into space immediately upon winning possession of the ball. This form of offense allows the qualities of the best players, Robin van Persie and Arjen Robben, to shine to full effect.

With three, rather than two central defenders it seems at first glance that Holland chooses a more defensive concept, but the reverse has proven to be true. The extra central defender allows both full-backs to push forward in support of the offense. Daley Blind’s two assists against Spain are an excellent example here.

Passing network Netherlands - Spain 1 - 5 Netherlands

The above diagram shows the average position where the Dutch starting XI passed the ball from. The concept of three primary defensive players (2, 3 and 4) is clearly shown, as well as the fact that when in possession, the full-backs (5 and 7) are true wide wingers.

 

Notational clichés

All too often, formational debates are reduced to an exchange of notational clichés. The 4-2-3-1, or the 4-3-3, do not exist, and all teams apply different interpretations and different tactical preferences. And more importantly, modern teams line up vastly differently when in or out of possession. In possession, we see the Dutch as a 3-4-1-2, while out of possession they take a 5-3-2 shape.

If we reduce the description of the Dutch formation to 5-3-2, as is most commonly done in the media currently, we miss out on the whole point of the full-backs being wingers and Sneijder linking up with the offensive duo, i.e. the whole point of the 3-4-1-2. If we prefer to call them 3-4-1-2, as would fit their in-possession style, we should call every four-man defense a two-man defense, as full-backs generally push up on the wings. Over the next days I will discuss a few more of these diagrams to show that most 4-2-3-1’s are in fact 2-4-3-1’s in possession.

 

Passing network

The width of the lines represents the number of passes that players have combined for, with a threshold of six. The crucial role of left back Daley Blind (5) in circulating the ball forward is well displayed here. Creative midfielder Wesley Sneijder (10) tends to drift to the left side of the pitch, which makes him easy to find for Blind. The role of the right full-back, Daryl Janmaat (7), is not so much passing the ball as providing offensive runs. In possession his position is as offensive as that of the offensive trio of Sneijder (10), Van Persie (9) and Robben (11).

 

Trend

It’s still quite early in the tournament, but Van Gaal’s choice for counter attacking football seems to fit an international trend. Teams that have dominated possession have had a tough time, or even lost their games. Brazil (61% possession) had a lot of trouble creating chances against Croatia, Mexico (62%) created fewer chances than Cameroon, Uruguay (56%) even lost 1-3 to Costa Rica and Spain (64%) was blown away by the Dutch counters. And this all comes at the end of a season where counter attacking teams like Real Madrid and Atlético contested the Champions League final.

 

More possession, more wins?

The relationship between possession and outcome is rather complicated in football. Generally speaking, teams that win more matches have more possession, so the correlation between possession and wins is undeniably present. However, the causal relation between possession and wins is not so straightforward. In other words, does having more possession get your team more wins?

A clear cut answer is not (yet) available, and it seems reasonable that circumstances may dictate which answer to this question is true at which particular moment. Against Spain, the Dutch team made optimal use of the space behind the Spanish defensive line with their lightning quick counter attacks. In the match against Australia this will, in all likelihood, be quite different. In the post-match interview of the Spain match, Van Gaal already hinted at a return of the 4-3-3 system. The media may portray him as dogmatic, but in tactical terms Van Gaal’s pragmatism dominates. And that is a good thing for Dutch football.

How to define attacking style!?

Football analytics at the moment is a bit like a toddler. We think we can do quite a decent job, we’ve started talking quite loud with more variety in our vocabulary, and every now and then we start to make some sense too. Oh, and hey, we make people laugh at us at surprising occasions! Yet, most of the time, in hindsight our actions don’t make the most sense. And what we could do a year from now makes our current level of performance laughable at best.

 

Analysis

Most of my earlier analytics work has been aimed at performance analysis. Which team is better? And later on, which player does better? However attractive this edge of using stats is, in an environment as highly driven by random occurrences as football, this type of analysis approaches its limits quite soon. In plain English: football is quite hard to predict.

 

Just a level below predicting, is describing. And a recent promising development on the describing front has been introduced by fellow blogger and analyst Michael Caley. It may well be the describing part where football analytics could win over more souls to support our belief that numbers can add to a better understanding of the game.

Could you tell me in a few words how your favorite team prefers to attack? Chances are that you’d use words like ‘direct’, ‘patient’, ‘flank play’, ‘through balls’ and ‘crosses’. Now, what Michael has come up with is a simple and easy to use stat to express two key elements of attacking play: pace and style.

 

Pace

Pace is expressed as the number of completed passes per shot taken. Just use raw numbers per team, no complicated formulas. Here’s what we come up with for the most patient teams in Europe’s top-5 leagues plus the Eredivisie.

Passes per shot - top 10 - Multiple Leagues 5 June 2014

Some of the usual suspects, like Swansea, PSG, Arsenal, Bayern and Barcelona, make this top 10, but the most patient team in Europe are Borussia Mönchengladbach with some 37 passes per shot taken. I haven’t seen them play myself this season, but perhaps some Bundesliga fans are willing to comment here.

The other end of the spectrum will reveal teams playing lightning quick football, preferring to shoot rather than pass around.

Passes per shot - bottom 10 - Multiple Leagues 5 June 2014

That’s interesting! The top four teams are all Eredivisie teams, a league known for high scoring and high shot numbers. At some distance from the rest, relegated side N.E.C. are identified as the most direct team in Europe.

Pace is a descriptive thing, not a performance marker though. Other teams from this top 10 (Levante, Augsburg, Heerenveen) have had decent to good seasons with a very direct style of play.
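As a minimal sketch of the pace metric, the snippet below divides completed passes by shots and ranks teams from most patient to most direct. The team names and totals are invented for illustration, and `passes_per_shot` is just a name I picked; only the definition itself comes from the text above.

```python
def passes_per_shot(completed_passes: int, shots: int) -> float:
    """Pace: completed passes per shot taken, using raw team totals."""
    if shots <= 0:
        raise ValueError("need at least one shot")
    return completed_passes / shots


# Hypothetical season totals: (completed passes, shots).
teams = {
    "Patient FC": (14800, 400),
    "Direct United": (9000, 500),
}

pace = {name: passes_per_shot(p, s) for name, (p, s) in teams.items()}

# Most patient team first.
ranked = sorted(pace, key=pace.get, reverse=True)
for name in ranked:
    print(f"{name}: {pace[name]:.1f} passes per shot")
```

A higher number means a more patient build-up; a lower number means a team that shoots quickly after winning the ball.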

 

Style

The second aspect I take from Michael is style of attack. Using two contrasting key elements of constructing offensive schemes, crosses and through balls, we can compute a simple ratio that proves to spread out nicely across different teams. Also, it fits well with the style of play we’ve familiarized ourselves with for certain teams. Here’s the top 10 in terms of the ‘crosses to through balls ratio’.

Crosses per through ball - top 10 - Multiple Leagues 5 June 2014

Four French teams in the top 6, but the EPL is also nicely represented. Manchester United’s Moyesball indeed makes the top 10 for crossing heavy offensive schemes, but to my surprise Mourinho’s Chelsea is not far off!

One thing: I’ve stripped out NAC, as they simply won’t play any through balls and their ratio is so far off the chart that the other teams are dwarfed by it. In time, a case study of NAC and manager Gudelj should follow.

In the bottom 10 we find the teams that prefer through balls over crosses. It seems a ratio of around 3 is as low as it gets, and with around 4 you’re still very much a through ball oriented team.

Crosses per through ball - bottom 10 - Multiple Leagues 5 June 2014

Barcelona are the masters of avoiding crosses and poking central passes into the box. But would you have guessed Newcastle are so through ball heavy? And look at Heerenveen, showing up as a very direct team just above, and avoiding crosses at the same time!

 

Pace and Style

Things get even more interesting when we combine both of these metrics in one chart. Teams should broadly fall into one of four categories.

– Patient and central: Barcelona, Mönchengladbach, Roma, PSG, Swansea, Arsenal, Bayern, Toulouse and Ajax

– Patient and wide: Nice, Rennes, Manchester United and Bordeaux

– Direct and wide: Bologna, Sochaux, Lazio and Saint Etienne

– Direct and central: Heerenveen, Newcastle, Real Madrid, Sevilla and Dortmund
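The four categories can be sketched in code as two simple cuts: one on passes per shot and one on crosses per through ball. Where exactly to cut is not specified in this post, so using the median of the sample is purely my assumption here, and the four teams below are made up.

```python
from statistics import median


def crosses_per_through_ball(crosses: int, through_balls: int) -> float:
    """Style: crosses divided by through balls (undefined without through balls)."""
    if through_balls <= 0:
        raise ValueError("ratio is undefined without through balls")
    return crosses / through_balls


def classify(pace: float, style: float, pace_cut: float, style_cut: float) -> str:
    """Label a team by passes per shot (pace) and crosses per through ball (style)."""
    pace_label = "Patient" if pace >= pace_cut else "Direct"
    style_label = "wide" if style >= style_cut else "central"
    return f"{pace_label} and {style_label}"


# Hypothetical teams: (passes per shot, crosses per through ball).
teams = {
    "A": (37.0, 3.0),
    "B": (30.0, 20.0),
    "C": (18.0, 25.0),
    "D": (20.0, 4.0),
}

# Assumed cut-offs: the sample medians of each metric.
pace_cut = median(p for p, _ in teams.values())
style_cut = median(s for _, s in teams.values())

labels = {name: classify(p, s, pace_cut, style_cut) for name, (p, s) in teams.items()}
```

A team like NAC, with practically no through balls, would raise the `ValueError` above, which mirrors the decision to leave them out of the ratio chart.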

 

In the end

There’s no single preferred mode of attack, and patient is not necessarily better or worse than direct. Also, central doesn’t beat wide. There are multiple ways to construct good offense, and the players at hand, the philosophy of the club and the level of execution of the style are perhaps much more important.

But these concepts hand us a tool to describe pace and style, to follow trends within clubs and managerial careers. All of that with a simple tool, brought to you by the bright mind of Michael Caley.

To close off this post, here is a mega chart picturing all teams from the top 5 leagues plus the Eredivisie. Do click on it for the full, downloadable version, and you’ll see that the names above are all taken from the four corners of this chart.

Directness and Team Style - Offense version - multiple leagues

Dreaming of competitive football

Imagine a world where football teams are truly competitive, where teams can’t buy their way out of trouble, and where it’s not the usual suspects competing for trophies year after year…

Dreams

This article envisions such a world. With most major competitions having come to an end, and the World Cup still a month away, this is my moment to dream about my ideal football world.

I don’t expect this dream to become reality at all. In fact, I don’t think any single aspect of it is even on the brink of making it to FIFA’s regulatory committees. But don’t let that stop me from inviting you to my dream world.

Here we go…

The goal in this dream world is to have football matches that are as exciting as possible. Excitement is enhanced by competitiveness and transparency, so our world should distribute players as evenly as possible across teams and make clear how it does so, rather than have rich teams plucking talents from poor teams, virtually at will, with finances largely obscured.

 

Salary cap

First and foremost, in our world, football really needs a salary cap. Limit the amount of money teams can spend on player salaries to a certain fixed amount and teams will need to tinker with the balance of their first team squad. Everyone who has ever laid his hands on fantasy football management knows how challenging it can be to try and outsmart your rivals in trying to cram as much talent as possible into a tight budget.

As a consequence, Messi and Ronaldo won’t see out their football lives surrounded by the best of the best. On the contrary, you can see superstars being picked up by teams of a lower standard, because those teams are the only ones able to fit their massive salaries in. Imagine a Messi-fueled Valladolid taking on Real Madrid minus a handful of their super stars…

As a consequence, Chelsea won’t load up on all offensive midfield talent of the entire world, only to farm them out and decide the fate of many more players than their first squad can potentially harbour. Choices will need to be made, which makes for interesting debate.

In our world, salaries are open, so that fans are free to discuss the merits of squad composition. How fun would it be to speculate how best to deal with the amount of money coming free next summer with players X and Y leaving, knowing which players could roughly be attracted for which sums of money.

 

Youth

Another aspect where our world differs from reality concerns youth talent. No longer do clubs train their own future players. In fact, in reality clubs hardly train their own future first teamers anyway, with most players dropping out, or ending up with other clubs.

In our world, players play for youth teams until the season they will turn 19, rather than moving across the planet as teenagers. These youth teams are completely independent institutions, unconnected to professional football clubs, but rather focussing completely on making the best of the potential talent in their ranks. Youth teams compete in a competition of their own so that fans will be aware of the next generation soon available for their clubs to recruit. Youth teams are financed with collective support by all professional teams of the nation.

 

Draft

Recruitment follows a draft, which ensures that poor teams get the best young talent on the market, to balance the teams as much as possible going into the next season. As an added benefit, this ensures that young talent will get maximum exposure and playing time, as poor teams will generally slot these talents right in, rather than wasting them in loans and on benches as is so common nowadays.

To get higher in the drafts, youngsters will need to showcase their talent in the youth league, which will trigger great debates among fans, scouts and other people trying to rank football players.

Oh, and the first two years these talents will stay with their draft team on a fixed and moderate income, before being open to move in the market and negotiate their own salary.

Imagine RKC battling for survival with Memphis Depay flying on the left wing, or Norwich injected with the virtues of Adnan Januzaj. I see nothing but advantages!

 

Creativity

Financial resources will always be different among teams, and now that this does not translate into a bigger wage budget, rich teams will need to be smarter than poor teams. Hire smart scouts, develop the best scouting techniques, hey maybe even make use of the best analytical tools out there! Creativity all around, only not in avoiding financial fair play this time…

 

In the end

Yes, I’ve been watching quite some basketball lately. Well spotted!

Most, if not all of these dreams are reality in basketball, which goes to show that (A) somewhere on this planet it’s possible to regulate stuff like this, and (B) it works in enhancing competitiveness!

And yes, I know FIFA won’t implement any of this, but don’t let that stop us from thinking how we could improve our beautiful game. In the words of Henry Ford… If I had asked people what they wanted, they would have said: “Faster horses”.

Sometimes we just have to think out of the box, and dream of our ideal football world.

This is mine, what is yours?

Radar Love – Capturing Players in a Single Picture

Comparing football clubs is one thing, comparing football players is yet another. It lies at the heart of many pub debates, where passionate fans try to convince each other that their beloved star is better, often to settle the subtle disagreement by concluding that the players are different. And indeed, different positions, skills and tactical roles make it hard to rank individual players. The first step should be to picture them correctly, and that is where this post will step in.

A lot has been written recently about football analytics and the use of numbers in the beautiful game in general. Some claim it’s an enrichment, some claim it ruins the magic of the game. I don’t see it as such a clear separation. Whether we want it or not, stats are there.

It’s up to each of us to decide how much of it we prefer to add to our football match experience. And if the analytics community has one task in lowering the threshold for people to start using stats, it is making stats more accessible. I’m fairly confident that the addition of radar plots will do just that.

 

Giants

‘Standing on the shoulders of giants’ is an apt way to put what I’m doing right here. The conception of many excellent analytical and visualization ideas lies outside football, and radar plots in sports started with basketball, where they appeared in 2009. A few weeks ago, it was @StatsBomb’s own Ted Knutson who introduced them in football. Unsurprisingly, they were quite well received for the many advantages they have.

I’ve given my own twist to the radar plots and I should perhaps mention that the design of these plots is very much a work in progress. Along the way we may decide some elements are missing and others should better be omitted from the chart. For a start, here it is. Click on it for a full-size version.

Radar chart - Dusan Tadic vs Lucas Piazon Eredivisie 2013-14

Which players better deserve the honor of the first radar plot on 11tegen11 than the two most creative attacking midfielders of the Eredivisie, Twente’s Dusan Tadic and Vitesse’s (or actually Chelsea’s) Lucas Piazón?

My version of the radar plot has nine axes and I’ve spent a considerable amount of time thinking about which parameters to include, as well as how to order the axes. All parameters are presented as per 90 minutes. The decision not to present any actual numbers is a conscious one, as I felt it would distract from the goal of the plot, which is to compare players. If you wish to see the underlying numbers, I’m fairly sure you’ll be able to find them within minutes. The scales of the axes represent the minimum and maximum values found in the league.

Let’s go over the axes one by one.

 

Passes

On top, at the twelve o’clock position, is the number of passes per 90 minutes. Players who are more involved score higher. I have not yet corrected for total team passes, as I’m unsure whether it provides a true benefit, and what would be the best way to correct for it. Feel free to voice your opinion, as there’s no best design for this concept yet.

The passing axis is placed in between ‘Incomplete Passes’ and ‘Expected Goals’. The order of the axes is very important, since it determines the surface created for the player. A high score on two, or even three, adjacent axes leads to a significant area within the plot, creating the image of a high quality player. In this case, it’s combining lots of passes with a low incomplete passes count and a high ExpG.
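To see why the order of the axes matters so much, here is a minimal sketch that computes the area enclosed by a radar polygon. It assumes every axis has already been scaled to run from 0 at the centre to 1 at the outer edge, and that the axes are evenly spaced; `radar_area` is a name I made up for the illustration.

```python
from math import pi, sin


def radar_area(scores: list) -> float:
    """Area of the polygon drawn on evenly spaced radar axes.

    Each pair of neighbouring axes spans a triangle with the centre, with area
    0.5 * r1 * r2 * sin(angle between the axes).
    """
    n = len(scores)
    if n < 3:
        raise ValueError("a radar plot needs at least three axes")
    wedge = 0.5 * sin(2 * pi / n)
    return wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))


# Same three top scores on a nine-axis chart, in two different orderings.
clustered = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # the strengths sit on neighbouring axes
spread = [1, 0, 0, 1, 0, 0, 1, 0, 0]     # the strengths are separated

print(radar_area(clustered), radar_area(spread))
```

With identical scores, the clustered player claims a visible chunk of the chart while the spread-out player claims none, which is exactly why related qualities are placed next to each other.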

Usually, more passes contribute to more incomplete passes, and more passes are the domain of players playing further away from the opposing goal. This should provide a balance that gives our radar plot value.

 

Incomplete passes

We make a counter clockwise trip to the ‘IP’ axis, which stands for ‘Incomplete Passes’. As with all negative traits, this axis is inverted, so that a better performance, in this case fewer incomplete passes, gives a bigger area on the plot.
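A minimal sketch of how a raw per-90 number could be placed on an axis: scale it between the league minimum and maximum, and flip it for negative traits. The function name and the example values are assumptions for illustration, not the exact routine behind these charts.

```python
def scale_axis(value: float, league_min: float, league_max: float,
               inverted: bool = False) -> float:
    """Map a raw value to 0..1 between the league minimum and maximum.

    For inverted axes (negative traits) the lowest raw value scores 1.
    """
    if league_max == league_min:
        raise ValueError("league minimum and maximum must differ")
    score = (value - league_min) / (league_max - league_min)
    return 1.0 - score if inverted else score


# Hypothetical league range of 0 to 10 incomplete passes per 90 minutes.
tidy_passer = scale_axis(2.0, 0.0, 10.0, inverted=True)
sloppy_passer = scale_axis(9.0, 0.0, 10.0, inverted=True)
print(tidy_passer, sloppy_passer)
```

The tidy passer ends up near the outer edge of the chart and the sloppy passer near the centre, so that on every axis further out means better.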

Incomplete Passes is flanked by ‘Passes’ and ‘Interceptions’. This should be the area where defensive midfielders excel. Each position in the field should have an area where they can express themselves, otherwise certain positions on the pitch will be underestimated by the plot.

 

Interceptions

Interceptions are presented as ‘per 400 opposition passes’, as I’ve found raw interceptions per 90 minutes to give too much bias towards players on poor and defensive sides. This correction allows for players on ball possession teams to have a fair shot.
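The correction can be sketched as a simple rescaling. I assume here that the opposition pass count covers the time the player was on the pitch; the function name and the two example players are made up.

```python
def per_400_opposition_passes(actions: float, opposition_passes: float) -> float:
    """Express a defensive count per 400 passes attempted by the opposition."""
    if opposition_passes <= 0:
        raise ValueError("opposition passes must be positive")
    return actions * 400.0 / opposition_passes


# Two hypothetical players with the same raw count of 3 interceptions.
possession_side = per_400_opposition_passes(3, 300)  # opponents rarely have the ball
defensive_side = per_400_opposition_passes(3, 600)   # opponents have it a lot
print(possession_side, defensive_side)
```

With equal raw numbers, the player on the possession side gets the higher adjusted figure, because he had fewer opposition passes to intercept in the first place.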

It’s flanked by ‘Incomplete Passes’ and ‘Dribbled by’. The latter represents how often the player is getting dribbled by, which you’d definitely not want for a defensive player. This gives good defensive players a nice area: tidy passers with intercepting qualities who stand their ground will shine here.

 

Dribbled by

Dribbled by is another inverted axis, as it’s considered better to have less of it. It is flanked by ‘Interceptions’ and ‘Tackles’ and this lower left side is the defensive player’s domain. Expect central defenders and defensive midfielders in this zone.

 

Tackles

This is pretty self explanatory really, other than the fact that, like ‘Interceptions’, I like to express it as per 400 opposition passes. It is flanked by ‘Dribbled by’ and ‘Fouls’, since these are two stats you would not like a defending player to have. The fouls axis should provide another balancing act for players making more tackles.

 

Fouls

Another self explanatory, and inverted, axis. Fewer fouls, bigger area. It is flanked by ‘Tackles’ and ‘Dribbles’, as I felt it makes the best switch to the offensive player’s side of the chart.

 

Dribbles

Dribbles gets its own axis so that offensive players get enough room to shine. Also, I think it’s an under appreciated domain in stat use in general, where players add a dimension of unpredictability to the team. A good dribbler provides a threat that influences the style of defense of the opposition. It is flanked by ‘Fouls’ and ‘Expected Assists’.

Particularly the link with ExpA is a valuable one, since it allows wide players to express themselves in this lower right hand part of the chart.

 

ExpA

On a team basis, this may be one of the most important axes, yet on an individual player basis it should just be one of nine. Expected Assists represents the passes leading to a goal scoring attempt, where each of those attempts is weighted according to the odds of scoring from it.

ExpA is flanked by ‘Dribbles’ to give attention to players that should be hard to defend against: those players with enough skill to dribble and to deliver the final ball. Also, it is flanked by ‘Expected Goals’, to allow players with multiple offensive dimensions to claim a bigger part of the chart.

 

ExpG

Expected Goals is the final axis of our circle. There’s hardly a need for explaining this term anymore. Suffice to say it represents all goal scoring chances a player takes, which are weighted according to the odds of scoring from them.

ExpG is flanked by ‘Expected Assists’ and ‘Passes’. The latter connection is very powerful and opposition would never want a goal scorer to see a lot of the ball, so those goal scorers that do just that should be rewarded with a bigger piece of the chart.

 

In the end

This concludes our trip around the chart. In my view, it provides a fair balance between different elements of the game, and the ordering of the axes makes it difficult to claim a lot of ‘area’ without having serious underlying qualities. This balancing act also ensures that I will use the same chart layout for all players, so that learning to use them is as straightforward as possible.

Some of you may notice that traditional metrics like ‘Goals’ and ‘Assists’ are missing. My recent work on the ‘unrepeatability’ of scoring once the quality of the goal scoring attempt has been corrected for leads me to believe that both ‘Goals’ and ‘Assists’ are inferior to ‘Expected Goals’ and ‘Expected Assists’. Scoring or assisting without the underlying ExpG or ExpA won’t last, so why credit a player for doing it? Or, to use a quote that is mostly linked to Jonathan Wilson, the writer who inspired me to football blogging in the first place, “goals are overrated.”

I’ll leave you with some bonus charts.

Radar chart - Daley Blind vs Felipe Gutierrez Eredivisie 2013-14

The two best defensive midfielders of the Eredivisie! You can see Blind gets the nod in the defensive department of tackles, dribbled by and interceptions. Gutierrez is a bit more tidy in his passing, but that’s probably related to making fewer passes overall. Blind is more of an assisting threat, while Gutierrez gets a tiny advantage in terms of goal scoring.

Radar chart - Jeffrey Bruma vs Joel Veltman Eredivisie 2013-14

Two young Dutch center backs. Ajax’ Joël Veltman does better on nearly every single axis compared to PSV’s Jeffrey Bruma. It’s Veltman’s passing accuracy that could be improved on.

Radar chart - Memphis Depay vs Viktor Fischer Eredivisie 2013-14

Another Ajax v PSV meeting, with players playing in the same left wing position, but in very different interpretations of that role. Depay is much better in assisting and scoring, whereas Fischer gets the nod in dribbling, passing tidiness and interceptions. Both players add a significant amount of ‘area inside the plot’ with their dribbling skill, which is why I put this chart up. I feel it’s important to recognize that element of the game.

Radar chart - Graziano Pelle vs Luc Castaignos Eredivisie 2013-14

Two players who are more similar than I would have expected. Both Pellè and Castaignos do little else but contribute to ExpG and ExpA, where the Feyenoord striker puts in an unreal amount of goal scoring threat. He touches the border of the chart, so no player beats him in the ExpG category.

How to scout a striker?

Scouting strikers should not be that hard, right? Their prime responsibility is putting the ball in the back of the net, and goals are one of the few elements of football where traditional fans and nerdy analysts agree. A goal is a goal, counting goals cannot go wrong. Strikers who score a lot of goals are better than strikers who score fewer goals. Or not?

In our previous piece on scouting offensive talent, we’ve distinguished two elements that constitute a good striker.

  1. The striker has to get into good scoring positions, and accumulate good shots. This is best measured as Expected Goals (ExpG) per 90 minutes, with exclusion of penalties.
  2. The striker has to convert these chances into goals. This can be measured by comparing ExpG and actual non penalty goals.

The previous post on strikers illustrated how we can measure those two elements and judge strikers separately on both of these qualities. Today we will take it a step further and see what scouting implications come from it. We will show that sometimes it is better to buy a lower scoring striker, and which high scoring strikers to avoid. But first, I want you to meet someone.
 

Meet our striker!

He plays in a big league, for a good team, where he has taken 160 non penalty shots in the past season. On average, each shot was good for 0.152 ExpG, so over all shots together we could have expected 24.4 goals from him.

The thing is, our striker is pretty good, so instead of 24.4, he scored 43 non penalty goals for an over performance of 18.6 goals. We can stick an ugly acronym to it and say his non penalty goals above replacement (NPGAR) is 18.6.

NPGAR = Non Penalty Goals – Expected Non Penalty Goals

You’ve probably guessed by now that our striker is Lionel Messi. This season, Messi still plays for Barcelona, where he has taken 75 non penalty shots to date. On average the quality of the chances was comparable to last season, with an ExpG per shot of 0.149. Overall, we should expect 11.1 goals.

The thing is, Messi is suddenly not so excellent at finishing, and he has come up with 9 non penalty goals instead of 11. His NPGAR is now -2.14, which indicates that the average player, not even the average striker, would have scored two more goals with the type and number of shots that Messi has taken this season.
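The NPGAR definition above is easy to put into code. The sketch below plugs in the season totals quoted in this post; the function names are my own. Note that multiplying the rounded per-shot averages gives 24.32 and 11.175, slightly different from the quoted 24.4 and 11.1, presumably because the per-shot figures are rounded, so the quoted totals are used directly (11.14 is the total implied by the stated NPGAR of -2.14).

```python
def expected_goals(shots: int, expg_per_shot: float) -> float:
    """Season ExpG as number of shots times the average ExpG per shot."""
    return shots * expg_per_shot


def npgar(non_penalty_goals: float, expected_np_goals: float) -> float:
    """NPGAR = Non Penalty Goals - Expected Non Penalty Goals."""
    return non_penalty_goals - expected_np_goals


# Last season: 43 non penalty goals against 24.4 expected (as quoted in the post).
last_season = npgar(43, 24.4)

# This season: 9 non penalty goals against roughly 11.14 expected.
this_season = npgar(9, 11.14)

print(f"{last_season:+.1f} {this_season:+.2f}")
```

A positive value means more goals than the shots warranted, a negative value means fewer.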

 

Analysis

A story about Messi is not analysis, it’s anecdote. And anecdotal evidence is no evidence. We could ‘prove’ that finishing does stick with a player by simply picking someone else that happened to follow an excellent finishing season with another excellent finishing season and fire that point home.

It makes more sense to repeat this work for all 479 players of the top-5 leagues who took at least 10 non penalty shots in the baseline 2012/13 season. We take separate looks at the creation of goal scoring chances (ExpG per 90) and at the conversion of chances into goals (Goals minus ExpG). Both parameters will be compared over one season and the next.

 

ExpG per 90

In the first graph we will look at the repeatability of non penalty Expected Goals per 90 minutes (ExpG NP per90). The horizontal axis shows ExpG NP per 90 for the first season, and the vertical axis shows the same for the next season.

ExpG90 correlation

Excellent! It turns out that players with a high ExpG per 90 in one season are also the players with a high ExpG per 90 in the next season. This is not too surprising, as several factors influencing ExpG per 90 will remain constant over time. Strikers will still be playing as strikers, and most players playing for top teams will still be playing for top teams. More work is needed here, but we’ll leave that for another post, as there is a far more interesting graph coming up.

 

NPGAR

The next graph shows the repeatability of non penalty goals above replacement (NPGAR). This represents the conversion of goal scoring chances into actual goals.

NPGAR correlation

It turns out that if you correct for the quality of goal scoring attempts, there is absolutely no connection between conversion in one season and the next. A high or low NPGAR in one season has zero relation with NPGAR in the next season.
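Repeatability of this kind is commonly summarised with a correlation coefficient between a player’s value in one season and in the next. The sketch below computes a plain Pearson correlation by hand; choosing Pearson is my assumption, since the post only shows scatter plots, and the numbers fed in are invented and not the 479-player dataset.

```python
from math import sqrt


def pearson_r(xs: list, ys: list) -> float:
    """Pearson correlation between paired season-one and season-two values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)


# Invented example: season two tracks season one perfectly ...
repeatable = pearson_r([0.1, 0.3, 0.5, 0.7], [0.2, 0.4, 0.6, 0.8])
# ... versus a pattern with no linear relation at all.
unrelated = pearson_r([-1.0, 0.0, 1.0], [1.0, 0.0, 1.0])
print(round(repeatable, 3), round(unrelated, 3))
```

A value near 1 corresponds to the tight upward cloud of the ExpG per 90 graph, while a value near 0 corresponds to the shapeless cloud of the NPGAR graph.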

Messi is the dot in the lower right hand corner, who had an unworldly 2012/13 season, with an NPGAR of +18.6, followed by the current season of -2.1.

 

Scouting

This is a shocking conclusion with huge implications for striker scouting. If a striker bases his goal scoring mainly on conversion, he has a good chance of failing in the next season. If a striker bases his goal scoring mainly on good underlying ExpG numbers, he has a good chance of sustaining his level of scoring.

Buying strikers who score their goals due to a high NPGAR is something you should always avoid.

We all know these famous examples of one season wonders, who got transferred for big money, only to disappoint at their new clubs. Usually, loads of soft factors like the higher level of competition, language issues, or playing style are used to explain the disappointing results, while the only thing going on is regression of NPGAR.

Regression does not always occur though, and you can see in the scatter plot that some players do indeed follow a season of high NPGAR with another season with high NPGAR. But just as many players do not, and just as many players with high NPGAR in the second season come off seasons with low NPGAR.

 

Finnbogason

We should use NPGAR as a red flag in striker scouting. A player like Alfred Finnbogason, currently the Eredivisie top scorer with 21 goals in 20 matches, is a nice example. We can put up several red flags.

First, 8 of his 21 goals are penalties. Second, his NPGAR is +2.68, indicating that he is nearly three non penalty goals above expectations. There are no grounds at all to assume that he, or any other player, will outperform the ExpG model next year. All in all, Finnbogason’s non penalty ExpG per 90 is 0.51, which is still a good number, but nowhere near the present perception of a striker who scores 1.05 goals per 90.
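The arithmetic behind those red flags can be retraced in a few lines. This is a rough sketch, not the model itself: it assumes Finnbogason played a full 90 minutes in each of his 20 matches, and that NPGAR equals non penalty goals minus non penalty ExpG. Both are assumptions made here for illustration.

```python
goals = 21
penalty_goals = 8
matches = 20
npgar = 2.68

# ASSUMPTION: 20 complete matches of 90 minutes each.
minutes = matches * 90

non_penalty_goals = goals - penalty_goals

# ASSUMPTION: NPGAR = non penalty goals - non penalty ExpG,
# so the underlying ExpG can be backed out from the two known numbers.
np_expg = non_penalty_goals - npgar


def per_90(total, minutes_played):
    """Scale a season total to a per-90-minutes rate."""
    return total / minutes_played * 90


goals_per_90 = per_90(goals, minutes)
np_goals_per_90 = per_90(non_penalty_goals, minutes)
np_expg_per_90 = per_90(np_expg, minutes)

print(f"All goals per 90:         {goals_per_90:.2f}")
print(f"Non penalty goals per 90: {np_goals_per_90:.2f}")
print(f"Non penalty ExpG per 90:  {np_expg_per_90:.2f}")
```

Under these assumptions the headline rate of 1.05 goals per 90 drops to 0.65 once penalties are removed, and to roughly half a goal per 90 once conversion above expectation is removed as well. That lands close to the 0.51 quoted above; the small gap presumably comes from his actual minutes played.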

For next season, 0.51 goals per 90 seems a reasonable estimate. The problem is, next season Finnbogason will not be playing at Heerenveen, as he will make the step up to a bigger league, where he won’t produce the same numbers as in the Eredivisie. His true level should then be estimated somewhat lower than 0.51 goals per 90 minutes, and we will all start wondering what is going on with all these high scoring strikers who just don’t cut it outside the Eredivisie.

 

Exceptions

Inevitably, though, there will be players who seem to disprove the workings of NPGAR. We can assume that half of all players will have a positive NPGAR and half will have a negative NPGAR. A season later, one quarter of players will have two consecutive positive NPGAR seasons. One eighth will have three consecutive seasons where they outperform ExpG, and so on.

In this study among players from top-5 leagues with at least 10 shots, we find 479 players. With such a big group of players, there will inevitably be some players who consistently outperform ExpG to produce season after season of positive NPGAR. This is a misleading situation, as these players will be credited with finishing skills that are basically the product of an unrepeatable effort.
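The halving argument can be put into numbers. The sketch below treats each season as an independent coin flip with a 50% chance of a positive NPGAR, and assumes, for simplicity, that all 479 players log enough shots in every season. Neither assumption holds exactly in practice; the point is only the order of magnitude.

```python
def expected_streakers(n_players: int, seasons: int, p_positive: float = 0.5) -> float:
    """Expected number of players with a positive NPGAR in every one of
    `seasons` consecutive seasons, if each season is an independent coin flip."""
    return n_players * p_positive ** seasons


n_players = 479  # size of the group in this study

for seasons in range(1, 6):
    expected = expected_streakers(n_players, seasons)
    print(f"{seasons} season(s) in a row: about {expected:.0f} players")
```

By pure chance you would expect around 60 of these players to beat ExpG three seasons running, and around 15 to do it five seasons running, without any of them having a repeatable finishing skill.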

 

In the end

The message in striker scouting is quite clear. Familiarize yourself with the terms ExpG and NPGAR, and the mistake of buying a flopping striker is generally avoidable. Stay away from strikers with high NPGAR and aim for those with high ExpG numbers, as the latter group will cut it next season, while the former group has every chance of falling back.

Probably, a negative NPGAR in a player with good underlying ExpG numbers is a sign of a bargain buy. The world will see a striker struggling to convert, and it takes some balls to buy him, but the numbers indicate that a return to scoring form is right around the corner.