The trouble with football analytics

We are lucky to be at the start of an intellectual revolution in the way the beautiful game is presented. Just a few years ago, possession stats and maybe a total shot count here and there were all that you, the average viewer, could get. How different things are now.

The recent MCFC analytics initiative is just one of many steps taken to provide football fans with more and more information on the game they love. Apps like Stats Zone and websites like Whoscored and the recently introduced Squawka provide a true overload of data. Any self-respecting blogger writing about football matches will use this information and back up statements with numbers dug up from the deep waters of in-match events. And it is surely only a matter of time before the mainstream media follow the same direction.

But what does all this information tell us? Does the data presented answer the questions we think it answers? And don’t we run the risk of becoming ‘more informed’ rather than ‘better informed’?

Anyone with a background in working with data knows that the key principle for interpreting this type of information is ‘context’, and that is the main point I want to make in this article. Whenever you are presented with so-called ‘stats’, always ask yourself what the context of the information is.


So here’s a picture of a flying pig. Admittedly, it is out of context in a piece on football analytics, but that is not the point. The point, once again, is context. Studying the picture on its own, without considering the circumstances, would lead to the mistaken conclusion that pigs (or at least this one) can fly. Study the wider picture, i.e. take the context into account, and you’ll be wiser.



Now, within a football match, the two sides generally differ in how satisfied they are with the current score line. In most matches, one of the teams is, based on player quality and home advantage, the favorite to win. This favorite starts out dissatisfied with the 0-0 score line, looking to open the scoring. The other team, meanwhile, focuses on not conceding, while now and then exploring offensive options of its own. Some matches are closer than others, but in general this is the key concept: one team is more satisfied than the other with keeping the score line as it is.


The happy team and the unhappy team

Let’s call the two teams contesting a football match the unhappy team and the happy team. When Ajax plays AZ at home, as in last week’s opening weekend of the Eredivisie, Ajax starts out as the unhappy team and AZ as the happy team. Depending on goals being scored, however, these roles can switch quickly, as can the degree of happiness.

The priority for the unhappy team is to change the score line in their favour, which leads to offensive impulses, while the priority for the happy team is to prevent the unhappy team from scoring, which leads to defensive impulses. Note that these are vastly different aims pursued by the two teams within the same match. Obviously, the non-favorite team may also look to score and increase their happiness further, but their incentive to take risks is lower, as they have more happiness to lose.

Now, when Ajax opened the scoring in the 9th minute, a role reversal occurred. Beating AZ 1-0 would be a decent result for Ajax, and losing 0-1 at Ajax would be a disappointing result for AZ. Not much news so far…

But AZ succeeded in their newfound goal of changing the score line in their favour: by the 50th minute, Jozy Altidore had scored a brace to give his team the lead. Unhappy team Ajax finally made themselves slightly less unhappy by equalizing in the 83rd minute, and the match finished 2-2.



Here we go with some nice basic match stats. Let’s increase our insight, right? Ajax created a total of 16 shots, half of them on target, while AZ created only 7, four of them on target. Possession-wise, Ajax dominated AZ 55%-45%. I’m not even going to bore you with the fact that Ajax played way more passes than AZ, that fewer of their passes were directed forward, how many interceptions and clearances both teams made, and all the other in-depth data that was recorded.



This match had four goals, and therefore five different phases. At 0-0 Ajax were the unhappy team; at 1-0 AZ were; at 1-1 Ajax were again; at 1-2 Ajax were even more so; and at 2-2 Ajax still were, but less so. An interesting quantification of teams’ happiness is provided in Mark Taylor’s analysis of match states.

Phase A of the Ajax – AZ match, with the score still 0-0, lasted only 9 minutes. Phase B, the only phase in which AZ were the unhappy team, lasted 39 minutes, and the remaining phases, in which Ajax were the unhappy team to varying degrees, took up the final 42 minutes plus added time.

So, for almost half of the match (39 of the 90 minutes), AZ were the unhappy team and their priority was to alter the score line; for slightly over half of it, Ajax occupied that role.
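The phase bookkeeping above is simple enough to automate. Below is a minimal Python sketch, assuming a 90-minute match with added time ignored and the simplified rule used in this article: the trailing team is unhappy, and at a level score the favourite is. The function names are hypothetical. The 48th-minute AZ equalizer follows from the 9-minute Phase A plus the 39-minute Phase B; placing Altidore’s second goal in the 50th minute is an approximation of “by the 50th minute”, and it does not change the totals, because Ajax are the unhappy team at both 1-1 and 1-2.

```python
from collections import defaultdict


def unhappy_team(score, home, away, favourite):
    """Return the team less satisfied with the current score (simplified rule)."""
    home_goals, away_goals = score[home], score[away]
    if home_goals < away_goals:
        return home
    if away_goals < home_goals:
        return away
    return favourite  # level score: the favourite is the one pushing for a goal


def unhappy_minutes(goals, home, away, favourite, full_time=90):
    """Sum the minutes each team spent as the unhappy team.

    goals: list of (minute, scoring_team), sorted by minute.
    Added time is ignored (assumption).
    """
    score = {home: 0, away: 0}
    totals = defaultdict(int)
    phase_start = 0
    # The sentinel (full_time, None) closes the final phase.
    for minute, scorer in goals + [(full_time, None)]:
        totals[unhappy_team(score, home, away, favourite)] += minute - phase_start
        if scorer is not None:
            score[scorer] += 1
        phase_start = minute
    return dict(totals)


# Ajax - AZ: 48' is implied by the phase lengths; 50' is an approximation.
goals = [(9, "Ajax"), (48, "AZ"), (50, "AZ"), (83, "Ajax")]
minutes = unhappy_minutes(goals, "Ajax", "AZ", favourite="Ajax")
print(minutes)  # {'Ajax': 51, 'AZ': 39}
```

Because each phase is attributed to a single unhappy team, the two totals always add up to the full 90 minutes; a finer model would weight phases by how unhappy the team is, which this sketch does not attempt.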



Typical example of match stats use: loads of information, but what to make of it? (Note: the image does not refer to the Ajax – AZ match.)

Knowing all this, what can we now make of Ajax’s dominance in creating goal-scoring chances? What is their larger share of possession worth? Ajax had the unhappy role for 51 minutes, over 30% more than AZ’s 39 minutes, and the teams had vastly different priorities during the various stages of the match.
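Spelled out with the phase durations above (regulation time only, added time ignored), the comparison works out as:

$$
\frac{51 - 39}{39} = \frac{12}{39} \approx 0.31, \qquad \frac{51}{90} \approx 0.57
$$

In other words, Ajax spent roughly 31% more time than AZ as the unhappy team, or about 57% of regulation time.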

Unless we interpret all these aggregate match stats within the right context, they are meaningless. It would be interesting to know Ajax’s and AZ’s possession during the different phases of this match. Is Ajax’s share of possession higher when they defend a lead? Does AZ cede possession when sitting on a single-goal lead? Did Ajax build their advantage in goal-scoring chances while chasing the game, and how many chances did AZ concede while being the unhappy team?
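Answering the possession questions only requires splitting possession by phase instead of over the whole match. Here is a minimal Python sketch, assuming hypothetical event data in the form of possession spells (team, start minute, end minute) and phase boundaries at the goal minutes. The function name and data format are illustrative, and the spells in the usage example are made up; they are not the real Ajax – AZ data.

```python
def possession_by_phase(spells, boundaries, full_time=90.0):
    """Split possession time into phases delimited by goal minutes.

    spells: list of (team, start, end) in minutes (hypothetical format).
    boundaries: sorted goal minutes; phases run between consecutive edges.
    Returns one dict per phase mapping team -> share of possession (0..1).
    Time with the ball out of play is simply not covered by any spell.
    """
    edges = [0.0] + list(boundaries) + [full_time]
    shares = []
    for lo, hi in zip(edges, edges[1:]):
        held = {}
        for team, start, end in spells:
            # Length of the part of this spell that falls inside the phase.
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                held[team] = held.get(team, 0.0) + overlap
        total = sum(held.values())
        shares.append({t: v / total for t, v in held.items()} if total else {})
    return shares


# Made-up possession spells, for illustration only.
toy_spells = [("Ajax", 0, 6), ("AZ", 6, 9), ("Ajax", 9, 35),
              ("AZ", 35, 48), ("Ajax", 48, 90)]
goal_minutes = [9, 48, 50, 83]
shares = possession_by_phase(toy_spells, goal_minutes)
for phase, share in zip("ABCDE", shares):
    print(phase, {team: round(v, 2) for team, v in share.items()})
```

With real event data, the per-phase shares can then be compared directly: for example, AZ’s share in Phase B (defending a lead) against their share in the phases where they were level or ahead by two.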


In conclusion

What we could and should do is interpret this type of match stats within the right context. How many goal-scoring chances does a team create when it has to? How many does a team concede when its priority is keeping the score line as it is? What role does possession play in this?
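One way to turn these questions into numbers is a rate: shots per 90 minutes spent in each role, rather than a raw match total. The sketch below assumes hypothetical inputs, a list of phases as (start, end, unhappy_team) and a list of shots as (minute, team); the function name and the per-90 normalisation are illustrative choices, not an established metric. The Ajax – AZ phases follow the 9/39/42-minute split described above (with the 50th-minute boundary approximate), while the shots are made up.

```python
from collections import defaultdict


def shots_per_90_by_role(phases, shots, team):
    """Shots taken by `team` per 90 minutes while unhappy vs. happy.

    phases: list of (start, end, unhappy_team); each is a half-open
            interval [start, end) in minutes.
    shots:  list of (minute, shooting_team).
    """
    minutes = defaultdict(float)
    counts = defaultdict(int)
    for start, end, unhappy in phases:
        role = "unhappy" if unhappy == team else "happy"
        minutes[role] += end - start
        counts[role] += sum(
            1 for minute, shooter in shots
            if shooter == team and start <= minute < end
        )
    return {role: 90 * counts[role] / minutes[role]
            for role in minutes if minutes[role] > 0}


ajax_az_phases = [(0, 9, "Ajax"), (9, 48, "AZ"), (48, 50, "Ajax"),
                  (50, 83, "Ajax"), (83, 90, "Ajax")]
# Made-up shots, for illustration only.
toy_shots = [(5, "Ajax"), (20, "AZ"), (30, "Ajax"), (40, "Ajax"),
             (60, "Ajax"), (70, "AZ"), (85, "Ajax")]
rates = shots_per_90_by_role(ajax_az_phases, toy_shots, "Ajax")
print({role: round(r, 2) for role, r in rates.items()})
```

To look at chances conceded while protecting the score line, call it for the opponent: AZ’s rate in the “unhappy” role is the rate at which Ajax conceded shots while they were the happy team.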

For now, this post ends with more questions than answers, but it may be the start of interpreting in-match data according to the score line at hand or, more precisely, according to each team’s satisfaction with the score line and the amount of playing time left. There is little use in studying pass completion data when the most rational approach may sometimes be to launch the ball forward while desperately chasing a late equalizer, and at other times to be happy enough to play low-risk sideways passes in midfield while sitting on a narrow late lead.

Don’t throw all this information together into aggregate match stats. That will only feed the skeptics who claim that stats are of no use in a sport as complex as football.

4 thoughts on “The trouble with football analytics”

  1. Adam

    So, so good to see somebody challenging the status quo. We’ve come so far, no? But there is so much more to do. My work with NCAA Division I Soccer is sorely limited by a lack of data (the day there are standardized box scores, or, pipe dream though it is, PbPs for every match, will be the best day of my life, no exaggeration), but challenging oneself to better the quality of their work is the most important thing one can do, especially if the breadth of your dataset allows it.

