Introducing the Minute-by-Minute Database: Shots and Goals by Game State

No, not a spreadsheet of obscure verbs like "inveigh," "opine," and "burble" to be used by Guardian writers re-publishing emails. It's all the game events (well, a lot of the game events) of the 2012-2013 Premier League season coded by time played and game score. Let's play with it.

Richard Heathcote

If I could change one thing about the state of statistical analysis of football, it would be to institute a minimum salary for all freelancing statistical analysts. If I could change two things, I'd also like a pony. But besides that, what I want is data. There are so many different questions we could study, if only we had basic information to work with. Interestingly, a lot of statistical information is published variously around the internet, at places like WhoScored and FourFourTwo, but rarely is it available in a format that allows for actual study. So I decided to dust off my web programming and text parsing skills--well, what remained of them--and build a database myself.

I now have a database of all the game events recorded in the WhoScored game logs for the 2012-2013 Premier League season. First, I'm very happy to share this database. It's about 15 MB of spreadsheets. Just email me through the site and I can send you the files. I hope this can be useful in general for the statistical study of football.
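As a rough illustration of what "coded by time played and game score" can mean, here is a minimal Python sketch. It assumes a made-up input format (a list of `(minute, scoring_team)` goal events and a flat 90-minute match); the function names are hypothetical, and this is not the WhoScored log format or the code behind the database.

```python
from collections import defaultdict

def state_label(diff):
    """Bucket a goal difference into the five game states used in this post."""
    if diff <= -2:
        return "Down 2+"
    if diff == -1:
        return "Down 1"
    if diff == 0:
        return "Tied"
    if diff == 1:
        return "Up 1"
    return "Up 2+"

def minutes_by_state(goals, match_length=90):
    """Return {team: {state: minutes}} for one match.

    goals: list of (minute, scoring_team), scoring_team in {"home", "away"}.
    match_length: assumed fixed length; real matches include stoppage time.
    """
    totals = {"home": defaultdict(int), "away": defaultdict(int)}
    diff = 0   # home goals minus away goals
    clock = 0  # minute at which the current game state began
    for minute, team in sorted(goals):
        span = minute - clock
        totals["home"][state_label(diff)] += span
        totals["away"][state_label(-diff)] += span
        diff += 1 if team == "home" else -1
        clock = minute
    # Account for the stretch between the last goal and full time.
    span = match_length - clock
    totals["home"][state_label(diff)] += span
    totals["away"][state_label(-diff)] += span
    return {team: dict(states) for team, states in totals.items()}

# Hypothetical match: home scores at 10' and 55', away scores at 80'.
example = minutes_by_state([(10, "home"), (55, "home"), (80, "away")])
print(example["home"])
print(example["away"])
```

Because every minute one side spends "Up 1" is a minute the other side spends "Down 1", league-wide minutes at mirrored game states should come out identical under this kind of tally.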

There are a number of different questions we can try to answer with this data. In future weeks, I'm planning to build a database of team performance broken down by the players on the field, which would be the first step toward a real +/- statistic for players. This week, I want to look at shot and shot-on-target conversion by game state.

In my power rankings and game projection work, I regressed shot-on-target conversion for each club significantly toward the league average. One question that popped up several times, from a number of folks who definitely knew what they were talking about, was the effect of game state on shot / shot on target conversion. They argued, quite plausibly, that tactics change with the change in the score. When you go up a goal or two, you focus on keeping your defensive shape rather than pushing attackers forward. Your opponents will likely take more risks to get back into the game. The hypothesis is, they will take more shots, but these will be worse shots, while you will take fewer, but better shots. They'll have midfielders firing indiscriminately from long range, while you'll get your striker one-on-one with the keeper. These two chances might be treated equally as "shots on target" but they have very different expected conversion percentages.

(Over at Bitter and Blue, shuddertothink has been doing some very interesting work with his own game state data, which you should definitely go and read. My numbers are a little bit different from his, and my conclusions are as well. I don't want to get too far into the weeds, but I hope we can have a dialogue on this over time.)

The Data: Shots and SoT by Game State

I found that the effects of game state on shot-on-target conversion are relatively minor. They only appear in any significant way when a club gets up by two goals or more. Now, clubs do attempt more shots when losing or tied, and they do convert slightly more shots when winning, but this effect is almost entirely captured by shots on target. I think this will be easier to explain with a table.

This table lists, for all clubs over the 2012-2013 EPL season, minutes played at different game states, shots and shots on target per minute at those game states, and conversion rate of shots and shots on target at those game states.

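| Game State | Min | S/Min | SoT/Min | G/S | G/SoT |
|---|---|---|---|---|---|
| Down 2+ | 5349 | .153 | .053 | .101 | .293 |
| Down 1 | 13141 | .155 | .051 | .087 | .265 |
| Tied | 35162 | .143 | .046 | .086 | .268 |
| Up 1 | 13141 | .138 | .049 | .090 | .254 |
| Up 2+ | 5349 | .147 | .052 | .119 | .335 |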
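One way to read the columns together: shots on target per shot (call it shot accuracy) is SoT/Min divided by S/Min, and G/S should roughly equal accuracy times G/SoT. The sketch below is only a consistency check on the published, rounded figures. The `TABLE` dictionary and function name are illustrative, and small mismatches are expected from rounding.

```python
# League-wide figures copied from the table above:
# state -> (minutes, shots/min, SoT/min, goals/shot, goals/SoT)
TABLE = {
    "Down 2+": (5349, 0.153, 0.053, 0.101, 0.293),
    "Down 1": (13141, 0.155, 0.051, 0.087, 0.265),
    "Tied": (35162, 0.143, 0.046, 0.086, 0.268),
    "Up 1": (13141, 0.138, 0.049, 0.090, 0.254),
    "Up 2+": (5349, 0.147, 0.052, 0.119, 0.335),
}

def shot_accuracy(shots_per_min, sot_per_min):
    """Share of shots that end up on target."""
    return sot_per_min / shots_per_min

results = {}
for state, (minutes, s_min, sot_min, g_s, g_sot) in TABLE.items():
    accuracy = shot_accuracy(s_min, sot_min)
    implied_g_s = accuracy * g_sot  # G/S = (SoT/S) * (G/SoT)
    results[state] = (accuracy, implied_g_s)
    print(f"{state:8s} accuracy={accuracy:.3f} "
          f"implied G/S={implied_g_s:.3f} published G/S={g_s:.3f}")
```

On these numbers, accuracy is highest in the two winning states and lowest when tied, which is the sense in which shots on target already absorb most of the game state effect on raw shot conversion.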

As you can see, the differences in shot conversion between the -1, tied, and +1 game states are relatively minor. Teams do score a higher percentage of their shots when winning by a goal, but this is already accounted for in their shots on target. Teams shoot less when winning by a goal, but they put a slightly higher share of those shots on target. They actually convert shots on target at a slightly lower rate when winning by a goal.

So, there is a real change in the game when one team or the other gets that tie-breaking goal. The team losing by a goal attempts more shots, and the team with the lead attempts fewer. That's as expected. However, we don't need the game state numbers to account for this effect. Shots on target does it for us. Clubs down by a goal, taking more speculative shots, miss the target a lot more. When you separate out shots on target, you see a tiny difference that could easily be nothing more than random variation (.051 SoT/Min when losing by a goal, .049 SoT/Min when winning by one).
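For a rough sense of whether .051 versus .049 could be noise, here is a back-of-the-envelope sketch. It rebuilds approximate shot-on-target counts from the rounded rates and the 13,141 minutes in each state, then treats them as two independent Poisson counts over equal exposure. Both of those are simplifying assumptions (the two states occur in the same matches, so they are not truly independent), and the helper names are made up for illustration.

```python
import math

MINUTES = 13141  # minutes at Down 1, and equally at Up 1, from the table

def approx_count(rate_per_min, minutes):
    """Rebuild an approximate event count from a rounded per-minute rate."""
    return rate_per_min * minutes

def poisson_rate_z(count_a, count_b):
    """z-score for the difference of two Poisson counts with equal exposure."""
    return (count_a - count_b) / math.sqrt(count_a + count_b)

sot_down1 = approx_count(0.051, MINUTES)
sot_up1 = approx_count(0.049, MINUTES)
z = poisson_rate_z(sot_down1, sot_up1)
print(f"approx SoT down 1: {sot_down1:.0f}, up 1: {sot_up1:.0f}, z = {z:.2f}")
```

Under these assumptions the z-score comes out well below the usual threshold of about 2, which fits the reading that the gap in shots on target per minute is within random variation.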

Interestingly, teams actually convert their shots on target at a slightly higher rate when losing by a goal than when winning. I wonder if the effect we're seeing is an "effort" effect rather than a tactics effect. Is it possible that football clubs just try harder, and push forward more successfully, when they really need a goal? We're still looking at quite small variations, so if there is a "trying harder" effect, it's not a big one. But it's kind of cool if it's real.

The other notable finding here is that clubs convert shots at much higher rates both when winning and when losing by two goals or more. In order to explain what I think is going on here, I want to take a look at these numbers broken down by home and away. The differences are striking, and with samples of more than 2,000 minutes each, I think it's safe to say there's a real underlying effect.

| Game State | Min | S/Min | SoT/Min | G/S | G/SoT |
|---|---|---|---|---|---|
| Home, Down 2+ | 2090 | .191 | .067 | .113 | .319 |
| Away, Down 2+ | 3259 | .129 | .044 | .090 | .268 |
| Home, Up 2+ | 3259 | .165 | .058 | .106 | .303 |
| Away, Up 2+ | 2090 | .118 | .043 | .146 | .400 |

First, look at that "away, up 2+" line. That is what a tactical effect looks like. Shots taken decrease by a quarter, but conversion rates go way, way up. When clubs have a big lead on the road, they really do bunker down and look for counterattacking opportunities, which they really do convert at very high rates.
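To put numbers on that "away, up 2+" line, the sketch below computes relative changes against a few candidate baselines taken from the two tables. The post does not say which baseline the comparison uses, so the choice of baselines here is an assumption, and the function name is illustrative.

```python
def pct_change(value, baseline):
    """Relative change of value versus baseline, as a fraction."""
    return (value - baseline) / baseline

AWAY_UP2_S_MIN = 0.118    # shots per minute, away and up 2+
AWAY_UP2_G_SOT = 0.400    # goals per shot on target, away and up 2+

# Candidate baselines for shots per minute (all clubs, from the tables).
S_MIN_BASELINES = {
    "Tied": 0.143,
    "Up 2+ (home and away)": 0.147,
    "Home, Up 2+": 0.165,
}

shot_changes = {
    name: pct_change(AWAY_UP2_S_MIN, base)
    for name, base in S_MIN_BASELINES.items()
}
for name, change in shot_changes.items():
    print(f"S/Min vs {name}: {change:+.1%}")

# Conversion of shots on target versus the tied game state.
g_sot_change = pct_change(AWAY_UP2_G_SOT, 0.268)
print(f"G/SoT vs Tied: {g_sot_change:+.1%}")
```

Depending on the baseline, the drop in shooting ranges from a bit under a fifth to a bit over a quarter, while conversion of shots on target rises by roughly half relative to the tied state.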

However, check out that home, down 2+ number. When clubs are taking a beating at home, not only do they stream forward and take a lot more shots, they also create better shots and convert them at a high rate. This suggests that streaming players forward is kind of a good strategy for getting goals. If you have nine or ten guys rushing the box, you're going to create good chances. It's just that you're also going to allow a whole bunch of goals.

But then you've got the H+2 / A-2 numbers. Those are just blowouts, and the team leading keeps pressing forward and scoring. The losing team loses. Often they're Reading, who are terrible.

Some Tentative Conclusions

It does not appear that a team which spends more time in a tie game and less with a one goal lead should be expected to convert shots on target at a notably lower rate. The theory that Tottenham Hotspur's low G/SoT rate this season was a function of game state does not, at least initially, appear to hold up. In my next post, I'm going to look at these numbers broken down team by team, to see if there are any notable patterns at the team level.
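A small sketch of why that follows from the league-wide table: take two imaginary teams with different splits of minutes across game states, apply the league rates to each, and compare the blended G/SoT. The minute splits below are invented for illustration and do not describe any real club. The calculation also assumes both teams shoot and convert at league-average rates within each state.

```python
# League-wide rates from the first table: state -> (SoT/min, goals/SoT)
LEAGUE = {
    "Down 2+": (0.053, 0.293),
    "Down 1": (0.051, 0.265),
    "Tied": (0.046, 0.268),
    "Up 1": (0.049, 0.254),
    "Up 2+": (0.052, 0.335),
}

def expected_g_per_sot(minutes_by_state):
    """Blend league rates over a team's minutes in each game state."""
    total_sot = 0.0
    total_goals = 0.0
    for state, minutes in minutes_by_state.items():
        sot_per_min, g_per_sot = LEAGUE[state]
        sot = minutes * sot_per_min
        total_sot += sot
        total_goals += sot * g_per_sot
    return total_goals / total_sot

# Hypothetical season splits (3,600 minutes each), made up for this example.
mostly_tied = {"Down 2+": 200, "Down 1": 600, "Tied": 2000,
               "Up 1": 600, "Up 2+": 200}
often_up_one = {"Down 2+": 200, "Down 1": 600, "Tied": 1400,
                "Up 1": 1200, "Up 2+": 200}

rate_tied = expected_g_per_sot(mostly_tied)
rate_up_one = expected_g_per_sot(often_up_one)
print(f"mostly tied:  {rate_tied:.3f}")
print(f"often up one: {rate_up_one:.3f}")
```

Even after moving 600 minutes from tied to a one-goal lead, the two blended rates differ by well under one percentage point, which is far too small to explain a notably low G/SoT on its own.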

The hypothesized tactical effect appears only once one team opens up a lead of two goals or more. And even within that sample, it's only the subset of away teams leading by two or more who really see the expected decrease in shooting and increase in shot conversion. As I work this summer on renoobulating my power ranking and projection spreadsheets, I want to consider how to include game state data as well. For now, it appears that I should at least account for the number of minutes spent winning by a lot on the road.

I am definitely ending this post on a kind of "eh" note. I'm still trying to figure out what I can do with this data, and I'm interested in your feedback. What do you make of this data? And what other questions should I put to the minute-by-minute database?