Shot Matrix I: Shot Location and Expected Goals

I have been talking about it for a while, now it's here. This is my first spin with the Shot Matrix database, looking at its eight distinct shot location categories. What are the odds of scoring a goal from different areas of the pitch?

Jamie McDonald

In this space, I have previously extolled the virtues of shots in the box on target (SiBoT). While about 27% of shots on target are goals, it matters where you are when you strike the ball. Shots on target from outside the box (SoBoT) are converted at a rate of about 13%, compared to 35% for shots inside the box on target. SiBoT are, on average, nearly three times as likely to be scored as SoBoT. Shot location matters a lot. This is hardly a strange finding. Anyone who has played or watched football knows where the dangerous areas on the pitch are, and they are decidedly not outside the eighteen-yard box.

Of course, these dangerous areas are not evenly arrayed within the eighteen-yard box, either. It's really great when you can take a chance from right at the goal mouth, but it's not terribly useful to shoot from the far corners of the box. Up until now, I did not have the data to distinguish shot location more granularly. Now I do. I have a better database which has logs of all shots taken in Premier League matches from the 2009-2010 season through to the present. I'm calling it the Shot Matrix database for two reasons. First, I like naming things. Second, I have a lot of information on each shot, including its location, whether it was a header, what sort of pass assisted the shot, whether it came from a set play, and the players who took and assisted the shot. So there's a whole matrix of information for each shot.

Shot Location Chart: The Eight Zones

What I have found is that location within the box makes a big difference. Shots from close and central areas inside the box are more likely to reach the back of the net than any other kind. Shots from right at the goal mouth are the best. I know none of that is exactly news to anyone on a qualitative level, but I think that it's useful to lay out in precise numbers just how much better different kinds of shots are, based on the location of the shooter.

[Image: Shot Matrix zone map]

The map on the right lays out the eight different shot locations I have logged. Zone 1 is the central area of the six-yard box. Zone 2 includes the wide areas, left and right, of the six-yard box. Zone 3 is the central area between the edges of the six- and eighteen-yard boxes. Zone 4 comprises the wide areas in the eighteen-yard box that are further from the endline than the six-yard box extended. Zone 5, then, includes the wide areas left and right in the eighteen-yard box that fall within the six-yard box extended. Those are my five zones inside the box. Generally speaking, it's Zones 1-3, the close/central areas, where the most dangerous shots come from.

(If anyone has a better name for these "close/central shots" I'd love to hear it. I like naming things, and I am currently struggling to nail this one down.)

Then for shots outside the box I have a division into three zones. Zone 6, where the vast majority of shots outside the box are taken, is the eighteen-yard box extended out to roughly 35 yards. Zone 7 is the deep, deep area beyond that. Zone 8, finally, comprises the regions right and left of the box. These areas are kind of interesting, as I'll discuss. There's rarely a good reason to shoot from such an angle outside the box, and yet these are converted to goals at a non-terrible rate. I don't think this means Andre Villas-Boas should free Andros Townsend to shoot from the corner flag, but it's an oddity in the data. I'll get to my theory in a minute, but first let's look at the data.
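The zone descriptions above can be sketched as a small classifier. This is only an illustration, not how the Shot Matrix database actually assigns zones. The coordinate system (`depth` in yards from the endline, `lateral` in yards from the centre of the goal) is hypothetical, and so are the exact boundaries: the article does not say how wide "central" is, so the sketch assumes goal width for Zone 1 and six-yard-box width for Zone 3, and it assumes anything beyond 35 yards counts as Zone 7.

```python
# Standard pitch markings, in yards.
SIX_YARD_DEPTH = 6.0
SIX_YARD_HALF_WIDTH = 10.0   # the six-yard box is 20 yards wide
BOX_DEPTH = 18.0
BOX_HALF_WIDTH = 22.0        # the eighteen-yard box is 44 yards wide
GOAL_HALF_WIDTH = 4.0        # the goal is 8 yards wide
LONG_RANGE_DEPTH = 35.0      # "roughly 35 yards", per the article


def shot_zone(depth: float, lateral: float) -> int:
    """Return a zone number 1-8 for a shot location.

    depth:   yards from the endline (hypothetical coordinate).
    lateral: yards from the centre of the goal, sign = left/right
             (hypothetical coordinate).
    """
    off_centre = abs(lateral)
    in_box = depth <= BOX_DEPTH and off_centre <= BOX_HALF_WIDTH

    if in_box:
        if depth <= SIX_YARD_DEPTH:
            # ASSUMPTION: "central" in the six-yard box means goal width.
            if off_centre <= GOAL_HALF_WIDTH:
                return 1
            if off_centre <= SIX_YARD_HALF_WIDTH:
                return 2
            return 5  # wide, but level with the six-yard box
        # ASSUMPTION: "central" here means the width of the six-yard box.
        if off_centre <= SIX_YARD_HALF_WIDTH:
            return 3
        return 4

    # Outside the box.
    if depth > LONG_RANGE_DEPTH:
        return 7  # ASSUMPTION: anything beyond 35 yards is Zone 7
    if off_centre > BOX_HALF_WIDTH:
        return 8  # to the left or right of the box
    return 6


if __name__ == "__main__":
    for depth, lateral in [(3, 0), (3, 7), (12, 0), (12, 15), (3, -15), (25, 5), (40, 0), (5, 30)]:
        print(depth, lateral, shot_zone(depth, lateral))
```

Any real zone system will be sensitive to exactly where these lines are drawn, so the constants are the first thing to change if the map disagrees with them.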

Shot Location Data: The Good Stuff

The data listed here comes from my database of all Premier League matches from 2009 to the present. I list information by Zones 1-8, as described above. For each zone I list the following numbers:

  • Number: Just the total number of shots from this zone in my database.
  • %On Target: The percentage of shots taken in the zone that are on target.
  • %Goal: The percentage of shots taken in the zone that result in goals.
  • G/SoT: The percentage of shots on target taken in the zone that result in goals.
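
Zones 1-5 are shots in the box; Zones 6-8 are shots outside the box.

| | Zone 1 | Zone 2 | Zone 3 | Zone 4 | Zone 5 | Zone 6 | Zone 7 | Zone 8 |
|---|---|---|---|---|---|---|---|---|
| Number | 1908 | 1684 | 15444 | 7638 | 874 | 22365 | 622 | 219 |
| %On Target | 57% | 37% | 34% | 37% | 42% | 25% | 24% | 32% |
| %Goal | 43% | 17% | 13% | 6% | 8% | 3% | 2% | 4% |
| G/SoT | 75% | 45% | 38% | 18% | 19% | 13% | 10% | 11% |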
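The three rate rows are linked: the share of shots that become goals should be roughly the share that are on target times the share of on-target shots that go in. A minimal sketch of that check follows, using only the rounded percentages from the table, so small mismatches from rounding are expected. The names `ZONES` and `implied_goal_rate` are made up for this illustration.

```python
# Rounded figures copied from the table, keyed by zone number.
ZONES = {
    1: {"shots": 1908, "on_target": 0.57, "goal": 0.43, "g_per_sot": 0.75},
    2: {"shots": 1684, "on_target": 0.37, "goal": 0.17, "g_per_sot": 0.45},
    3: {"shots": 15444, "on_target": 0.34, "goal": 0.13, "g_per_sot": 0.38},
    4: {"shots": 7638, "on_target": 0.37, "goal": 0.06, "g_per_sot": 0.18},
    5: {"shots": 874, "on_target": 0.42, "goal": 0.08, "g_per_sot": 0.19},
    6: {"shots": 22365, "on_target": 0.25, "goal": 0.03, "g_per_sot": 0.13},
    7: {"shots": 622, "on_target": 0.24, "goal": 0.02, "g_per_sot": 0.10},
    8: {"shots": 219, "on_target": 0.32, "goal": 0.04, "g_per_sot": 0.11},
}


def implied_goal_rate(on_target: float, g_per_sot: float) -> float:
    """Goals per shot implied by the on-target rate and the G/SoT rate."""
    return on_target * g_per_sot


for zone, row in ZONES.items():
    implied = implied_goal_rate(row["on_target"], row["g_per_sot"])
    print(f"Zone {zone}: listed %Goal {row['goal']:.0%}, implied {implied:.1%}")
```

With these inputs, every zone's listed %Goal sits within a percentage point of the implied value, which is about as close as whole-number percentages allow.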

Those Zone 1 shots are gold. The central area of the six-yard box produces the highest-expectation opportunities. Clearly, Premier League defenses are quite good at preventing attempts from these areas, and the average club only takes about one Zone 1 shot every two matches. As I'll explain in a future post, not all Zone 1 shots are created equal, so it's really a subset of this already-small subset which makes up the bulk of those 43% shot conversion and 75% shot-on-target conversion numbers. This means that it's hard to rate a club based just on their shots from these goal-mouth positions, especially after only eleven matches. A couple of lucky bounces one way or the other can confound the data.

The next-best set of shots comes from Zones 2 and 3. While it's best to be both close and central, in Zone 1, being either close in Zone 2 or central in Zone 3 is quite good too. Those central shots and central SiBoTs make up the large majority of high-expectation shots.

The %On Target numbers for Zones 4 and 5 are notable. While the goal expectation for attempts from wide areas decreases significantly compared to close/central shots, the rate at which these shots are put on goal does not decrease much at all. It makes sense, on reflection. When the ball goes to wide areas, the keeper can position himself or herself in the shooting lane and effectively cover a much larger portion of the goal. When the ball comes central, the keeper can cover a smaller percentage of the space. So on goal attempts from wide, you see a huge decrease in the rate of goals to shots on target even though the SoT% doesn't change much.

Shots from outside the box kind of suck. If you really do have a shooting lane to put the ball on target, they're not necessarily the worst, but a huge percentage of shots from outside the box result in turnovers, either goal kicks or blocked shots that can produce transition opportunities. The cost of shots from outside the box comes in those 75% of instances when the keeper is not called into action at all.

Now, there are definitely some boundary issues here. Any system of zones will be somewhat off on chances from the edge of a zone, and I'm certainly overrating central SiB near the edge of the eighteen-yard box and underrating central SoB just off the edge of the box. There are likely similar issues at the boundaries of Zone 1.

My sample for Zone 8—wide areas outside the eighteen-yard box—is relatively small, as these sorts of shots are very rare. So the effect seen here, where these shots are actually a bit more productive than the average SoB or SoBoT, might just be a fluke of sampling. Still, I think it's interesting. You should never shoot from those areas. Why does it work?

My guess is there are two issues. First, you wouldn't shoot from there unless there was actually a reason, such as the keeper being massively out of position. I have two goals from 2012-2013 logged as coming from wide positions outside the box, and the first is of this type. It's Sergio Aguero's goal against Liverpool last February. He was chasing down a long ball and had outrun the defenders. Pepe Reina made the bad decision to come off his line to sweep up behind the defense and he got there too late. Aguero collected the ball out on the wing maybe ten yards from the endline. He rounded the keeper, now only about five yards from the endline, and he unleashed a magnificent strike from outside the box into the narrow goalmouth. It was quite a moment of skill, but Aguero actually found himself in a pretty high-expectation situation, compared to a lot of shots from outside the box. He was shooting at an empty goal after all.

The other kind of goal you'll see from this area is the fortunate cross. The paradigm here is the second goal from 2012-2013 in my database. Matthew Jarvis, facing Wigan in April, curled in an inch-perfect cross. Three men in the box for West Ham let the ball go by, and with the keeper forced to guard against a re-directing header, Jarvis' cross bounced safely into the back of the net. It wasn't a shot at all, really, but once it counts for a goal, it counts as a shot. A good number of the shots on target from wide areas outside the box, I think, are these crosses that transform into shots and call the keeper into action. Some percentage are probably just stupid attempts by Adel Taarabt, but a good number are actually from reasonably good plays. It's my hypothesis, then, that the slightly-better-than-average numbers from shots on target from wide areas reflect an underlying truth that a bunch of these attempts do come from good football.
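The small-sample caveat for Zone 8 can be made a little more concrete with a confidence interval. The sketch below uses a Wilson score interval, which is a swapped-in textbook technique and not anything from the Shot Matrix analysis itself. The goal count is an assumption: the table only gives 219 shots and a rounded 4% conversion rate, so the count is estimated from those two numbers.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half_width, centre + half_width


# Zone 8 figures from the table: 219 shots, about 4% converted.
ZONE8_SHOTS = 219
# ASSUMPTION: goal count estimated from the rounded percentage.
ZONE8_GOALS = round(0.04 * ZONE8_SHOTS)

low, high = wilson_interval(ZONE8_GOALS, ZONE8_SHOTS)
print(f"Zone 8 conversion: roughly {low:.1%} to {high:.1%}")
```

With these rounded inputs the interval is wide enough to include the 3% conversion rate listed for Zone 6, which fits the worry that the Zone 8 edge might just be noise.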

So those are the shot location types in the Shot Matrix database. In my next two pieces, I will introduce further aspects of the database: attempts broken down by shot type and pass type, as well as my individual player data. In time, I plan to totally renoobulate my expected goals formula and projection engine based on the Shot Matrix data. This will take some time, as I want to maintain as much continuity with my previous projections as possible. So I will be doing a lot of testing and regression analysis, using this data, to build the best expected goals formula I can before I start making new projections.

One Final Table

These are shots on target from all locations by Premier League clubs this season. I have included a not-entirely-serious xG number as well. This is expected goals scored based on just SoT location. I will be presenting data on expected goals that will require these projections to be revised significantly, so consider this an admittedly limited first draft.

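| Club | SiBoT1 | SiBoT2 | SiBoT3 | SiBoT4 | SiBoT5 | SoBoT6 | SoBoT7 | SoBoT8 | xG | G |
|---|---|---|---|---|---|---|---|---|---|---|
| Manchester City | 6 | 3 | 30 | 12 | 3 | 18 | 0 | 0 | 21.1 | 28 |
| Liverpool | 6 | 4 | 22 | 12 | 4 | 23 | 0 | 0 | 19.4 | 19 |
| Chelsea | 8 | 3 | 14 | 12 | 2 | 19 | 0 | 0 | 16.7 | 16 |
| Tottenham Hotspur | 4 | 3 | 16 | 10 | 1 | 38 | 1 | 0 | 16.6 | 6 |
| Arsenal | 1 | 3 | 26 | 11 | 3 | 20 | 0 | 0 | 16.2 | 21 |
| Everton | 4 | 3 | 16 | 12 | 2 | 19 | 0 | 0 | 14.6 | 14 |
| Swansea City | 2 | 3 | 18 | 14 | 1 | 22 | 1 | 0 | 14.6 | 14 |
| Southampton | 2 | 2 | 24 | 8 | 1 | 11 | 2 | 0 | 14.0 | 13 |
| Manchester United | 3 | 3 | 15 | 11 | 2 | 22 | 0 | 0 | 13.7 | 17 |
| Newcastle United | 4 | 3 | 11 | 9 | 3 | 28 | 1 | 0 | 13.7 | 15 |
| West Bromwich Albion | 3 | 0 | 20 | 3 | 0 | 9 | 0 | 0 | 11.0 | 11 |
| Norwich City | 3 | 2 | 9 | 6 | 2 | 19 | 1 | 0 | 10.0 | 8 |
| Stoke City | 2 | 2 | 11 | 2 | 0 | 18 | 3 | 0 | 9.1 | 9 |
| Aston Villa | 1 | 0 | 15 | 6 | 0 | 15 | 0 | 0 | 9.0 | 10 |
| Fulham | 1 | 2 | 12 | 8 | 0 | 12 | 0 | 0 | 8.7 | 10 |
| West Ham United | 2 | 1 | 12 | 7 | 2 | 7 | 0 | 0 | 8.6 | 8 |
| Cardiff City | 2 | 3 | 11 | 3 | 2 | 8 | 0 | 0 | 8.5 | 9 |
| Hull City | 2 | 0 | 9 | 3 | 0 | 15 | 0 | 0 | 7.1 | 7 |
| Crystal Palace | 2 | 1 | 5 | 8 | 1 | 13 | 0 | 1 | 6.9 | 5 |
| Sunderland | 2 | 0 | 9 | 4 | 0 | 12 | 0 | 0 | 6.8 | 7 |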

Again, this is no more than tentative. It's not a definitive new xG formula. All I did was take league average numbers for SoT conversion for all shot types and multiply them by each club's SoT numbers.
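That calculation is just a weighted sum, and a rough sketch of it is below. It uses the rounded G/SoT percentages from the zone table as the league-average conversion rates, which is an assumption about the inputs, so the results will not match the xG column exactly. The names `G_PER_SOT` and `location_xg` are made up for this illustration.

```python
# League-average goals per shot on target by zone.
# ASSUMPTION: these are the rounded G/SoT figures from the zone table,
# not necessarily the exact rates behind the published xG column.
G_PER_SOT = {1: 0.75, 2: 0.45, 3: 0.38, 4: 0.18, 5: 0.19, 6: 0.13, 7: 0.10, 8: 0.11}


def location_xg(sot_by_zone: dict[int, int]) -> float:
    """Expected goals from shot-on-target counts per zone."""
    return sum(G_PER_SOT[zone] * count for zone, count in sot_by_zone.items())


# Shots on target by zone, copied from the club table.
spurs = {1: 4, 2: 3, 3: 16, 4: 10, 5: 1, 6: 38, 7: 1, 8: 0}
arsenal = {1: 1, 2: 3, 3: 26, 4: 11, 5: 3, 6: 20, 7: 0, 8: 0}

print(f"Spurs:   {location_xg(spurs):.2f}")
print(f"Arsenal: {location_xg(arsenal):.2f}")
```

Even with the rounded rates, Spurs still come out slightly ahead of Arsenal on location alone, because 38 shots on target from Zone 6 add up despite the low conversion rate.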

And I figure you're noticing Spurs ahead of Arsenal there despite Arsenal's 15-goal advantage in actual scoring. While I do believe that Spurs are likely to significantly improve their conversion numbers over the remainder of the season, it's my hypothesis that Arsenal's xG number here is underselling the quality of the shots they take. I think, though I haven't crunched the numbers yet, that when I account for shot type and pass type, Arsenal's projected goals will increase significantly. So stay tuned for that. I know good news for Arsenal is exactly what will keep y'all pumped for the next segment.
