There are a huge number of variables that go into whether any individual shot becomes a goal. The quality of the strike is heavily affected by the quality of the previous touch: was it taken first-time, or easily corralled off the dribble? Was it taken with the head or the foot? Then the chance of scoring is affected by the player's location on the pitch, the locations of the defenders and the keeper, and their anticipation of the shot. And even the best footballer in the world can do no more than aim at a large round area. Unless the goal is wide open, some of that area will be covered by the keeper, and some will be the goal post or otherwise outside the goal. Even if real player shooting skill exists, it's unsurprising that it isn't showing up in the stats. There are too many variables.
And further, there just aren't that many shots taken. In the 2012-2013 season, only six players took over 100 shots (Luis Suarez, Gareth Bale, Demba Ba, Robin van Persie, Adel Taarabt and Jermain Defoe). By contrast, in the 2012-2013 NBA season, 41 players took at least 1000 shots, and nearly 200 took at least 500 shots. You can identify skill when you have sample sizes like that.
- Expected Goals by Shot Location
- Crosses and Headers
- The Incredible Through-Ball
- Team Trends in Shot Quality
The state of research in the field is that goals per shot taken (or goals per shot on target) in year one does not predict goals per shot taken or goals per shot on target in year two. I think that's true. The sample is too small and the confounding factors are too great for us to identify player shooting skill in only one season, particularly when we are looking at players with just 20 or 30 or even 60 shots total. (For example, see this study by Alex Olshansky at Tempo Free Soccer and this one by Benjamin Pugsley at Bitter and Blue.)
I want to be clear here that I don't think there's anything wrong with those earlier studies. They're exactly right that individual player shooting percentage over a regular season sample isn't predictive of future shooting percentage. That's a finding of real value. But I think I can identify shooting skill, at least weakly, when I increase the sample size enough.
1) Shooting Skill Is Selected For
In Tottenham Hotspur's match against Newcastle, the best chance in the game by far came in the 53rd minute. All Tim Krul could do with Gylfi Sigurdsson's bending free kick was knock it down in front of the goal, where Younes Kaboul steamed onto it. From right in front of the goal mouth, he bundled it into the keeper, then off a defender, and the chance was gone. I couldn't believe Kaboul hadn't scored, but I was confident that if that ball had fallen to an attacker, it would have been the equalizer. Everyone knows that defenders don't have the same kind of skills as attackers.
I can show this in the data. I have in my database 4039 goals (not from penalties or own goals) and 46049 shots (not including penalties). What I've done is separate out those players who took a lot of shots in a season from those who didn't. I've separated players who took at least one shot per game from players who took less than one shot per game. I'm using for "expected goals" the numbers outlined in the Shot Matrix pieces above (I-III), giving each shot taken an expected goals rating based on its location, the type of shot, and the type of pass which assisted it.
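The expected-goals assignment described above can be sketched as a simple lookup. To be clear, the rates and multipliers below are illustrative placeholders, not the actual Shot Matrix values from the linked pieces; only the structure (location base rate, adjusted for shot type and assist type) follows the method described here.

```python
# Minimal sketch of an expected-goals lookup, assuming each shot is
# tagged with a location zone, a body part, and an assist type.
# All numbers are made-up placeholders, NOT the real Shot Matrix values.
BASE_RATE = {"six_yard_box": 0.35, "penalty_area": 0.12, "outside_box": 0.03}
SHOT_TYPE_MULT = {"foot": 1.0, "head": 0.7}
ASSIST_MULT = {"through_ball": 1.5, "cross": 0.8, "other": 1.0}

def expected_goals(zone, shot_type, assist_type):
    """Expected goals for one shot: a base rate for its location,
    adjusted for how it was struck and how it was set up."""
    return (BASE_RATE[zone]
            * SHOT_TYPE_MULT[shot_type]
            * ASSIST_MULT[assist_type])

# A player's expected-goals total is just the sum over his shots
shots = [("six_yard_box", "foot", "cross"),
         ("penalty_area", "head", "cross"),
         ("outside_box", "foot", "other")]
total_xg = sum(expected_goals(*s) for s in shots)
```

Summing these per-shot values over a player's season gives the "expected goals" figure his actual goal total is compared against.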
Players who take a lot of shots convert them at a rate about 20% better than players who shoot infrequently. Given the sample sizes involved, the chance of a gap this large occurring randomly is vanishingly small, approximately 0.002%.
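A standard way to check that a gap like this isn't random is a two-proportion z-test. The shot and goal counts below are invented for illustration (roughly a 20% conversion gap across two large groups), not the actual split of the database.

```python
from math import sqrt, erf

def two_proportion_z(goals_a, shots_a, goals_b, shots_b):
    """Two-proportion z-test: how surprising is the gap in conversion
    rate between two groups if they actually share one true rate?"""
    p_a, p_b = goals_a / shots_a, goals_b / shots_b
    pooled = (goals_a + goals_b) / (shots_a + shots_b)
    se = sqrt(pooled * (1 - pooled) * (1 / shots_a + 1 / shots_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: high-volume shooters converting ~20% better
z, p = two_proportion_z(1200, 11000, 1000, 11000)
```

With samples in the tens of thousands of shots, even a modest percentage gap produces a z-score well beyond conventional significance thresholds.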
To some degree, this is a totally obvious finding. If players aren't good at shooting, you shouldn't give them a job where they take lots of shots. Everyone knows that, from the crappiest Sunday league all the way up to the top of the pyramid.
At the same time, this demonstrates that Premier League players really do differ in their abilities to score the shots they take, adjusted for the quality of the opportunity. Clubs select the best shooting players to take forward positions, and produce tactics that maximize the number of goal attempts by their best shooters. So these skills are selected for, as we can see by the aggregated data. Clubs recognize such skills and generally adjust for them correctly.
2) Individual Player Skill Identifiable in Large Samples
If teams are selecting their highest-volume shooters based on those players' underlying ability to score, then it's going to be harder to make distinctions between individual shooters. For those players who shoot rarely, the sample size issues are going to swamp the signal. For those players who shoot more often, we can only compare them to other players who shoot equally often. And almost all of those players will be among the best shooters in the league, and so it will be difficult to make distinctions because the range of skill will be smaller.
I found basically no signal looking at year-to-year correlations of shot conversion, just as Olshansky and Pugsley found. Limiting to players who took 30 shots in two consecutive seasons, or even 60 shots in two consecutive seasons, produced weak correlations at best. The effect appears in samples of 20,000 shots, not under 100. And individual players never get a chance to take even 1000 shots. (The highest total shots in my database is Wayne Rooney's 539, followed by Robin van Persie's 510. And that's over four and a half seasons.)
But when I combined multiple seasons and limited the sample to players with at least 100 shots in both halves of the sample, I started to find a real effect. These are all players with at least 100 shots in the EPL in the 09-10 and 10-11 seasons combined, and with at least 100 shots in the 11-12, 12-13, and current 13-14 seasons combined. There are just 21 of them, but there is a real correlation between the two samples.
I have here first a scatter plot of the data, posted to the left. The numbers here are a stat I'm calling Conversion+. In the table above I listed both expected goals, based on shot total and shot quality, and actual goals scored. Conv+ is how much better or worse than average a player is at converting his shots, expressed on a scale with 100 as average: (2449/2261) * 100 = 108, meaning eight percent better than average. So for each player, I took his Conv+ in 09-10 and 10-11, and compared that to his Conv+ in 11-12, 12-13 and 13-14.
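The Conv+ calculation itself is a one-liner, shown here with the article's own worked example:

```python
def conv_plus(actual_goals, expected_goals):
    """Conv+ as defined above: actual goals relative to expected
    goals for the shots taken, scaled so 100 = league average."""
    return actual_goals / expected_goals * 100

# The worked example from the text: 2449 actual goals vs 2261 expected
rating = conv_plus(2449, 2261)   # about 108, i.e. eight percent above average
```

A player below 100 is converting his chances at a worse rate than an average player would from the same shots.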
You can see the clustering most obviously in the top right quadrant. These are the players who were above average at finishing in both samples. It makes sense there would be more of them; generally, the guys who retain high-volume shooting roles are going to be the guys who are good at it. You see a smaller number of dots in the top left and bottom right quadrants, which means that few players were really excellent in the first set and then terrible in the next, or vice versa. But there are a couple here (Fernando Torres is the most notable, I think). And then there are a couple of players in the bottom left quadrant. These are the players who have retained high-volume roles without actually demonstrating good ability to convert those shots.
Since there are only 21 data points, it makes sense to also look individually. So I've got a paired bar graph as well of the 21 players below:
The players in the upper right quadrant are mostly the guys you expect. RVP and Rooney, Gareth Bale and Frank Lampard, Dimitar Berbatov and Steven Fletcher. Probably the guy who surprised me on here the most is Peter Odemwingie, he's not generally seen as a terribly skillful finisher, but his numbers are good. Jermain Defoe rates above average but not elite. And by God, someone needs to tell Chris Brunt to quit shooting. (Sam Allardyce has apparently already had that conversation with Stewart Downing, whose shooting rate is way, way down this year. He was responsible for a big chunk of Liverpool's struggles at shot conversion, and now he's been pulled back to a better, lower-volume role. Based on Allardyce's discussion of football stats in recent days, I'm betting he knows exactly what he's doing with Downing's role.)
3) This Doesn't Mean Conv+ Is Usually Useful
I could only identify an effect with samples of at least 100 shots, and even then the effect is not overwhelming. (See the Nerdery section below for more.) If a player has taken 20 or 30 shots but converted either a lot more or a lot fewer than you'd expect, you're still best referring to the studies showing no year-to-year correlation in shot conversion. Probably it's been a fluke. There's a possibility that it isn't, but the only good way to identify that statistically is with several seasons of data. So we need to be very careful about concluding that a player really has a significant shooting skill.
What I wanted to show here was that the obvious is true, that some players are better at shooting than others. But at the same time I want to affirm the counter-intuitive finding of statistical analysis, that shot conversion involves so many external factors that identifying player shooting skill from a normal sample is nearly impossible. If you have more than a normal sample, though, trends do emerge. Here's how I'd look at it.
If you have an observation that Player X or Player Y looks like a skilled or unskilled marksman, you shouldn't try to demonstrate it statistically with a 25-shot sample by showing he's finished or missed more shots than expected. That's not really useful data. But he might still be one of the more or less skilled finishers in the league, even though the stats can't confirm it. You have to rely on your observation when you don't have the samples to confirm it statistically.
And at the team level, you run into a whole 'nother set of problems. You're always going to have a mix of different guys taking shots, and teams don't have shooting skill, players do. So the mix is going to leave all clubs pretty close to the mean. As I have shown, shot conversion doesn't appear to be particularly persistent either year-to-year or within seasons. The statistic, G/SoT or G/S, is not useful at the team level in these samples. It may be the case that one club or another has a better collection of players, but once again the numbers are unlikely to be able to tell us that.
I'm using three basic methods for statistical tests: R-Squared Correlation (RSq), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). (For a longer explanation of these methods, see my piece from two weeks ago.) The short version is that you want RSq as close to 1.0 as possible, but above 0.1 or 0.2 is usually at least something. A lower RMSE or MAE is better than a higher one (it means less error).
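For readers who want to replicate this, here is a minimal sketch of all three measures, run on made-up Conv+ pairs rather than the real 21-player sample:

```python
from math import sqrt

def r_squared(xs, ys):
    """Square of the Pearson correlation between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov ** 2 / (vx * vy)

def rmse(predicted, actual):
    """Root mean square error: penalizes big misses more heavily."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def mae(predicted, actual):
    """Mean absolute error: the average size of the miss."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Toy Conv+ values for the two halves of a sample (not real data)
first = [112, 95, 104, 88, 120]
second = [108, 99, 101, 92, 113]
```

Using `first` as the prediction for `second` and comparing the errors against a flat prediction of 100 mirrors the test described in the next paragraph.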
I found a real but not overwhelming R-Squared of 0.33 for the high-volume shooters in my sample, and I found that previous shot conversion rate is a better predictor of future shot conversion rate than league average conversion. Using previous shot conversion rate to predict future conversion gives an RMSE of 4.08 and an MAE of 3.34; using league average conversion rates to predict goal scoring for these players gives a higher RMSE of 5.37 and an MAE of 4.49.
An R-Squared of 0.33 means that a huge percentage of the effect seen is not reducible to individual player shooting skill, but there is probably real signal under the noise. I think this is good evidence of persistent player shooting skill, but for projections it would need to be massively regressed to the mean.
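One common way to do that regression, sketched here as an illustration: shrink a player's observed Conv+ toward the league-average 100, trusting the observation more as his shot count grows. The 300-shot stabilization constant is a made-up number for the sketch, not a value fitted to this data.

```python
def projected_conv_plus(observed, shots, k=300):
    """Regress an observed Conv+ toward the league average of 100.
    k is a hypothetical stabilization constant (in shots): the larger
    k is relative to the sample, the harder we regress to the mean."""
    weight = shots / (shots + k)
    return weight * observed + (1 - weight) * 100

# A player 20% above average over 100 shots projects much closer to 100
proj = projected_conv_plus(120, 100)
```

With only 100 shots against a 300-shot constant, the observation gets a quarter of the weight, so a 120 Conv+ projects to just 105; only an enormous shot sample would let the projection approach the observed rate.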
Coming next: A data dump of player shot conversion rates, with some special Spurs notes.