Before the North London derby, I wrote up a short post with my calculations of the projected importance of the game. I built these calculations off a larger project, and I'd like to share the whole shebang here. I have stat-based power rankings and a season projection.
Basically, I create an estimate of team quality based on expected goals scored and expected goals conceded, using the Opta data from the Fantasy Football Scout website, and then I run 10,000 simulations of the remainder of the Premier League season based on these estimates. The team quality estimates underlie the power rankings, and the simulations create the season projections.
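To make that pipeline concrete, here is a toy version of the simulation step. This is my own minimal sketch, not the actual code behind the projections: I assume each remaining fixture's goals are drawn from Poisson distributions whose means come from the team estimates, and the fixture and points data you'd feed it are placeholders.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson sample via Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_season(fixtures, current_points, n_sims=10_000, seed=1):
    """Estimate each team's title chances over n_sims simulated run-ins.

    fixtures: list of (home, away, home_xg, away_xg) for remaining games.
    current_points: dict of points already banked by each team.
    """
    rng = random.Random(seed)
    titles = {team: 0 for team in current_points}
    for _ in range(n_sims):
        table = dict(current_points)
        for home, away, home_xg, away_xg in fixtures:
            hg, ag = poisson(home_xg, rng), poisson(away_xg, rng)
            if hg > ag:
                table[home] += 3
            elif ag > hg:
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        titles[max(table, key=table.get)] += 1
    return {team: wins / n_sims for team, wins in titles.items()}
```

The real projection also tallies top-four and relegation finishes and worries about goal difference, but the shape is the same: play out every remaining fixture, re-total the table, and count outcomes across simulated seasons.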
As an appendix at the bottom of the post, I've included a discussion of my method. I'd appreciate your feedback on any of it, or you can skip it and just look at the tables.
I am entirely aware that no one has reduced football to numbers, or come anywhere close to doing so. These are relatively rough estimates in all cases, which any informed fan could improve with his or her own observation of the game. But it's hard for even a highly informed fan to aggregate those judgments across every game, and it's impossible for any of us to use them to simulate the season 10,000 times, so I think the computers and the data have a place, too. Plus, who doesn't love power rankings?
First, I have the power rankings. I list three columns: attack, defense, and overall. The attack rating is team quality relative to league average, and higher is better, since it's about scoring goals. The defense rating is likewise relative to league average, but lower is better, since it's about not conceding goals. The overall rating is the attack rating minus the defense rating.
| Team | Attack | Defense | Overall |
| --- | --- | --- | --- |
| West Ham United | 0.82 | 0.97 | -0.15 |
| West Bromwich Albion | 0.88 | 1.05 | -0.17 |
| Queens Park Rangers | 0.80 | 1.15 | -0.36 |
The system loves Liverpool and City. Liverpool's goal difference is fifth best in the league, and their shots on target and big chance numbers (from which I calculate expected goals) line up with their goals scored and conceded. They've also played a relatively tough schedule so far. The problem for Liverpool this year has been a tendency to run up the score against weaker opposition while coming oh-so-close without a result against stronger opposition; the system expects Liverpool to play up to its averages, which could be wrong. Manchester City, meanwhile, have an excellent goal difference despite weirdly bad conversion numbers on both attack and defense. I expect they'll be very tough on the run-in.
The system really hates Reading. They have one of the best goals-per-shot-on-target ratios in the league, and my system regresses G/SoT heavily toward the mean. Other top clubs by G/SoT include Manchester United (ok) and Arsenal. The fact that Olivier Giroud's Arsenal and Reading are both among the league leaders in G/SoT should hopefully help demonstrate why I prefer to regress G/SoT toward the mean.
Next, I have the projected table. Here there are many more columns. I list expected records and expected points. (These will sometimes diverge slightly due to rounding.) Then I list the % chance of winning the title, the % chance of finishing in the top four, and the % chance of relegation. With each of these, I also include a delta: the change from last week. The deltas tell you how much this past week's games mattered to each club's chances of a good or bad end to the season.
| Team | W | D | L | Pts | %Title | ΔTitle | %Top 4 | ΔTop 4 | %Rel | ΔRel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| West Bromwich Albion | 15 | 7 | 16 | 53 | 0% | 0 | 0.1% | -0.5 | 0% | 0 |
| West Ham United | 12 | 9 | 17 | 45 | 0% | 0 | 0% | 0 | 1% | -5 |
| Queens Park Rangers | 6 | 14 | 18 | 32 | 0% | 0 | 0% | 0 | 84% | -10 |
- Yes, in the average projected season, Aston Villa was relegated on goal difference.
- Manchester City actually did miss out on the top four six times in the 10,000 simulations, while United never missed out once.
- The North London derby loss, combined with Chelsea's victory, put Arsenal in a very tough position. At the same time, 25% is just two winning coin flips. That happens all the time.
- QPR's win helped, but they're still probably screwed.
- West Ham's win over Stoke has been underrated; it put them just about entirely clear of relegation.
- You may believe that these numbers, which are damn fine looking for Spurs, constitute a jinx. You are probably correct, and I apologize in advance.
The next few paragraphs discuss my method for estimating team quality. See the linked fanpost above for a discussion of the Monte Carlo simulation method.
My team quality estimates are based on one of the key insights of football sabermetrics: the ratio of goals to shots on target is extremely variable, while the number of shots on target per game is much more stable. Because of this, shots on target predict future goals better than past goals do. A team putting a lot of shots on target can be expected to start scoring at a reasonably high rate in the future (think, say, Tottenham and Liverpool two months ago), and a team putting few shots on target can be expected to start firing blanks sooner rather than later (Reading is currently the poster boy for this problem). So the first thing I do is estimate each team's expected goals scored and conceded from shots on target.
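In code, the core of that first step is tiny. This is my own illustrative sketch, and the conversion rate is a made-up round number, not a fitted value from the actual model:

```python
# Hypothetical league-average rate at which shots on target become goals.
LEAGUE_G_PER_SOT = 0.30

def expected_goals_from_sot(shots_on_target):
    """Project goals from shots on target at the league-average rate,
    ignoring the team's own (noisy) conversion record so far."""
    return shots_on_target * LEAGUE_G_PER_SOT

# A side averaging 5.0 SoT per game projects to 1.5 expected goals per game,
# whether it has actually been converting at 0.40 or at 0.20 to date.
```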
I want to be clear that I'm not arguing that there is no such thing as finishing skill. Obviously there is. However, first, a big part of finishing skill is the ability to put a shot on target. A big chunk of finishing skill is baked into the shots on target statistic. Second, even if there is real variation in finishing skill that isn't found in shots on target, over a small sample (under a couple full seasons of data), this real skill will be swamped out by random variation. So for projecting team quality based on 2/3 of a season of data, it's better to ignore the ratio of goals to shots on target, even if there is some real signal buried under the noise.
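If you did want to keep some of the observed conversion signal rather than discard it entirely, the standard tool is regression toward the mean: weight the team's own rate by its sample size and pad it with league-average results. A sketch, with both constants invented for illustration rather than taken from the model:

```python
LEAGUE_G_PER_SOT = 0.30  # hypothetical league-average conversion rate
BALLAST_SOT = 200.0      # hypothetical: league-average shots added as ballast

def regressed_conversion(goals, shots_on_target):
    """Blend a team's observed G/SoT with the league mean.

    A small sample lands near the league average; only a large sample
    can pull the estimate far away from it.
    """
    return (goals + LEAGUE_G_PER_SOT * BALLAST_SOT) / (shots_on_target + BALLAST_SOT)

# 35 goals from 80 SoT is a scorching 0.4375 raw rate, but the regressed
# estimate is pulled most of the way back toward 0.30.
print(round(regressed_conversion(35, 80), 3))  # -> 0.339
```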
Next, though, I do make some adjustments to the expected rate of G/SoT. Not all shots are created equal, and not all shots on target are created equal. To adjust for quality of chances, I use two more stats. First, I look at "big chances," defined by Opta as a subjectively classified situation where a player "should" score, either one-on-one with the keeper or otherwise in excellent scoring position. In the Arsenal game, the Bale goal, the Lennon goal, and the Siggy one-pass-too-many were each classified as big chances, while Arsenal had none. Big chances are converted at a rate of roughly 40%, and I adjust team expected goals scored based on how many big chances they managed. Teams with more big chances convert more of their shots on target, and vice versa. The big chances adjustment significantly improves the predictive utility of my expected goals number. I also make a small adjustment for the ratio of shots taken inside the box, since shots from close range are generally better shots. This doesn't seem to make much difference in predicting future goals scored; the value of shooting from close range is already mostly contained in the SoT numbers, since close-range shots are much more likely to hit the target. But I make a small adjustment here, too, since it seems logical. That gives me an expected rate of goals scored per shot on target for each team.
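Put together, the chance-quality adjustment might look something like the sketch below. The roughly 40% big-chance conversion rate comes from the discussion above; everything else (the ordinary-SoT rate, the box tilt, and treating big chances as a subset of shots on target, which won't be exactly right) is my own hypothetical simplification:

```python
BIG_CHANCE_CONV = 0.40    # from the post: big chances convert at roughly 40%
ORDINARY_SOT_CONV = 0.25  # hypothetical rate for a routine shot on target
BOX_ADJ = 0.05            # hypothetical small tilt for in-box shot share

def adjusted_expected_goals(sot, big_chances, box_share, league_box_share=0.50):
    """Expected goals from SoT, upgraded for big chances and in-box shooting."""
    # Split SoT into big chances and ordinary attempts, each at its own rate.
    base = big_chances * BIG_CHANCE_CONV + (sot - big_chances) * ORDINARY_SOT_CONV
    # Small multiplicative bump (or penalty) for shooting from closer to
    # (or farther from) goal than the league norm.
    return base * (1.0 + BOX_ADJ * (box_share - league_box_share))
```

With 5 SoT, 2 of them big chances, and a league-normal box share, this gives 2 × 0.40 + 3 × 0.25 = 1.55 expected goals rather than the flat-rate figure.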
For each club, I then can calculate their expected goals scored and expected goals conceded based on shots on target, big chances, and shots in the box. I call these xG and xGA.
Finally, to create a team rating, I consider strength of schedule. For each team, I have an average xG and xGA per game number. So for each team game, I can compare one club's xG in the actual game to the average xGA for their opponent. I adjust for home/road, and I have a game rating, the percent better or worse than average that one club's attack and defense were, adjusted for the quality of opponent and the site of the game. I add up all of those single-game ratings into a full season rating for each team. Those become the power rankings numbers, which are the basis for the Monte Carlo simulation.
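As a sketch of that opponent-and-venue adjustment for a single game (the home-advantage factor and the example inputs are my own hypothetical numbers, not the model's):

```python
HOME_FACTOR = 1.10  # hypothetical: home attacks produce ~10% more than neutral

def attack_game_rating(team_xg, opp_avg_xga, at_home):
    """Percent better (+) or worse (-) than average this attack was,
    given what the opponent typically concedes and the venue."""
    baseline = opp_avg_xga * (HOME_FACTOR if at_home else 1.0 / HOME_FACTOR)
    return team_xg / baseline - 1.0

# Creating 1.8 xG at home against a defense that typically concedes 1.5 xG
# comes out to roughly +9% better than expected.
print(round(attack_game_rating(1.8, 1.5, True), 3))  # -> 0.091
```

A defensive game rating works the same way, comparing a club's xGA in the game to the opponent's average xG; aggregating the single-game ratings over the season yields the attack and defense columns in the power rankings.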