Spurs'n'Stats: 10000 North London Derbies

This weekend's North London derby, we are told, is the most important game of the year. You can accept this as one of those cliched statements that are broadly true, you can snark at the cliche, or you can create half a gigabyte of spreadsheets and spend hours on data entry in order to say that the importance of this weekend's North London derby can be quantified to some X. I have chosen door number three.

What I've made, basically, is a Monte Carlo projection of the remainder of the EPL season. A Monte Carlo experiment (for more see previous wiki link) is a method for discovering the probabilities of different outcomes of an event by running a whole lot of iterations of said event. So if you had a six-sided die, and you were very, very bored, you could roll it hundreds of times to discover that, yes, there is roughly a 16.7% (one in six) chance of any particular number coming up. That's basically a physical Monte Carlo experiment. With a much more complicated event, like the English Premier League season, this method requires computers, and it is reasonably useful for producing probabilities that can't simply be worked out by logic. So my spreadsheets simulate the remaining season 10,000 times to see what the likely outcomes are.
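For anyone who would rather not roll a real die hundreds of times, here's a minimal Python sketch of the same experiment. The function name `roll_frequencies`, the number of rolls, and the fixed seed are just illustrative choices, not anything from my spreadsheets.

```python
import random
from collections import Counter

def roll_frequencies(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the share of each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

freqs = roll_frequencies(60_000)
for face, share in freqs.items():
    # Each share should sit close to 1/6, i.e. roughly 0.167
    print(face, f"{share:.3f}")
```

The more rolls you run, the closer each share creeps toward one in six, which is the whole idea behind running the season thousands of times.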

(I have put some further nerdy explanations of the method at the bottom of the post, and if folks are interested, I plan on writing more of these posts with more discussions of the underlying statistical problems. I assume there are folks in the community much more educated than myself in statistics and such, so I welcome feedback.)

First, you'll be happy to hear that the system thinks a Spurs win is reasonably likely. Here's a table of probabilities, including a kind of useless projected final score, encoded using my crappy HTML skills.

Team        Goals   W%    D%    L%
Tottenham   1.7     46%   23%   31%
Arsenal     1.3     31%   23%   46%

The computer thinks we're going to win! (A plurality of the time!) What is useful about the Monte Carlo projection here is that I can put some numbers to the question of how important this game is. What are the chances of Tottenham finishing top 4 depending on the outcome of the NLD?

I currently have Tottenham with a 77% chance of making the Champions League places, and Arsenal at 43%. This game, by my numbers, matters for Tottenham not because we're screwed without a win, but because a win will put us in great position for the top four. For Arsenal, this is an undeniably huge game with major implications either way.

Here's another crappy looking table, showing the implications of different outcomes for the Champions League places:

Team        Tot W   Draw   Ars W
Tottenham   88%     74%    61%
Arsenal     30%     43%    61%

That is, Tottenham increase their odds of making the Champions League to about 88% with a win, while Arsenal would see their odds of making the CL drop below one-in-three with a loss. And so on across the table. So, this is a huge game, but if Spurs are as good as my numbers say, even a loss wouldn't be crippling to our hopes. A win, though, man, that'd be nice as the run-in gets tough for the next two months. For Arsenal, this is just a huge game full stop.
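As a sanity check, the two tables should agree with the overall top-four odds: weight each conditional probability by the chance of that derby result and add them up. Here's a small Python sketch of that arithmetic using only the numbers quoted in this post; the variable and function names are my own labels.

```python
# Chance of each derby result, from the first table
outcome_prob = {"Tot W": 0.46, "Draw": 0.23, "Ars W": 0.31}

# Chance of a top-four finish given each derby result, from the second table
top4_given = {
    "Tottenham": {"Tot W": 0.88, "Draw": 0.74, "Ars W": 0.61},
    "Arsenal": {"Tot W": 0.30, "Draw": 0.43, "Ars W": 0.61},
}

def overall_top4(team):
    """Weight each conditional top-four chance by the probability of that result."""
    return sum(outcome_prob[o] * top4_given[team][o] for o in outcome_prob)

for team in top4_given:
    print(team, f"{overall_top4(team):.1%}")
```

This comes out at about 76.4% for Tottenham and 42.6% for Arsenal, close to the 77% and 43% quoted above. The small gaps are what you'd expect from working with rounded table entries.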

At the same time, nothing is locked in. In the first of my iterations of the season, Arsenal won the NLD 2-1, but Liverpool went on an insane run, 9-1-1 to the finish to edge us and Arsenal out of the Champions League by one point apiece. This sort of thing didn't happen often, and it's hardly likely in the real world, but unlikely events happen all the time. The question is which unlikely events are going to happen. My hope is that the numbers can be useful in roughly quantifying some of these likelihoods and unlikelihoods.

(Nerdiness to follow. Well, nerdier nerdiness.)

You might be surprised that the stats like Tottenham more than Arsenal, even though Arsenal have a notably superior goal difference. This is because my method for estimating team quality is built mostly from underlying stats including shots on target, shots in the box, and Opta-classified "big chances". One of the major insights of football sabermetrics is that shot-on-target conversion (the percentage of shots on target which end up as goals) is highly variable within a season for both players and teams. So to account for this variation, it is generally better to estimate the quality of a team's attack or defense based on the number and quality of chances rather than the outcome of those chances. Tottenham do very well on these underlying statistical metrics, better than Arsenal (or Chelsea).

A note: this does not mean I think there's no difference between players and teams in "finishing". It means, first, that I think a lot of "finishing" skill is already contained in the shots on target numbers - just putting a shot on target takes great skill, and you can ask Gylfi Sigurdsson how well G/SoT (goals per shot on target) measures the quality of the shots taken over a relatively small sample. Second, I think that there is so much variation in the G/SoT numbers that we can't isolate finishing skill from the numbers alone in only a season of data, so it's better to use the underlying stats even if we do end up missing some real variation between players in finishing skill.

The Monte Carlo projection works by projecting a score for each game. I create a "mean goals scored" for each team for each game based on projected team quality and projected quality of opposition. I model goals scored using the Poisson distribution, which is a pretty good model for EPL goal scoring, using the mean goals scored for the game as the mean for the Poisson. Each game thus gets a projected score, and I take the average outcomes over 10,000 iterations as my projection.
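To make that concrete, here's a rough Python sketch of simulating a single game this way. It is not my spreadsheet: it assumes the projected goals from the first table (1.7 for Tottenham, 1.3 for Arsenal) can be used directly as the Poisson means, and it assumes the two teams' goal counts are independent of each other. The function names and the seed are hypothetical, and the Poisson draw uses Knuth's multiplication method so the code needs nothing beyond the standard library.

```python
import math
import random

def poisson_sample(rng, mean):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_derby(home_mean, away_mean, iterations=10_000, seed=1):
    """Simulate one fixture many times and return the share of each result."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    tally = {"home_win": 0, "draw": 0, "away_win": 0}
    for _ in range(iterations):
        home_goals = poisson_sample(rng, home_mean)
        away_goals = poisson_sample(rng, away_mean)  # assumed independent of home_goals
        if home_goals > away_goals:
            tally["home_win"] += 1
        elif home_goals == away_goals:
            tally["draw"] += 1
        else:
            tally["away_win"] += 1
    return {result: count / iterations for result, count in tally.items()}

# Means taken from the projected goals in the first table (an assumption, see above)
shares = simulate_derby(1.7, 1.3)
print(shares)
```

Under those assumptions the win, draw, and loss shares should land in the neighbourhood of the first table, though the exact figures shift a little with the seed. A full season projection would repeat this for every remaining fixture and tally the final league table in each iteration.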

Obviously there are any number of things this model misses, and I don't mean to attribute false precision to it. Teams change, tactics change, football is a wonderfully complicated thing. But I think the numbers can be useful and fun, even if they are far from definitive.

Finally, thanks to the good folks in the soccer thread at Baseball Think Factory for helping me work through the logic and programming of the spreadsheets.
