I will again be running weekly power rankings, season projections, and game projections in this space during the whole 2013-2014 English Premier League season. But before I can unveil the initial numbers, I'm going to need your help. You see, I have team ratings for attack and defense based on the underlying stats from the previous season. I have some vaguely useful adjusted numbers for the clubs promoted from the Championship. But there were lots of transfers this summer. And as I've said repeatedly, I don't think the world is anywhere close to having a good player value stat from which you could build team projections, from the micro- to the macro-level.
Me and You vs. Bloomberg Sports Analytics
So in order to account for roster changes, I'm going to be including in my projections a subjective component, crowd-sourced from all of y'all. In the comments -- along with, you know, constructive discussion -- please list your projected EPL tables from first to twentieth. I will use the average commentariat table as a subjective element in my season projections.
I want these numbers because I have decided that in my head we're in competition with Bloomberg. I took a look at their projected EPL standings last week, and I came away uncertain how advanced their under-the-hood methods might be. BSA will apparently be continuing to project the EPL season throughout the year, using a simulator method that appears to be quite similar to mine. According to their Projected Tables FAQ, they will be simulating each match of the season "over 10,000 times." For reasons of synergy and proactivity, they are keeping their methods otherwise hidden from public view.
I am not.
My hope for this season is that we can stage a little competition and see how my numbers, open-sourced and informed by the subjective opinions of the commentariat, do in competition with Bloomberg. The sample of a single season is too small for any result to actually be meaningful, but I think a little bit of competition is fun anyway. Especially when your opponent is a massive corporation that has no idea you exist. Right?
I basically do two things to project the EPL season. First, I build team ratings for attack and defense. The primary input to these ratings is my "expected goals" calculation. As I discussed earlier this summer, I have re-worked my expected goals formula based on Shots on Target in the Box (SiBoT), and those, along with big chances and (for defense) opponent conversion rate, will be the primary inputs to my expected goals ratings. I will make a small (roughly 15%) adjustment of SiBoT conversion based on big chances, and a somewhat larger (roughly 25%) adjustment of opponent SiBoT conversion based on big chances and real conversion rate.
Now, I can't just start predicting the season results off one game once I get the data. And I need to have some sort of preseason projection. So what I will do is begin the season using mostly my preseason projections, and over the first twenty weeks of the EPL season I will slowly phase them out until my projections are based almost entirely on the in-season data.
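The phase-out can be sketched in a few lines. This is only an illustration: the linear schedule, the 0.05 floor that stands in for "almost entirely," and the function names are all my assumptions here, not the actual formula.

```python
def preseason_weight(week, phase_out_weeks=20, floor=0.05):
    """Weight on the preseason rating after `week` matchweeks.

    Assumptions (not from the actual model): the weight decays linearly
    from 1.0 to `floor` over `phase_out_weeks`, then stays at `floor`.
    """
    if week <= 0:
        return 1.0
    if week >= phase_out_weeks:
        return floor
    return 1.0 - (1.0 - floor) * week / phase_out_weeks


def blended_rating(preseason, in_season, week):
    """Mix a preseason rating with an in-season rating for a given week."""
    w = preseason_weight(week)
    return w * preseason + (1.0 - w) * in_season
```

Under these assumptions, a club rated 1.5 in preseason and 1.0 on in-season data would sit halfway down the slope after ten weeks and almost entirely on the in-season number from week twenty onward.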
My preseason projections have three basic inputs. First is team strength based on expected goals from 2012-2013. Second is regression to the mean. Third is your subjective ratings. These will be weighted roughly 60/10/30. For the promoted teams, I looked at past data to try to see what projects the quality of a promoted team from the Championship to the Premier League. I found that there is basically no relationship between points from the Championship and points from the EPL. There is a small (roughly .25) correlation between goal difference in the Championship and points in the EPL. So for the promoted clubs, I'm using a number based mostly on the average strength of all promoted Championship clubs, with a small adjustment for Championship goal difference.
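Here is a rough sketch of that blend. It assumes "regression to the mean" can be read as a weight on the league-average rating, and it leaves the size of the goal-difference adjustment as a hypothetical parameter (`gd_coefficient`), since all I've said above is that it is small. The function names and rating units are illustrative.

```python
def preseason_rating(last_season, league_mean, crowd, weights=(0.6, 0.1, 0.3)):
    """Blend the three preseason inputs for an established club.

    Assumption: regression to the mean is modeled as a weight on the
    league-average rating. All ratings share the same (arbitrary) units.
    """
    w_prev, w_mean, w_crowd = weights
    return w_prev * last_season + w_mean * league_mean + w_crowd * crowd


def promoted_rating(avg_promoted, gd, avg_promoted_gd, gd_coefficient):
    """Baseline rating for a promoted club.

    Starts from the average rating of past promoted clubs and nudges it by
    how the club's Championship goal difference compares with the typical
    promoted side. `gd_coefficient` is hypothetical and should be small.
    """
    return avg_promoted + gd_coefficient * (gd - avg_promoted_gd)
```

For example, `preseason_rating(1.8, 1.3, 1.6)` mixes a strong 2012-2013 rating with a league average of 1.3 and a crowd rating of 1.6, landing between the three with most of the pull coming from last season.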
So what's changed from last year? I'm using a new and improved expected goals formula, using statistics I previously did not have. I'm using preseason projections as part of the projections for the first half of the season. I have also entirely re-done my projection engine.
One Million Seasons
Bloomberg Sports Proactive Synergies says they will be simulating the Premier League season 10,000 times. Ten thousand seasons isn't cool. You know what's cool? A million seasons. With my new projection algorithm, I can simulate 1,000,000 seasons in about 30 minutes. So I will be doing that. It's true that for the top-line numbers, 10,000 seasons should be enough to get you a projection-to-projection variance of only a point or so. But I like getting that variance down into the low decimals. And more importantly, this large expansion of projected seasons will allow me to do more granular pre-game projections. Late in the season, I can consider the effects of different possible outcomes in multiple games and their effects on different clubs' chances of winning the league, finishing top four, or escaping relegation. With 10,000 simulations, the samples get too small to do this sort of work. With a million, no problem.
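To put a rough number on the small-sample problem, here is a sketch using the standard error of a simulated probability, sqrt(p(1-p)/n). The scenario figures in the usage note are invented for illustration, and the function names are mine.

```python
import math


def mc_standard_error(p, n_sims):
    """Standard error of a probability estimated from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)


def conditional_sample_size(n_sims, scenario_prob):
    """Expected number of simulated seasons in which a given scenario occurs."""
    return n_sims * scenario_prob
```

Suppose a particular combination of results in three late-season games shows up in 2% of simulated seasons, and a club wins the league in about 10% of those. With 10,000 seasons only about 200 match the scenario, so the 10% estimate carries a standard error of roughly 2.1 percentage points. With a million seasons about 20,000 match, and the standard error drops to roughly 0.2 points.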
The other change is to my game simulation engine. Bloomberg doesn't specify what they're using, but I think it's fair to guess that they're using some sort of Poisson-based simulator. You need to simulate goals scored and goals against in every game in order to have goal difference numbers at the end of the year to break ties. It is reasonably well established in the academic literature that a random sampling of the Poisson distribution simulates goals scored in football matches to a reasonable level of confidence.
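A plain Poisson match simulator of the kind I'm guessing at looks something like this. It is a minimal sketch, not anyone's actual engine: the function names are mine, the expected-goals inputs are placeholders, and the sampler is Knuth's multiplication method, which is fine for means as small as football scores.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_match(home_mean, away_mean, rng):
    """Simulate one match with independent Poisson goals for each side."""
    return sample_poisson(home_mean, rng), sample_poisson(away_mean, rng)


def points(goals_for, goals_against):
    """League points earned from a single result."""
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0
```

Calling `simulate_match(1.5, 1.1, random.Random(2013))` for every fixture and summing points and goal difference gives one simulated season; repeat as many times as your patience allows.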
However, there is one small problem with random sampling from Poisson. It underestimates draws. Football managers and players, apparently, have a small tendency to play for a draw, and this must be accounted for in simulating matches. To solve this, I'm using a sampling from a bi-variate Poisson distribution, as suggested by Dimitris Karlis and Ioannis Ntzoufras. (Karlis and Ntzoufras, "Analysis of sports data by using bivariate Poisson models," The Statistician 52 (2003), 381-393.) With a bi-variate Poisson distribution based on an in-game goals scored correlation of about .15, I can simulate team outcomes and not underestimate draws.
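The usual way to build a bivariate Poisson is to give both teams a shared Poisson component on top of their own: home goals = X1 + X3 and away goals = X2 + X3. The sketch below does that. It assumes the .15 figure is the correlation between the two sides' goals, so the shared mean is set to rho * sqrt(home_mean * away_mean); the function names and example means are illustrative, not my production code.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_match_bivariate(home_mean, away_mean, rho, rng):
    """Simulate one match from a bivariate Poisson with goal correlation rho.

    Each side's goals keep their Poisson mean (home_mean, away_mean); the
    shared component X3 induces the positive correlation. rho = 0 reduces
    to two independent Poissons.
    """
    shared_mean = rho * math.sqrt(home_mean * away_mean)
    if shared_mean < 0 or shared_mean > min(home_mean, away_mean):
        raise ValueError("rho is not achievable for these means")
    shared = sample_poisson(shared_mean, rng)
    home = sample_poisson(home_mean - shared_mean, rng) + shared
    away = sample_poisson(away_mean - shared_mean, rng) + shared
    return home, away


def draw_rate(home_mean, away_mean, rho, n_sims, rng):
    """Share of simulated matches that end level."""
    draws = 0
    for _ in range(n_sims):
        home, away = simulate_match_bivariate(home_mean, away_mean, rho, rng)
        draws += home == away
    return draws / n_sims
```

Comparing `draw_rate(1.5, 1.1, 0.0, ...)` with `draw_rate(1.5, 1.1, 0.15, ...)` over a few tens of thousands of matches shows the correlated version producing noticeably more draws, because the shared goals cancel out of the goal difference while each side's scoring average stays put.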
So what's new here is a 100x increase in the number of simulations and a new game simulation formula that does not underestimate draws.
With your help, I hope to unveil the initial team ratings and expected table either tomorrow or Thursday.