Tuesday, June 12, 2012

Hockey

As some of you may or may not have known, I've been punching numbers into a spreadsheet recently to try to analyze some of this NHL madness that's been going on. Now that the LA Kings have officially won the Cup, I'd like to take a quick moment to celebrate, because my model has said they would win ever since April 28th (nearly 6 weeks ago).

Yay!

Naturally some of you should be wondering, "Is this a fluke? Is he onto something or is this just luck?" My only response to that is that I believe that my model is indeed better than chance, but I'm not quite positive if the difference is enough to be particularly significant.

Now I'm going to show you some pretty pictures to confuse you:

This was a visual representation of my day-to-day analysis results. The height of each bar (y-axis) represents that team's chance of winning for a given time (x-axis). At the beginning of the playoffs I had the Blues picked as the most likely winners (20.32% chance) with the Kings as the second-most likely (13.54%). Once it looked less likely that the Blues would beat the Coyotes, however, the model began to side more favorably towards the Kings, and never looked back.

Unfortunately the model was a little off the mark for the Devils - they were only ranked as 6th most likely to win from the Eastern Conference. Of course, the most likely Eastern team (Rangers) did make it pretty close, I suppose. Interesting to note, though, is that throughout this whole analysis teams from the West were favored over teams from the East.

So I picked the right team from the west and the wrong team from the east - does that make this right or wrong? Well, that's where math comes in. Technically I got 9/15 series predictions right, which is just better than what you'd expect if you were guessing randomly.

A better way of looking at this is with a statistic called a p-value, which measures how likely it is that you'd see results at least as extreme as yours if some baseline assumption (the "null hypothesis") were true. In a basic example, a p-value test could be used to check if a coin is fair or not by flipping it 10 times. Say we get 9 heads - the chance of a fair coin getting 9 or more heads is 1.07%, which is quite unlikely. Depending on how sure you want to be, this is around the point where scientists would reject the hypothesis that the coin is fair. This 1.07% is the p-value for the test - the chance of getting data this lopsided if the null hypothesis were true. Commonly a p-value of less than 5% is enough to reject a hypothesis (in this case, the hypothesis that the coin is fair).
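For anyone who wants to check the coin arithmetic, here is a minimal sketch in Python. The function name `binomial_tail` is made up for illustration, and the second calculation rests on an assumption that isn't part of my spreadsheet: that a random guesser gets each of the 15 series right with an independent 50/50 chance.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Chance of getting at least `successes` hits in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 9 or more heads in 10 flips of a fair coin
coin_p_value = binomial_tail(9, 10)

# 9 or more correct picks out of 15 series, ASSUMING each pick is a 50/50 guess
series_p_value = binomial_tail(9, 15)

print(f"coin:   {coin_p_value:.2%}")    # coin:   1.07%
print(f"series: {series_p_value:.2%}")  # series: 30.36%
```

Under that 50/50 assumption, a pure guesser would do at least as well as 9/15 roughly 30% of the time, which is nowhere near the usual 5% cutoff. So the series record on its own doesn't show that the model beats chance.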

P-values don't tell you, though, what the chance is that a model *is* correct, but in general they are a good indicator of how strong a model is. A model whose predictions match the actual results exactly (over a very large number of games) would be expected to get a perfect score of 0.5.

Check out this graph of p-values from my model:

Using similar calculations as with the p-value, these are scores for my model over the most recent playoffs (86 games). I broke each game down into the chance for the home team to win in regular time, the chance the visitors would win in regular time, and the chance of going into overtime. The lines are the results from my model, whereas the points are what would be expected from just chance.

In general, my model's predictions throughout a game got better as the game progressed (yay!), and my p-value scores were higher than chance in terms of which team would win in regular time. Better still, when looking at just which team would win any given game (disregarding overtime), my model scored 0.3499, whereas chance would give only 0.2253. I am incredibly pleased with this result, as it suggests my model may in fact be significantly more likely to be correct than just randomly guessing.

However, my model severely under-predicted the frequency of games going into overtime (which was higher than average anyway, resulting in a relatively low score for chance too). These results are going to be used in my infinitely more massive iteration of this spreadsheet next season, which is going to track the chances of winning the playoffs throughout the regular season too! Stay tuned!!

1 comment:

Civatrix said...

I hope you have this spreadsheet of yours backed up, preferably in 2 or 3 different places. It would suck to lose all this work to a hard drive error.