Extreme Enginerding: NHL Streaks

Two statisticians get on an airplane to go to a conference. Once they've sat down on the plane, one notices that the other is carrying a bomb with him. When asked about it, the second statistician says, "Well, if the odds of having one bomb on a plane are tiny, then the odds of having two must surely be zero and we're guaranteed to be safe!"

This sort of thinking is a classic example of the gambler's fallacy. People often think that if a fair coin has come up heads four times in a row, it's more likely to be tails on the next flip because there should be an even number of heads and tails. Though it is true that, after a very large number of flips, we should expect approximately 50% heads and 50% tails, the problem with the fallacy is that each coin toss is completely independent of any other coin toss. No matter how many heads we've flipped, the coin isn't keeping track and trying to balance itself out - we just know that after a while the proportions ought to end up about even.
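If you don't believe the coin, you can always ask a computer. Here's a rough sketch (the function name and the number of flips are just my choices for illustration) that flips a simulated fair coin a lot of times and checks how often heads shows up right after four heads in a row:

```python
import random

def heads_rate_after_streak(n_flips=200_000, streak_len=4, seed=0):
    """Fraction of heads on the flip right after `streak_len` heads in a row."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    # Collect every flip that was immediately preceded by a run of heads.
    followups = [
        flips[i]
        for i in range(streak_len, n_flips)
        if all(flips[i - streak_len:i])
    ]
    return sum(followups) / len(followups)

print(heads_rate_after_streak())
```

If the coin were "trying to balance itself out", this number would sit noticeably below 0.5. It should instead land very close to 0.5.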

Statistically independent events are important in probability. Coin flips, roulette wheels, dice tosses, and explosive suitcases are typically independent of each other, and it's important to know this if you're ever going to go to a casino. (Related: most betting strategies people will try to sell you count on you not knowing this - don't buy them!) On the other hand, some casino games actually have dependent probabilities. The best example is blackjack - if you know which cards have already been played from a deck, then you know roughly what distribution of cards is left, because the deck isn't being reset after each hand (which is how card-counting works).
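To make the blackjack case concrete, here's a tiny sketch, assuming a standard 52-card deck with four aces and no reshuffling (the function name is made up for this example). The chance that the next card is an ace changes depending on what has already been dealt:

```python
from fractions import Fraction

def chance_next_is_ace(aces_seen, other_cards_seen, decks=1):
    """Probability the next card is an ace, given the cards already dealt."""
    aces_left = 4 * decks - aces_seen
    cards_left = 52 * decks - aces_seen - other_cards_seen
    return Fraction(aces_left, cards_left)

print(chance_next_is_ace(0, 0))    # fresh deck: 1/13
print(chance_next_is_ace(0, 12))   # 12 non-aces gone: 1/10
print(chance_next_is_ace(2, 10))   # 2 aces and 10 others gone: 1/20
```

Unlike the coin, the deck really does "remember" what has happened, which is exactly what makes the draws dependent.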

I recently read an article by The Wanderer magazine that examined, in part, a paper by the National Academy of Sciences that investigated the effects of randomness on performance. In the paper, a large-scale model was developed where past performance had an impact on future performance. This got me wondering whether or not this was something I should incorporate into my future NHL models.

Before I get into my results, ask yourself: do you think individual games in the NHL are dependent on previous results? Is a team on a three-game winning streak more likely to win the next game than they would be if they'd just come off a three-game losing streak? Or does it matter?

It's pretty easy to rationalize it either way - perhaps having lost a couple games in a row a team could be feeling depressed and be more likely to lose, or maybe they are more inspired/desperate and would be more likely to win. Similarly, having won a couple games in a row could perhaps make a team more confident and give them an edge, or too cocky and make them lose.

If there is an effect, it would definitely be worth incorporating into a model of the regular season, so I was interested in taking a look at the results of last season. I plotted 2,460 results from last season (30 teams times 82 games each, fun), and decided to compare these results to what would be expected if there wasn't any impact from previous games. In every season of 82 games, there are 81 sequences of two consecutive games, 80 sequences of three consecutive games, 79 sequences of four games, etc. If each game is an independent 50/50 proposition, we should expect ~50% of two-game sequences to have the same result (win-win or loss-loss), 25% of three-game sequences to be a streak, and so on, halving with each extra game. I looked at each sequence of up to 10 games in a row for each team, compared it to what I'd expect without any dependence between games, and this is what I got:

I have to admit I was pretty surprised. The results averaged over all teams (red squares) almost perfectly matched what we'd expect if the games were independent. Crazy!
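For reference, the independent-games baseline can be worked out directly. This is a minimal sketch, assuming every game is a 50/50 win or loss with no ties; the function names are made up for illustration and this is not necessarily how the chart was produced. A window of n games is a streak if it is all wins or all losses, which happens with probability p^n + (1-p)^n, and an 82-game season has 82 - n + 1 such windows.

```python
def expected_streak_windows(n, games=82, p_win=0.5):
    """Expected number of n-game windows that are all wins or all losses."""
    windows = games - n + 1
    p_same = p_win ** n + (1 - p_win) ** n
    return windows * p_same

def count_streak_windows(results, n):
    """Count n-game windows with identical results in a string like 'WWLWL'."""
    return sum(
        1
        for i in range(len(results) - n + 1)
        if len(set(results[i:i + n])) == 1
    )

print(expected_streak_windows(2))  # 40.5
print(expected_streak_windows(3))  # 20.0
print(count_streak_windows("WWWLL", 2))  # 3
```

Comparing `count_streak_windows` for a team's actual season against `expected_streak_windows` for each n from 2 to 10 is the kind of check described above.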

The pink lines on the graph show the maximum and minimum achieved by any team for each streak length, and the blue lines mark the range that we'd expect 90% of teams to fall inside naturally if games truly were independent. Again, most teams fall within this range (oddly enough, typically 27 out of 30, which is our magical 90%).
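One way a 90% band like this could be estimated is by brute force: simulate a few thousand coin-flip seasons and see where the middle 90% of them land. This is only a sketch under the same 50/50 assumption; the function name, the number of simulated seasons, and the percentile method are my own choices for illustration.

```python
import random

def simulated_range(n, games=82, seasons=5000, coverage=0.90, seed=1):
    """Approximate band containing `coverage` of seasons for n-game streak counts."""
    rng = random.Random(seed)
    counts = []
    for _ in range(seasons):
        season = [rng.random() < 0.5 for _ in range(games)]  # True means a win
        # Count windows of n games that are all wins or all losses.
        counts.append(sum(
            1
            for i in range(games - n + 1)
            if len(set(season[i:i + n])) == 1
        ))
    counts.sort()
    tail = (1 - coverage) / 2
    low = counts[int(tail * seasons)]
    high = counts[int((1 - tail) * seasons) - 1]
    return low, high

for n in range(2, 6):
    print(n, simulated_range(n))
```

A team whose streak count falls outside the band for some n isn't necessarily streaky. With 30 teams, about three should land outside a 90% band by chance alone.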

What does this mean? Basically, for the vast majority of teams, and for the NHL as a whole averaged across all 30 teams, hot streaks or cold streaks from any team throughout the season happen almost precisely as often as we'd expect. There's no evidence here that previous games impact future performance in any significant way.

This is actually pretty good news for my model, because now it doesn't have to be quite as complicated as I'd feared. It's also humbling to know that, even though hockey games depend on the collective actions of a bunch of humans, they follow expected statistical patterns so closely.

So the next time you're betting on hockey games or worried about bombs on airplanes, just take a moment to consider that these events look statistically independent. A team on a hot streak is no more or less likely to win its next game than it would be otherwise, and there's really nothing you can do about other people bringing bombs on your flight.

Friday, July 6, 2012
