Extreme Enginerding: December 2012

Thursday, December 20, 2012

Fall Weather

Hey there!

I realize that technically tomorrow is the end of fall, but seeing as some crazy people think the world's going to end then, I figured I'd get this done now.

Last season's weather analysis seemed pretty popular, and I decided to continue it into the fall. There are two big changes from last time:

I added CTV Edmonton Weather as a sixth weather forecaster. Their system is a little bit different from the other five forecasters in the analysis so far; they only give probability of precipitation numbers up to four days in the future, but do a much longer range of temperature predictions. As a result their total score is only directly comparable to other stations for four out of six of the days predicted, as comparing a number to a rainy cloud icon isn't very fair statistically.
I changed the way POP scores are calculated. Previously I used a weird system that was more-or-less based on p-scores, but under it, as soon as a station predicted 0% and it rained, or vice versa, their scores were shot. The new system is based on the Brier Score (a system that other people made up and actually use). In this case, a 0% prediction with rain still gives a score of 0, but it's averaged against other scores.
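
For the curious, here's a minimal Python sketch of the standard Brier score. The forecasts and outcomes are made up, and the rescaling to a score out of 100 is an assumption for illustration (one scaling under which a 0% prediction with rain scores 0); it may not be the exact formula behind the numbers in this post.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities (0-1) and outcomes (0 or 1)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty lists of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def score_out_of_100(forecasts, outcomes):
    """Hypothetical rescaling (an assumption): 100 is perfect, 0 is confidently wrong every time."""
    return 100 * (1 - brier_score(forecasts, outcomes))


# Made-up example: four days of POP forecasts and whether it actually rained.
pops = [0.0, 0.3, 0.8, 1.0]
rained = [1, 0, 1, 1]

print(round(brier_score(pops, rained), 4))
print(round(score_out_of_100(pops, rained), 2))
```

The one badly missed day (0% with rain) drags the average down, but it no longer wipes out the whole score the way it did under the old system.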

Anyway, the winner for the wonderfully balmy season of fall is: The Weather Network. (again!)

Scores for fall (out of 100):

Unfortunately for CTV, their score is artificially lowered compared to everyone else due to the lack of precipitation forecasts on the last two days. It isn't that significant a penalty, though, as the 5th and 6th day forecasts are weighted the least. If we ignore them and only consider four days, their weighted score would become 65.18, much closer to the others.

It turns out that, with the change in POP scoring system, the numbers for fall are significantly lower than the previous numbers for summer. Take a look at this graph:

Not only do all forecasters do worse during the fall, but they also become less consistent with each other.

Some fun facts!

Best high temperature prediction: Weather Network 1-day prediction: 70.60%
Best low temperature prediction: Weather Network 1-day prediction: 78.38%
Best precipitation prediction: Weather Network 1-day prediction: 76.38

Worst high temperature prediction: Weather Network 6-day prediction: 45.70%
Worst low temperature prediction: Environment Canada 5-day prediction: 49.20%
Worst precipitation prediction: Environment Canada 6-day prediction: 54.28

Some graphs!

Again, CTV scores are only directly compared to the others for four days. What's cool to see is that almost all of the stations consistently lose accuracy the farther into the future they try to predict - which, of course, makes intuitive sense. If you're interested in more of a breakdown of how these scores were developed, you can check out these other graphs.

See ya at the end of winter!

Wednesday, December 12, 2012

Predicting SU Elections

Recently, individual bloggers like Nate Silver and Éric Grenier have gained massive (deserved) notoriety for developing statistical models that prove to be very accurate in predicting the outcomes of major elections. If you haven't heard of them I strongly urge you to check them out!

At the end of the last SU election, I posted a very simple regression analysis of a short list of election statistics describing the executive elections on campus. Since then I've added to my model, and I have good reason to believe it's been made much more accurate.

Spoiler alert: this post isn't going to have any spoilers. I'm not going to tell you any specific numbers. Sorry, potential candidates!

I considered a significant number of quantifiable parameters. I strictly chose to avoid anything subjective (like debate performance, quality of posters, how chatty they are when we hang out), and was able to break the parameters into three broad categories: popularity, experience, and campaigning.

Falling into these categories were measures like Facebook friends and interactions, number of years served on Students' Council or Faculty Associations, and amount of money spent or fines amassed during campaigning.

The coolest result of the analysis was the different impact of each factor. The lowest-weighted factors were the popularity factors (Facebook friends don't appear to translate very easily into votes), and the most important factors actually fell under the experience category. This is actually kind of reassuring, especially as it appears to suggest that the elections may be a tiny bit less of a popularity contest than normally thought!

The current analysis uses the results of 30 candidates running for 12 positions over two years (I skipped 2010/2011 because the lack of contested races really messed things up). While this is by no means a conclusive sample size, the fact that the results are so consistent, even between the two years individually, is really promising. Take a look at this graph:

The graph shows the relationship between the predicted number of first-round votes from the model, and the actual number of first-round votes from the election. If the model were perfect, all the points would fall on a perfectly straight diagonal line. As it is, they fall on a pretty great line - out of an ideal coefficient of determination of 1.0, the model yielded a result of 0.945. Also, it correctly predicted the winner of each race, which isn't too shabby. I'm personally pretty happy with that result!
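
To make the coefficient of determination a bit more concrete, here's a small Python sketch. The vote counts are invented for illustration (the real numbers stay secret), and this version measures scatter around the perfect diagonal; the 0.945 above may have been computed a little differently, for example from a fitted trend line.

```python
def r_squared(actual, predicted):
    """1 minus (squared prediction error) / (total squared variation in the actual values)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up first-round vote counts for four hypothetical candidates.
actual_votes = [1200, 800, 1500, 400]
predicted_votes = [1100, 900, 1450, 500]

print(round(r_squared(actual_votes, predicted_votes), 3))
```

A value of 1.0 would mean every prediction landed exactly on the diagonal; the further the points stray from it, the lower the number gets.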

So stay tuned during this year's election, because I'm going to try to use this model to predict some of the results. If that doesn't sound fun, then you need to work on your love of stats...

Thursday, December 6, 2012

Why Deal or no Deal is the Best Game Show Ever

Last year the Supreme Court of Sweden declared that poker was a game of both skill and chance. That's ridiculous - EVERY game falls somewhere on the spectrum of skill and chance (maybe apart from games with no skill like War). Even the best of board games depend on some dice rolling or card shuffling, and at some point superior skill can still be beaten by blind luck.

Game shows are similar. At one end of the spectrum are shows like Jeopardy, where the relative skill of the contestants almost always gets reflected directly in the score, and in the middle we have games like Wheel of Fortune (where the wheel can kill you) and even Who Wants to be a Millionaire (where randomly assigned dollar values and different questions between contestants don't allow for direct skill comparisons).

At the other end of the spectrum (just before Million Dollar Heads or Tails) is Deal or no Deal. Contestants get to point to cases and randomly eliminate dollar values until they either get bribed off by a computer algorithm or stick with the last dollar value that's left. (I mean it's a valuable contribution to society... did that sound sarcastic or something?) Because the choices contestants make in eliminating suitcases are absolutely random, the only impact that contestants really have is when they're presented with the infamous question Deal or no Deal?

Deal!

The game is wonderful because it's essentially a game theory and economics puzzle all wrapped up in one! In fact, the bare-bones simplicity of the game has made it the subject of several research papers that shed some interesting light on the decision-making processes of the contestants. The game can be very easily divided up into four components:

1. The Host
The host is actually the least important part of the show. Literally he does nothing. Moving on...

2. The Suitcases/Models
The fundamental focus of the game, these attractive prospects drive the action and fascination of the audience for an hour at a time. The suitcases are important too. The game starts off with 26 suitcases which range in value from $0.01 to $1,000,000. Assuming any given contestant picked a suitcase at random and stuck with it until the end, then the average value to be won would be $131,477.54. That sounds pretty good, except that the prize values are so unevenly distributed that half of contestants would get less than $875.
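
If you'd like to check those numbers yourself, here's a quick Python sketch. It assumes the standard 26-value board from the US version of the show; other versions use different values.

```python
from statistics import mean, median

# Assumed prize board (US version), in dollars.
BOARD = [
    0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
    1_000, 5_000, 10_000, 25_000, 50_000, 75_000, 100_000,
    200_000, 300_000, 400_000, 500_000, 750_000, 1_000_000,
]

average_prize = mean(BOARD)   # what you'd win on average by never dealing
middle_prize = median(BOARD)  # half of the cases are worth less than this

print(f"average: ${average_prize:,.2f}")
print(f"median:  ${middle_prize:,.2f}")
```

The average and median come out the same as the figures above to within rounding, and the huge gap between them is the whole point: a handful of giant cases pull the average way up.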

Round one of the game involves picking and sitting on a suitcase, then arbitrarily eliminating six case values before any real decisions are made. Honestly - these choices can only be random and no strategy has any impact. Because of the number of cases being eliminated, by the time the contestant has to make a real choice the average prize money can vary from a worst-case scenario of $13,420.80 (median $350) to a best-case scenario of $170,916.25 (median $17,500). That's a massive difference, and it's the range of opportunities that contestants could be faced with before any sort of strategy could take effect.
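
Here's a short Python sketch of those two extremes, again assuming the US prize board. The worst case is knocking out the six biggest values in round one, and the best case is knocking out the six smallest.

```python
from statistics import mean, median

# Assumed prize board (US version), in dollars.
BOARD = [
    0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
    1_000, 5_000, 10_000, 25_000, 50_000, 75_000, 100_000,
    200_000, 300_000, 400_000, 500_000, 750_000, 1_000_000,
]

ranked = sorted(BOARD)
worst_case = ranked[:-6]  # the six largest values were opened in round one
best_case = ranked[6:]    # the six smallest values were opened in round one

for label, cases in (("worst", worst_case), ("best", best_case)):
    print(f"{label}: average ${mean(cases):,.2f}, median ${median(cases):,.2f}")
```

Both scenarios leave 20 values in play (including the contestant's own case), and the results line up with the figures quoted above to within rounding.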

3. The Dealer
After the contestants set themselves up arbitrarily for success or failure, a mysterious ~~algorithm~~ completely real human being offers to bribe the contestant out of the game. This ~~algorithm~~ shrewd businessman only really takes a small number of things into account: making the game entertaining for the audience, and losing as little money as possible.

Now obviously the game show guarantees that every contestant is going to walk away with some amount of money, and as we saw before if every contestant just held onto their suitcase then they'd get about $130,000 each on average. If the show wanted to produce a given number of hours of material per season, then, they would lose the least amount of money if the contestants each played the game for a long time. And what's the easiest way to keep people in? Offer them lousy deals.

In fact, an analysis of bank deal offers has shown exceptional cruelty in offering deals. After the first round, the deal that is offered is usually only about 10% of the average of the remaining suitcases. Nobody in their right mind would take that (then again, most people can't take the average of 20 dollar values in their heads and may not know how much they're being ripped off), and indeed nobody on the US version of the show took that deal. In fact, nobody took a deal at all until the offer was at least 50% of the value of the average remaining prizes, which never occurs until at least 20 of the cases have been opened (round 5 of the game). Shrewd indeed. They did, however, end up at around 95% of the value of remaining suitcases right at the end of the game if the contestants stuck around that long.
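
Since most of us can't do that arithmetic in our heads, here's a tiny Python sketch that does it. The remaining case values and the offer are made up for illustration; only the idea of comparing the offer to the average of what's left comes from the analysis above.

```python
from statistics import mean


def offer_ratio(offer, remaining_values):
    """Bank offer as a fraction of the average value still in play."""
    return offer / mean(remaining_values)


# Hypothetical late-game situation: five cases left and a $9,000 offer.
remaining = [5, 100, 1_000, 50_000, 400_000]
offer = 9_000

print(f"offer is {offer_ratio(offer, remaining):.0%} of the average remaining prize")
```

Anything far below 100% means the bank is paying well under the average of the remaining cases.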

The only exception is that sometimes if someone does really really poorly on the initial rounds, the offer tends to be a little bit higher just to make the game less depressing.

Poor lad...

And finally...

4. The Player
The actual variable in the game, the contestants have the opportunity to just mess things up at any point in time. Before talking about what was observed, here's a riddle. What would you rather choose: a deal for $3,000 or a 50/50 chance between $1,000 and $5,000? What if instead it was a deal for $7.50 versus a 50/50 chance between $5 and $10, or a deal for $875,000 versus a 50/50 chance between $750,000 and $1,000,000?

In each case the deal was exactly the midpoint value between the 50/50 options, so the expected value for either choice is the same. The only difference between the two choices (from a rationality point of view) is the amount of risk a contestant is willing to tolerate - are they willing to accept a low amount of money for the chance to get a higher amount?
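
As a quick sanity check, here's a small Python sketch of the expected value for each version of the riddle. The function name and data layout are just for illustration.

```python
def expected_value(lottery):
    """Probability-weighted average payoff of a list of (probability, payoff) pairs."""
    return sum(probability * payoff for probability, payoff in lottery)


# (deal on the table, the 50/50 gamble it is competing with)
riddles = [
    (3_000, [(0.5, 1_000), (0.5, 5_000)]),
    (7.50, [(0.5, 5), (0.5, 10)]),
    (875_000, [(0.5, 750_000), (0.5, 1_000_000)]),
]

for deal, gamble in riddles:
    print(f"deal ${deal:,.2f} vs gamble worth ${expected_value(gamble):,.2f} on average")
```

In all three cases the two numbers match, so the math alone can't tell you what to pick; it comes down to how much risk you can stomach.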

Personally I have a pretty low risk threshold - in fact, I'd quite likely be a terribly boring contestant on the show. What was interesting was that the choices of the contestants on the show depended much less on a rational analysis of the situation, and more on how well they'd done previously in the game.

For instance, contestants who had terrible luck in the game (and could theoretically be left with a $5 or $10 scenario) took the most risks - very few of them took the deal. This is partially due to the fact that quite frankly a $5 difference isn't really much of anything, but more generally also falls under the category of decision-making known as the break-even effect, where gamblers are more likely to choose options that have the possibility (however remote) to bring them above an arbitrary prize value, even if the expected return is low on the choice as a whole.

Contestants with about average luck (similar to a $1,000 or $5,000 scenario) were much less risky. They tended to sit at comfortable levels of money either way, and settling for a guaranteed money value was much more appealing than risking a couple thousand either way.

What was really interesting was that contestants with the best luck were almost as risky as the players with the worst. Even though the potential swing between choices was hundreds of thousands of dollars, most contestants again rejected the deals offered. This is explained economically by a decision-making process known as the house-money effect, where gamblers are more likely to exhibit risky behavior when they feel like they are playing with money that isn't theirs. Rationally, a contestant in this situation has a guaranteed $750,000, and is facing a choice between a deal of $125,000 or a 50/50 chance between $250,000 and $0, but they tend not to look at it that way.

So there you go! Deal or no Deal really is a fascinating show from a game theory and decision-making theory point of view.

PS A lot of people weren't so happy with the Monty Hall Problem I mentioned in a previous post. Superficially, the last rounds of Deal or no Deal may look a lot like the Monty Hall Problem - a contestant has a case (door) chosen, a third case (door) is eliminated, and then the contestant has the choice of switching their selected case (door) for the one that remains. Unlike the Monty Hall Problem, though, there is no advantage from switching in Deal or no Deal, because the one case that gets eliminated in the intermediate step runs the risk of being the highest-value case. Sorry to disappoint.
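
If you don't believe me, here's a Python sketch that counts up every possibility exactly for a simplified three-case endgame. The setup is an assumption for illustration: one top prize, the contestant holds case 0, and one of the other two cases gets opened, either by a host who knows where the prize is (the classic door problem) or at random (Deal or no Deal).

```python
from fractions import Fraction


def switch_win_probability(host_knows):
    """Exact chance that switching wins, given the top prize is still in play."""
    others = [1, 2]  # the two cases the contestant is not holding
    switch_wins = Fraction(0)
    still_in_play = Fraction(0)
    for prize in range(3):  # the top prize is equally likely to be in any case
        if host_knows:
            openable = [c for c in others if c != prize]  # host avoids the prize
        else:
            openable = others  # a random pick might reveal the prize
        for opened in openable:
            weight = Fraction(1, 3) / len(openable)
            if opened == prize:
                continue  # top prize eliminated, so there is nothing left to switch for
            still_in_play += weight
            unopened = others[0] if opened == others[1] else others[1]
            if unopened == prize:
                switch_wins += weight
    return switch_wins / still_in_play


print("host knows:", switch_win_probability(host_knows=True))
print("random pick:", switch_win_probability(host_knows=False))
```

With a knowing host, switching wins two times out of three. With a random elimination that happens to spare the top prize, it's a coin flip either way.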
