Thursday, December 20, 2012

Fall Weather

Hey there!

I realize that technically tomorrow is the end of fall, but seeing as some crazy people think the world's going to end then, I figured I'd get this done now.

Last season's weather analysis seemed pretty popular, and I decided to continue it into the fall. There are two big changes from last time:

  • I added CTV Edmonton Weather as a sixth weather forecaster. Their system is a little bit different from the other five forecasters in the analysis so far; they only give probability of precipitation numbers up to four days in the future, but do a much longer range of temperature predictions. As a result their total score is only directly comparable to other stations for four out of six of the days predicted, as comparing a number to a rainy cloud icon isn't very fair statistically.
  • I changed the way POP scores are calculated. Previously I used a weird system that was more-or-less based on p-values, but as soon as a station predicted 0% and it rained (or vice versa), its score was shot. The new system is based on the Brier score (a system that other people made up and actually use). A 0% prediction with rain still gives a score of 0 for that day, but it gets averaged against the other days' scores - see the sketch right after this list.
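For the curious, here's a minimal sketch of how a Brier-style POP score can be turned into a number out of 100. The rescaling to "out of 100" is my own simplification for illustration, not necessarily the exact weighting used in the rankings below.

```python
def brier_pop_score(forecast_probs, rained):
    """Brier-style score for probability-of-precipitation forecasts.

    forecast_probs: forecast POPs as fractions (e.g. 0.3 for 30%)
    rained: booleans, True if it actually rained that day
    Returns a score out of 100, where higher is better.
    """
    # Brier score: mean squared difference between forecast and outcome.
    # A perfect forecaster scores 0, the worst possible forecaster scores 1.
    brier = sum((p - (1.0 if r else 0.0)) ** 2
                for p, r in zip(forecast_probs, rained)) / len(forecast_probs)
    return 100 * (1 - brier)  # flip it so that higher = better

# Example: one badly blown 0% call no longer wrecks the whole season.
print(brier_pop_score([0.0, 0.1, 0.6, 0.9], [True, False, True, True]))
```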
Anyway, the winner for the wonderfully balmy season of fall is: The Weather Network. (again!)

Scores for fall (out of 100):
Unfortunately for CTV, their score is artificially lowered compared to everyone else due to the lack of precipitation forecasts on the last two days. It isn't that significant a penalty, though, as the 5th and 6th day forecasts are weighted the least. If we ignore those days and only consider four, their weighted score would become 65.18, much closer to the others.

It turns out that, with the change in POP scoring system, the numbers for fall are significantly lower than the previous numbers for summer. Take a look at this graph:

Not only do all forecasters do worse during the fall, but they also become less consistent with each other.

Some fun facts!

Best high temperature prediction: Weather Network 1-day prediction: 70.60%
Best low temperature prediction: Weather Network 1-day prediction: 78.38%
Best precipitation prediction: Weather Network 1-day prediction: 76.38

Worst high temperature prediction: Weather Network 6-day prediction: 45.70%
Worst low temperature prediction: Environment Canada 5-day prediction: 49.20%
Worst precipitation prediction: Environment Canada 6-day prediction: 54.28

Some graphs!
Again, CTV scores are only directly compared to the others for four days. What's cool to see is that almost all of the stations consistently lose accuracy the farther into the future they try to predict - which, of course, makes intuitive sense. If you're interested in more of a breakdown of how these scores were developed, you can check out these other graphs.



See ya at the end of winter!

Wednesday, December 12, 2012

Predicting SU Elections

Recently, individual bloggers like Nate Silver and Éric Grenier have gained massive (and deserved) renown for developing statistical models that have proven to be very accurate at predicting the outcomes of major elections. If you haven't heard of them, I strongly urge you to check them out!

At the end of the last SU election, I posted a very simple regression analysis of a short list of election statistics describing the executive elections on campus. Since then I've added to my model, and I have a good reason to believe it's been made much more accurate.

Spoiler alert: this post isn't going to have any spoilers. I'm not going to tell you any specific numbers. Sorry, potential candidates!

I considered a significant number of quantifiable parameters. I strictly chose to avoid anything subjective (like debate performance, quality of posters, how chatty they are when we hang out), and was able to break the parameters into three broad categories: popularity, experience, and campaigning.

Falling into these categories were measures like Facebook friends and interactions, number of years served on Students' Council or Faculty Associations, and amount of money spent or fines amassed during campaigning.

The coolest result of the analysis was the different impact of each factor. The lowest-weighted factors were the popularity measures (Facebook friends don't appear to translate very easily into votes), and the most important factors actually fell under the experience category. This is actually kind of reassuring, as it suggests that the elections may be a tiny bit less of a popularity contest than normally thought!

The current analysis uses the results of 30 candidates running for 12 positions over two years (I skipped 2010/2011 because the lack of contested races really messed things up). While this is by no means a conclusive sample size, the fact that the results are so consistent, even between the two years individually, is really promising. Take a look at this graph:


The graph shows the relationship between the predicted number of first-round votes from the model and the actual number of first-round votes from the election. If the model were perfect, all the points would fall on a perfectly straight diagonal line. As it is, they fall on a pretty great line - against an ideal coefficient of determination of 1.0, the model yielded 0.945. It also correctly predicted the winner of each race, which isn't too shabby. I'm personally pretty happy with that result!
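For the curious, a model like this is really just a least-squares fit. Here's a minimal sketch in Python - the feature values and vote counts below are invented purely for illustration, and the real inputs (and actual coefficients) are staying on my hard drive.

```python
import numpy as np

# Hypothetical inputs: one row per candidate, one column per measure
# (e.g. Facebook friends, years on Council, campaign spending in dollars).
X = np.array([
    [1200, 2, 350.0],
    [800,  0, 120.0],
    [1500, 3, 400.0],
    [600,  1, 200.0],
    [1000, 2, 150.0],
    [400,  0,  80.0],
])
y = np.array([950, 400, 1100, 520, 780, 310])  # actual first-round votes

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Coefficient of determination (R^2): 1.0 would mean a perfect fit.
predicted = A @ coeffs
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print("R^2 =", 1 - ss_res / ss_tot)
```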

So stay tuned during this year's election, because I'm going to try to use this model to predict some of the results. If that doesn't sound fun, then you need to work on your love of stats...

Thursday, December 6, 2012

Why Deal or no Deal is the Best Game Show Ever

Last year the Supreme Court of Sweden declared that poker was both a game of skill and chance. That's ridiculous - EVERY game falls somewhere on the spectrum of skill and chance (maybe apart from games with no skill like War). Even the best of board games depend on some dice rolling or card shuffling, and at some point superior skill can still be beaten by blind luck.

Game shows are similar. On the one hand are shows like Jeopardy where the relative skill of the contestants almost always gets reflected directly in the score, and in the middle we have games like Wheel of Fortune (where the wheel can kill you) and even Who Wants to be a Millionaire (where randomly assigned dollar values and different questions between contestants don't allow for direct skill comparisons).

At the other end of the spectrum (just before Million Dollar Heads or Tails) is Deal or no Deal. Contestants get to point to cases and randomly eliminate dollar values until they either get bribed off by a computer algorithm or stick with the last dollar value that's left. (I mean it's a valuable contribution to society... did that sound sarcastic or something?) Because the choices contestants make in eliminating suitcases are absolutely random, the only impact that contestants really have is when they're presented with the infamous question Deal or no Deal?

Deal!
The game is wonderful because it's essentially a game theory and economics puzzle all wrapped up in one! In fact, the bare-bones simplicity of the game has made it the subject of several research papers that shed some interesting light on the decision-making processes of the contestants. The game can be very easily divided up into four components:

1. The Host
The host is actually the least important part of the show. Literally he does nothing. Moving on...

2. The Suitcases/Models
The fundamental focus of the game, these attractive prospects drive the action and fascination of the audience for an hour at a time. The suitcases are important too. The game starts off with 26 suitcases ranging in value from $0.01 to $1,000,000. If any given contestant picked a suitcase at random and stuck with it until the end, the average value won would be $131,477.50. That sounds pretty good, except that the prize values are so unevenly distributed that half of contestants would get less than $875.
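You can check that average and median yourself. A quick sketch, using what I believe is the usual American lineup of case values:

```python
values = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
          1_000, 5_000, 10_000, 25_000, 50_000, 75_000, 100_000, 200_000,
          300_000, 400_000, 500_000, 750_000, 1_000_000]

mean = sum(values) / len(values)         # expected prize if you never deal
median = (values[12] + values[13]) / 2   # 26 values -> average the middle two
print(f"mean = ${mean:,.2f}, median = ${median:,.2f}")
# mean is about $131,478, median is $875
```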

Round one of the game involves picking and sitting on a suitcase, then arbitrarily eliminating six case values before any real decisions are made. Honestly - these choices can only be random, and no strategy has any impact. Because of the number of cases being eliminated, by the time the contestant has to make a real choice the average remaining prize money can vary from a worst-case scenario of $13,420.80 (median $350) to $170,916.30 (median $17,500). That's a massive difference, and it's the range of situations that contestants could be faced with before any sort of strategy could take effect.
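Those bounds come from removing either the six biggest or the six smallest values. Continuing the sketch above (it reuses the same 'values' list):

```python
# Worst round one knocks out the six largest prizes; best knocks out the six smallest.
worst = sorted(values)[:-6]   # six biggest cases opened
best = sorted(values)[6:]     # six smallest cases opened

for label, remaining in [("worst", worst), ("best", best)]:
    mean = sum(remaining) / len(remaining)
    mid = sorted(remaining)
    median = (mid[len(mid) // 2 - 1] + mid[len(mid) // 2]) / 2
    print(f"{label} case: mean ${mean:,.2f}, median ${median:,.2f}")
```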

3. The Dealer
After the contestants set themselves up arbitrarily for success or failure, a mysterious algorithm (sorry, a completely real human being) offers to bribe the contestant out of the game. This algorithm (ahem, shrewd businessman) only really takes two things into account: making the game entertaining for the audience, and losing as little money as possible.

Now obviously the game show guarantees that every contestant is going to walk away with some amount of money, and as we saw before, if every contestant just held onto their suitcase they'd each get about $130,000 on average. If the show wants to produce a given number of hours of material per season, then, it loses the least money when contestants each play the game for a long time. And what's the easiest way to keep people in? Offer them lousy deals.

In fact, an analysis of bank offers has shown exceptional cruelty in the early deals. After the first round, the deal offered is usually only about 10% of the average of the remaining suitcases. Nobody in their right mind would take that (then again, most people can't average 20 dollar values in their heads and may not realize how badly they're being ripped off), and indeed nobody on the US version of the show took that deal. In fact, nobody took a deal at all until the offer was at least 50% of the average of the remaining prizes, which never happens until at least 20 of the cases have been opened (round 5 of the game). Shrewd indeed. The offers did, however, climb to around 95% of the average of the remaining suitcases right at the end of the game if the contestants stuck around that long.
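For fun, here's a toy version of that offer schedule. The ramp of percentages below is my rough guess at the shape described above, not the show's actual (secret) formula, and the 'values' list comes from the earlier sketch.

```python
import random

# Guessed banker generosity by round, as a fraction of the average remaining prize.
offer_fraction = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.40, 5: 0.55,
                  6: 0.70, 7: 0.80, 8: 0.90, 9: 0.95}
cases_opened_per_round = [6, 5, 4, 3, 2, 1, 1, 1, 1]

remaining = values[:]            # the 26 case values from the earlier sketch
random.shuffle(remaining)

for rnd, n in enumerate(cases_opened_per_round, start=1):
    remaining = remaining[n:]    # open n cases at random
    offer = offer_fraction[rnd] * (sum(remaining) / len(remaining))
    print(f"Round {rnd}: {len(remaining)} cases left, offer ${offer:,.0f}")
```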

The only exception is that sometimes if someone does really really poorly on the initial rounds, the offer tends to be a little bit higher just to make the game less depressing.


Poor lad...
 And finally...

4. The Player
The actual variable in the game, the contestants have the opportunity to just mess things up at any point in time. Before talking about what was observed, here's a riddle. Which would you rather choose: a deal for $3,000, or a 50/50 chance between $1,000 and $5,000? What if instead it was a deal for $7.50 versus a 50/50 chance between $5 and $10, or a deal for $875,000 versus a 50/50 chance between $750,000 and $1,000,000?

In each case the deal is exactly the midpoint of the 50/50 options, so the expected value of either choice is the same. The only difference between the two choices (from a rationality point of view) is the amount of risk a contestant is willing to tolerate - are they willing to give up a guaranteed amount of money for the chance at a higher amount?
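One standard way economists capture that difference is with a concave utility function. Here's a tiny sketch using square-root utility - my choice purely for illustration - to show why a risk-averse player takes the midpoint deal in all three riddles:

```python
from math import sqrt

def prefers_deal(deal, low, high, utility=sqrt):
    """True if a player with the given utility function takes the sure deal
    over a 50/50 gamble between low and high."""
    expected_utility_of_gamble = 0.5 * utility(low) + 0.5 * utility(high)
    return utility(deal) > expected_utility_of_gamble

for deal, low, high in [(3_000, 1_000, 5_000),
                        (7.50, 5, 10),
                        (875_000, 750_000, 1_000_000)]:
    print(deal, low, high, "-> take the deal?", prefers_deal(deal, low, high))
# Expected value is identical either way, but a concave (risk-averse)
# utility always prefers the sure thing.
```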

Personally I have a pretty low risk threshold - in fact, I'd quite likely be a terribly boring contestant on the show. What was interesting was that the choices of the contestants on the show depended much less on a rational analysis of the situation, and more on how well they'd done previously in the game.

For instance, contestants who had terrible luck in the game (and could theoretically be left with a $5 or $10 scenario) were the most risky - very few of them took the deal. This is partly because, quite frankly, a $5 difference isn't really much of anything, but it also falls under a category of decision-making known as the break-even effect, where gamblers are more likely to choose options that have the possibility (however remote) of bringing them back above an arbitrary prize value, even if the expected return on the choice as a whole is low.

Contestants with about average luck (similar to a $1,000 or $5,000 scenario) were much less risky. They tended to be sitting at comfortable levels of money either way, and settling for a guaranteed value was much more appealing than risking a couple thousand dollars on a coin flip.

What was really interesting was that contestants with the best luck were almost as risky as the players with the worst. Even though the potential swing between choices was hundreds of thousands of dollars, most contestants again rejected the deals offered. This is explained economically by a decision-making process known as the house-money effect, where gamblers are more likely to exhibit risky behavior when they feel like they are playing with money that isn't theirs. Rationally, a contestant in this situation has a guaranteed $750,000, and is facing a choice between a deal of $125,000 or a 50/50 chance between $250,000 and $0, but they tend not to look at it that way.

So there you go! Deal or no Deal really is a fascinating show from a game theory and decision-making theory point of view.

PS A lot of people weren't so happy with the Monty Hall problem I mentioned in a previous post. Superficially, the last rounds of Deal or no Deal may look a lot like the Monty Hall problem - a contestant has a case (door) chosen, a third case (door) is eliminated, and then the contestant has the choice of switching their selected case (door) for the one that remains. Unlike the Monty Hall problem, though, there is no advantage to switching in Deal or no Deal, because the case eliminated in the intermediate step is opened at random and runs the risk of being the highest-value case. Sorry to disappoint.
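If you don't believe it, a quick simulation shows the difference between a host who knowingly avoids the prize (Monty Hall) and a reveal that happens at random (Deal or no Deal). This is just a sketch of the standard argument:

```python
import random

def switch_win_rate(trials=100_000, host_knows=True):
    wins, games = 0, 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        if host_knows:
            # Monty Hall: the host always opens a non-picked, non-prize door.
            opened = next(d for d in range(3) if d != pick and d != prize)
        else:
            # Deal or no Deal style: a non-picked case is opened at random;
            # discard the rounds where the big prize itself gets revealed.
            opened = random.choice([d for d in range(3) if d != pick])
            if opened == prize:
                continue
        remaining = next(d for d in range(3) if d != pick and d != opened)
        games += 1
        wins += (remaining == prize)
    return wins / games

print("Monty Hall, switching wins:", switch_win_rate(host_knows=True))    # ~0.67
print("Random reveal, switching wins:", switch_win_rate(host_knows=False))  # ~0.50
```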

Friday, November 30, 2012

What Democracy Looks Like

A while ago I wrote a tongue-in-cheek post about what democracy looks like amid the Occupy Protests.

It was pretty silly. I promise from here on out to only be serious. I swear.

I also posted a while ago about some of the different electoral systems in existence, including First Past the Post, the Borda Count, and Instant Runoff Voting. The latter two systems rely on voters ranking candidates, while First Past the Post only counts one chosen candidate per ballot. Because the lower rankings can come into play during the count, a set of first-choice results that would produce a winner under First Past the Post won't necessarily produce the same winner in a Borda or Instant Runoff election.

With that being said, this is what Democracy looks like:


This is a ternary plot based on a three-candidate First Past the Post system. The bottom axis shows the percentage of votes received by the reference candidate, and the other two axes show the other two candidates' percentages. The colour of each triangle is that candidate's chance of winning, where green means 100% and red means 0%. The First Past the Post plot makes intuitive sense - as long as you have more votes than anyone else, you win; otherwise you lose. This graph is pretty straightforward, so let's move on.


This plot is based on Instant Runoff Voting. What's interesting here is that the middle has opened up a little bit - a candidate could have more votes than anyone else but still lose if their opponents are close to each other. The outsides of the plot are still similar to the First Past the Post plot, though. An example of this could be a race where the vote is split 49%-46%-5% - First Past the Post would give the win to the first candidate, and almost every time the first candidate would win in an Instant Runoff Vote too, as the second candidate would need essentially all of the third candidate's votes to win (which is unlikely). Once we get toward the middle, though, the fraction of the eliminated candidate's votes that the second-place candidate needs is smaller, and so there's a range of vote distributions that give a trailing candidate a real chance of winning. This gets even more noticeable in...

The Borda Count! This is the count where a candidate gets a certain number of points for each first-place vote, fewer points for each second-place vote, and so on. Here the boundary between each candidate's corner is significantly blurred - in fact, it is statistically possible for a candidate to win the election with only 16% of the first-preference votes, provided their opponents split the difference and hand them a lot of second-place votes. Funky.
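To make the three systems concrete, here's a toy count over a handful of ranked ballots. The ballots are invented for illustration - note how the same votes can crown different winners depending on the system:

```python
from collections import Counter

# Each ballot ranks the candidates from first choice to last.
ballots = (
    [("A", "B", "C")] * 40 +   # A's supporters
    [("B", "C", "A")] * 35 +   # B's supporters prefer C over A
    [("C", "B", "A")] * 25     # C's supporters prefer B over A
)

# First Past the Post: only first choices count.
fptp = Counter(b[0] for b in ballots)
print("FPTP winner:", fptp.most_common(1)[0])

# Borda count: 2 points for a first choice, 1 for a second, 0 for a third.
borda = Counter()
for b in ballots:
    for points, candidate in zip((2, 1, 0), b):
        borda[candidate] += points
print("Borda winner:", borda.most_common(1)[0])

# Instant Runoff: drop the last-place candidate, transfer their ballots.
last = fptp.most_common()[-1][0]
runoff = Counter(next(c for c in b if c != last) for b in ballots)
print("IRV winner:", runoff.most_common(1)[0])
```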

Two last fun graphs:

(You may have to click on it to expand it...)

This is the same Instant Runoff graph with the vote distribution for each race in the 2012 SU elections plotted on top. In each case the winner is circled at the appropriate vote distribution, taken at the point in the count when only three candidates remained. In each case the winner was the candidate with more votes than anyone else, but both Andy Cheema and Brent Kelly sat at points where they still had more than a 10% chance of losing.

 
Last graph: This is a more visual interpretation of the 2012 Board of Governors race. After NotA gets eliminated, its vote share goes to 0%, which is shown by the arrow pointing to the second circle. In this case it looks like the NotA vote share was mostly split evenly, with a bit of favoritism towards Brent Kelly. As the NotA vote was a sizeable 18%, it could have potentially swung the election towards Rebecca Taylor. Cool visualization, eh?

Wednesday, November 21, 2012

Light Rail Transit

If I’m on a train going the speed of light and walk from the back to the front, what happens?

I recently got a fun question that highlights a seeming contradiction in what we’ve been taught in physics lessons. On the one hand we’ve been told that nothing can go faster than the speed of light, but on the other hand we’ve been told that speeds in the same direction can be added up. So what’s really going on here?

Normally if you’re riding a train going 100 km/h and you walk forward at 5 km/h along the train, those speeds can be added up, depending on the reference point you’re using. Someone sitting on the train would see you walking 5 km/h forward and the outside moving 100 km/h backwards, but someone sitting outside would see you as moving forward at 105 km/h relative to the ground. Similarly, a jet taking off from a moving aircraft carrier gets a helpful speed boost, and firing a gun from that jet massively increases the speed of the bullet.

The problem is that the speed of light is strange. In 1905 Albert Einstein suggested that the speed of light is independent of the speed of the light source and of the observer. That’s insane! No matter how quickly you’re moving towards or away from light, its speed will never change (but its colour will).
This also suggests that simply adding up speeds no longer works when we're close to the speed of light. Fortunately a fun formula exists for adding relativistic speeds, and it looks something like this:

v_total = (v1 + v2) / (1 + v1·v2/c²)

What's fancy is that this equation gives the results we'd expect for speeds substantially slower than the speed of light, but no matter how high the two speeds are, the combined result is still less than the speed of light (as long as neither speed individually exceeds it).

Sadly, the question that's been asked can't be solved straight-up, because nothing with mass can ever reach the speed of light, let alone exceed it (trust me, science has tried). But we can still try with something close enough.

The fastest speed humans have ever achieved was at the Large Hadron Collider, where a proton was accelerated to 99.9999991% of the speed of light. That's only about 10 km/h slower than the supposed universal speed limit. The fastest a human has ever run is Usain Bolt's 37 km/h. Using the equation from before, instead of topping out at 27 km/h above the speed of light, Usain Bolt actually only ends up at 99.9999991000001% of the speed of light. Not really a lot of progress.
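Plugging those numbers into the velocity-addition formula makes a quick back-of-the-envelope check (the speed of light in km/h is rounded here):

```python
c_kmh = 1.079e9       # speed of light in km/h (rounded)
u = 0.999999991       # the LHC-proton-speed train, as a fraction of c
v = 37 / c_kmh        # Usain Bolt sprinting up the aisle, as a fraction of c

# Relativistic velocity addition: w = (u + v) / (1 + u*v), in units of c.
# We only care about the *gain* over the train, computed algebraically to
# dodge floating-point cancellation: w - u = v * (1 - u**2) / (1 + u*v).
gain = v * (1 - u**2) / (1 + u * v)

print(f"Bolt gains about {gain * c_kmh * 1e6:.1f} mm/h over the train itself")
```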

As a bonus answer, even though the speed of light isn’t exceeded in this example, other cool things happen to the train. A passenger on the train sees Bolt travelling at 37 km/h, but an observer watching outside would only see him moving one millimeter per hour faster than the speed the train is already moving. A messed-up consequence of this is that the train would appear to flatten to one ten-thousandth of its original length due to Lorentz Contraction. And if that doesn’t boggle your mind, nothing will.

Thursday, November 1, 2012

Sneaky Statistics

Statistics are cool. Statistics are your friend. If you treat them right, they'll love you forever and never lie to you.

The problem with statistics is that sometimes they're confusing, and people very frequently think they understand them better than they do. Because of this, it's really easy sometimes for people to lie by tricking you with stats. Meanies...

Here's a cool example from Wikipedia of statistics playing tricks with you. Pretend that you're a doctor and you're trying to figure out which treatment is better for curing kidney stones. You use both for a while and this is what you get:

               Treatment A        Treatment B
Small stones   81/87 = 93%        234/270 = 87%
Large stones   192/263 = 73%      55/80 = 69%
Total          273/350 = 78%      289/350 = 83%

At first glance this test may seem pretty fair - both treatments were used 350 times, so we can compare them, right? And it looks like Treatment B was better than Treatment A. Maybe we should use it? Sounds good!

But wait. When we break it down into small stones versus large stones, the story changes. In small stones, Treatment A is 6% better than Treatment B, and in large stones Treatment A is 4% better. That's crazy though - how can A be both better at treating small stones and better at treating large stones, but worse at treating both? Clearly evil forces are at work here.

Around this point it wouldn't be horrible for you to be confused about which treatment is actually better, and it turns out that this study was, in fact, not fair. Large stones had a lower cure rate overall, and Treatment A was used more than three times as often on them. Similarly, the easier small stones were more often given Treatment B. This creates such an unbalanced weighting between treatments and stone sizes that when everything gets added up, Treatment B looks better.
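Here's the same arithmetic laid out as a quick sanity check, using the numbers from the table above:

```python
# (successes, attempts) for each treatment, split by stone size
data = {
    "A": {"small": (81, 87),   "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

for treatment, groups in data.items():
    for size, (cured, total) in groups.items():
        print(f"Treatment {treatment}, {size} stones: {cured / total:.0%}")
    cured = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    print(f"Treatment {treatment}, combined: {cured / total:.0%}")

# A wins within each group (93% vs 87%, 73% vs 69%),
# yet B wins on the combined numbers (83% vs 78%).
```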

This highlights two cool concepts in statistics. The first is Simpson's paradox, where the correlation observed in two separate groups is reversed when they are combined together. Obviously this could offer juicy opportunities to people with an agenda - a drug company representing either Treatment A or B could make a case that their drug is better, simply based on how they add the numbers up in the study.

The second is the confounding (or lurking) variable - a variable that wasn't originally accounted for that has an effect on both the dependent and independent variables in the study. A good example is as follows: a statistician could do a week-by-week analysis of human behavior on a beach, keeping track of both drownings and slurpee consumption. They might make the observation that in weeks with high slurpee consumption, more people drown, and someone could then declare that drinking slurpees increases the chance of drowning. 

Boy, that would suck. As a researcher, you could probably even justify it a little - perhaps drinking slurpees fills you up or makes you lethargic, increasing your chance of drowning. However, a more likely explanation takes something new into account: the season. People just plain drink more slurpees in the summer than in the winter (unless they're me). People also go swimming at beaches more during the summer, increasing the chance of drowning. In this example, the season would be a lurking variable - it correlates with both previously-considered variables, and explains the phenomenon.

Similarly, in our kidney stone example a lurking variable could be the size of the stone. Doctors disproportionately used Treatment A more for large stones, and Treatment B more for small stones - at the same time, small stones were easier to cure than large stones. By not taking into account the effect of the stone size on the treatment distribution, we arrive at the paradox from before.

Funnily enough, Simpson's paradox occurs fairly frequently - in fact, statisticians have estimated that for 2x2 tables like the one in the kidney stone example, about 1 in 60 would show some version of the paradox.

One famous example involved a sex discrimination lawsuit at Berkeley in 1973. The admission results from the six largest departments looked something like this:

Department   Men              Women
A            512/825 = 62%    89/108 = 82%
B            353/560 = 63%    17/25 = 68%
C            120/325 = 37%    202/593 = 34%
D            138/417 = 33%    131/375 = 35%
E            53/191 = 28%     142/393 = 24%
F            16/272 = 6%      24/341 = 7%

 When the total data was added up across all departments, though, the distribution was as follows:

         Applicants   Admitted
Men      8442         44%
Women    4321         35%

At first glance, it looks like a case of gender discrimination - nearly 10% more men were admitted across the board than women, and some people who felt cheated took it to court.


Looking at those six departments in the first table, though, shows something interesting - Departments A, B, and D were the most popular with the men, and the least popular with the women. In these, the women were consistently more likely to be admitted than the men. On the other hand, Departments C and E were the most popular with the women, and there they lost to the men. Unfortunately, the departments most popular with women also had admission rates that were about half those of the departments the men chose.


In fact, a study of these results suggested that there was a "small but statistically significant bias in favor of women" in the admission process when examining all departments in question, and the lawsuit failed. The lurking variable in this case was the character of the departments themselves - men tended to go into studies that were more math-intense (engineering, science, etc.), which happened to have more room to accept students.


It's really important to keep concepts like this in mind when examining statistics. For instance, one has to be extremely careful performing direct comparisons of male versus female earnings to account for factors such as preference in employment - it's much better to compare across identical jobs than comparing aggregate numbers. Aggregate statistics in that case are only really good for highlighting disparity in employment distributions, not earning statistics. Similarly, the Berkeley sex bias case, while not showing a bias against admitting women into studies, highlighted a lack of female participation in programs involving math that was more indicative of early societal pressures than active denial.


One final word of caution regarding Simpson's paradox: due to its relative likelihood, it's not impossible to make it appear as though it is taking place when in fact it isn't. Breaking up applicants by department makes sense because each department's admissions process is hypothetically  independent of each other, but one could easily also break the applicants into groups based on eye colour, height, birth place, beer preference, favorite hockey team, or blog readership. Chances are that in any given group of people, there's a way of breaking the data up nearly arbitrarily that could result in such a paradox. So if you ever see crazy differences between aggregate results and group results, make sure to keep an eye out for any funny business!

Monday, October 29, 2012

I ain't afraid of no ghosts


What do the Loch Ness monster, crop circles, and spiritualism have in common? Apart from being more or less fringe beliefs that have been known to spark intense Facebook battles, they all actually started off as known hoaxes, yet continue to thrive to this day.

The most famous picture of the Loch Ness monster, the Surgeon's Photo, for instance, was produced in 1934 as supposedly conclusive proof of the monster, but was revealed to be a toy submarine sixty years later. Crop circles were started in 1978 by two British men using only simple tools, and were only revealed as a hoax in 1991 after their wives got suspicious about why the two were spending so much time together at night and driving so far. Fortunately, apart from a small amount of ruined crops, neither of these hoaxes has caused any real damage - in fact I'm sure the tourism industry of Northern Scotland doesn't mind the Nessie myth at all.

Like the other two examples, the entire practice of spiritualism was also popularized by a massive hoax. Sadly, it has caused real damage.

Spiritualism largely began in the 1840s when the three Fox sisters began touring the United States claiming to be able to communicate with ghosts through mysterious 'rapping' noises. They performed séances for hundreds of people at a time and started the entire practice of communicating with lost loved ones (for a fee, of course). By the time they came clean in 1888 with signed confessions explaining how they achieved their effects, it was too late - the spiritualism movement had taken off and imitators were performing across the world.

Nowadays we live in a world where a third of Americans believe in ghosts, and more than 20% of Americans believe that mediums can talk to them. And if that wasn't bad enough, the worst aspect of this is the massive industry that has sprung up based entirely around exploiting people who've lost loved ones.

Though it is perhaps impossible to ever prove that what a medium does is fake, wouldn't a good place to start be to show that anything they can do could also be done by someone who claims no such powers? You bet it is. Fortunately, that's already been done.

Two of the most common principles put forward as explanations are known as hot reading and cold reading. Hot reading is essentially straight-up cheating - the medium would know something about a deceased individual from beforehand, tells the bereaved about it, and moves on. This is often accomplished through interactions beforehand with the individuals, planting repeat customers in large audiences, or, in one famous faith-healing example, having individuals fill out information cards beforehand and having them read out to the presenter using an ear piece.

Cold reading is much more subtle, and often involves a bit of psychology and manipulation. Commonly an on-stage medium will start with exceptionally vague statements such as "I'm getting a George. Who's George?" and will toss it out to an audience. Chances are that in an audience of a couple hundred people, either someone there will be named George, or will know a dead person named George, and if they have already paid to be there and want to contact their relative, they'd think this 'spirit' is for them. If that member of the audience were to stand up and say "My dad was named George!" they have begun to supply the medium with information and have stepped into a trap.

Once the medium has a target, there are a great number of ways they can convince the victim that they're talking to a dead person. Based simply on looking at people and reading their body language, a medium can make a series of educated guesses - if someone's in their forties to sixties, for instance, they've likely lost a parent recently and almost certainly don't have a living grandparent. Guesses are often stated vaguely enough that they can be recovered from if they turn out to be wrong - in this video, for instance, a medium states that a man's recently-deceased mother comes across as nervous. When she gets no response from him, she recovers by saying "... which is very unusual for her" and continues on about how extroverted she was. There's no way the medium could lose that battle.

Often mediums will make statements that could apply to almost anyone but seem less general than they are - for instance "this individual had an accident involving water" or "this person often wouldn't get stuff done because they are frustrated by the idea of mediocrity and wearied by the idea of starting over." If neither of those could potentially apply to you or someone you know then you're in a very slim minority. Statements like this are known as Barnum Statements - they are often general statements true about most people, but in a certain context can seem very personal.

A willing participant in a medium reading will take a mix of hits and misses from both cold and hot reading and will often end up dropping the misses and exaggerating the hits. When they later tell their friends about the experience, the medium will come across as having amazing powers. A general statement at the beginning of "Who's George?" gets transformed into the medium knowing that your dad was named George, even if you were the one who supplied that information. This is part of a common feature of our brains known as the confirmation bias where we tend to forget evidence that counters our preconceived notions and only remember what supports the conclusions we've already formed.

How about televised showings of live readings? Witness testimony of these suggests that often hours of material will get edited down to a half-hour show in order to boost the apparent success rate of the medium. Cheating? Definitely.

A combination of confirmation bias and Barnum statements is behind lots of other creepy phenomena like psychics, horoscopes, palm-reading, and Tarot cards. Nobody minds being told that they're creative and enthusiastic, but also patient and reliable, even though those are from different Zodiac signs (so you couldn't possibly actually be both!). 

Another favorite method of talking to the dead is the use of Ouija boards. Like crop circles, they started off as one thing and have taken off to ridiculous levels of popularity - in this case, the Ouija board was invented and patented as a toy in 1890. That doesn't stop modern-day mediums from using them to contact the dead, though. Ostensibly the spirits take over people's hands and use them to spell out enigmatic messages 'from beyond'.

Sound spooky? Good thing science can explain it. The leading explanation is known as the ideomotor effect, which is a fancy term for 'subconscious movement'. A great example of this is if you were to close your eyes and really vividly imagine tying your shoes. For a large proportion of the population, your fingers will start to twitch (ask a friend to watch - your eyes are supposed to be closed here). In the case of Ouija boards, people tend to move the cursor on the board to the letters they expect to be revealed, but can be completely unaware of the fact that they were actively moving it.

So though it can't be necessarily disproved, there truly are completely rational explanations for the effects that mediums use to convince people they can talk with the dead. So for this Hallowe'en, things turn out to not be so spooky after all.

Friday, October 5, 2012

Summer Weather: Part 2

A quick update to my post from last time!

Last week I posted the summer numbers from my weather station analysis. At that time, the scores were (out of 100):
  • Weather Network: 66.92
  • Global Weather: 66.02
  • Weather Channel: 63.99
  • Environment Canada: 55.00
  • TimeandDate.com: 54.25
Based on the scoring system, a station could have gotten 100 if all of their temperature predictions were within three degrees of the actual weather, and the fraction of days with rain accurately matched the POP forecasts for every POP value (in increments of 10). A station could have gotten 0 only if their POP values were wildly inaccurate.

A better benchmark, though, is how well my system would have scored someone just guessing. That would potentially better demonstrate the effectiveness of weather forecasters.

Using historical data, I was able to create a "dummy" weather station that used previous years' averages to "forecast" the weather on a month-to-month basis. For example, every day in July was predicted to have a high of 23, a low of 12, and a POP of 60%.
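Building that dummy forecaster is about as simple as it sounds. Here's a sketch - the historical observations below are made up, and in reality you'd feed in several years of real daily data:

```python
from collections import defaultdict

# A few made-up historical observations: (date, high, low, rained?).
history = [
    ("2009-07-01", 24.0, 12.5, True), ("2009-07-02", 21.0, 11.0, False),
    ("2010-07-01", 25.5, 13.0, True), ("2010-07-02", 22.0, 12.0, True),
    ("2011-07-01", 23.0, 11.5, False), ("2011-07-02", 24.5, 12.5, True),
]

# Group by month and average: that's the whole "forecasting" model.
buckets = defaultdict(list)
for date, high, low, rained in history:
    buckets[date[5:7]].append((high, low, rained))

dummy_forecast = {}
for month, days in buckets.items():
    n = len(days)
    dummy_forecast[month] = {
        "high": sum(h for h, _, _ in days) / n,
        "low": sum(l for _, l, _ in days) / n,
        "pop": round(10 * sum(r for _, _, r in days) / n) * 10,  # nearest 10%
    }

print(dummy_forecast["07"])   # e.g. a high of ~23, a low of ~12, POP of ~70%
```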

The score obtained using this method? 38.12. In fact, the dummy temperature predictions were less accurate than every forecast in my model so far (just over 50% within three degrees), and the POP predictions were only better than half of the other stations' 5- or 6-day predictions*.

That's certainly encouraging! A weather station's forecast even six days into the future is significantly better than the best educated guess you could make from historical data. So there you go - next time you criticize the meteorologist for being inaccurate, remember that they're actually at least twice as good as you.

*: The method I use for scoring POP forecasts is perhaps objectively fair, but not very accommodating to different weather stations' reporting methods. Stations that give increments of 10% between 0 and 30 will necessarily do better than those who don't, even though a 10% POP forecast is more-or-less useless. I'm looking into ways to be a little bit more fair with this.

Wednesday, October 3, 2012

Mathematical Party Games

Go grab a calendar. Seriously, this will be way cooler with one. Bonus points if it’s for 2012. Got one? Trust me, you’ll want one for this. Still no? Fine, take this one. Open it up and follow along.

Riddle me this: What do 4-4, 6-6, 8-8, 10-10, and 12-12 have in common? Apart from being pairs of the same even number, that is. This is where your calendar comes in. Take a look at 4-4 (that is, April 4th). It was a Wednesday in 2012. How about 6-6 (June 6th)? Also a Wednesday. The same is true for August 8th, October 10th, and December 12th. If you don’t believe me you probably didn’t open up that calendar.

It turns out that these dates (4-4, 6-6, 8-8, 10-10, and 12-12) will always fall on the same key day within a year. In 2012 they are all Wednesdays, and in 2013 they will be Thursdays. This paves the way to a really cool party trick that I like to call “pretending to memorize a calendar”, where all you need to know is the key day for a given year. Have a friend pick a date – say the date the Mayans never said the world will end (December 21) – and in seconds you can tell them the day. In this case we know December is the 12th month, so 12-12 is a Wednesday. A week later is the 19th, the 20th is a Thursday, so the 21st is a Friday. It’s that easy!

But wait, there’s more! Those were just five easy-to-remember months – any chance there’s a similar pattern for the others? It turns out there is. The ninth of the fifth month and the fifth of the ninth month (May 9 and September 5) are also Wednesdays. That’s pretty easy to remember. Also, the eleventh of the seventh month and the seventh of the eleventh month (July 11 and November 7) are also Wednesdays. For those of you who like mnemonics, all it takes to remember that is the sentence: “I work nine to five at 7-11.”

That’s nine months of the year covered. Unfortunately March doesn’t have anything quite as easy to remember for it, but I happen to know that the nerdiest day of the year (Pi day – March 14) also falls on the same day as all these other key dates.

January and February are the only variable ones, because they have that leap day between them and other months. Fortunately enough, no matter what year it is the last day of February will also always be the same day of the week as our other key days, and we can work backwards from there. The only really tricky one is January, and even it has a pretty simple rule: three years out of four (non leap years), the 3rd of January is our key day (so a Thursday in 2013), and on the fourth year out of four (a leap year) it’s the 4th of January (Wednesday in 2012).

You are now literally seconds away from knowing the day of the week of any date for a given year – all you need to know is the one key day! Now go forth and impress people!
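If you'd rather let a computer do the remembering, here's a small sketch of the trick in Python. It leans on the datetime module to look up the key day's weekday, which is cheating slightly - that's the one piece you'd actually memorize.

```python
import datetime

def key_dates(year):
    """The anchor ("key") dates that all share one weekday within a year."""
    leap = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
    jan = 4 if leap else 3        # January's anchor depends on leap years
    feb = 29 if leap else 28      # last day of February
    return [(1, jan), (2, feb), (3, 14), (4, 4), (5, 9), (6, 6),
            (7, 11), (8, 8), (9, 5), (10, 10), (11, 7), (12, 12)]

def weekday_via_key_day(year, month, day):
    # Find the anchor in the same month and count forward or backward from it.
    anchor_day = dict(key_dates(year))[month]
    key_weekday = datetime.date(year, month, anchor_day).weekday()
    return (key_weekday + (day - anchor_day)) % 7

names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
         "Saturday", "Sunday"]
print(names[weekday_via_key_day(2012, 12, 21)])   # Friday
```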

Quick recap for key days (within every year these dates will all fall on the same day of the week, no matter what):
January: Either the third (non-leap year) or fourth (leap year)
February: Last day of the month
March: Pi day! (fourteenth)
April: Fourth (even month)
May: Ninth (Nine to five mnemonic)
June: Sixth (even month)
July: Eleventh (7-11 mnemonic)
August: Eighth (even month)
September: Fifth
October: Tenth (even month)
November: Seventh
December: Twelfth (even month)

Thursday, September 27, 2012

Election Arithmetic

Pretend for a minute that you're a general of a vast army, and that you are in charge of defending your city from the ruthless Colonel Blotto. You know that you both have 1,000 troops, and there are ten battlefields between your two armies.

Assuming that whoever sends the most troops to a battlefield wins that battle, and whoever wins the most battles wins the war, what is the optimal distribution of troops in order to maximize your chances of winning? Think about it for a minute.

Certainly there are some very poor possible choices - for instance, allocating all 1,000 troops on one battlefield means that the best you can do is tie if your opponent does the exact same, and you lose under any other configuration. But is there a best strategy?

As you may have guessed, this is a classic and well-established game theory game (they're called games but are often not fun) known as the Blotto game. Unlike many games, however, there is no perfect strategy here. No matter what you do, your opponent can beat you if they know your strategy. For instance, evenly splitting 166 troops onto each of the first six battlefields and 0 on the remaining four gives you a very high chance of winning six of the fields, but it's very easily countered, with troops to spare, by an opponent who knows your plan.
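The "any fixed strategy can be beaten" claim is easy to demonstrate: whatever allocation you commit to, an informed opponent can concede your strongest battlefield and outbid you by one troop everywhere else. A minimal sketch:

```python
def counter_strategy(enemy):
    """Given a known enemy allocation, build one that beats it: concede their
    strongest field and outbid them by 1 everywhere else."""
    strongest = enemy.index(max(enemy))
    plan = [troops + 1 for troops in enemy]
    plan[strongest] = 0
    spare = 1000 - sum(plan)
    assert spare >= 0           # always holds with 1000 troops over 10 fields
    plan[strongest] = spare     # dump any leftovers on the conceded field
    return plan

even_split = [100] * 10
six_fields = [166] * 6 + [0] * 4
for strategy in (even_split, six_fields):
    counter = counter_strategy(strategy)
    wins = sum(c > e for c, e in zip(counter, strategy))
    print(strategy, "->", counter, f"(counter wins {wins}/10 fields)")
```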

This has led to a classification of a large series of games as "Blotto" games - there is an optimal strategy, but it is only identifiable after the fact (once you know your opponent's strategy). Another example of this type of game is rock-paper-scissors - your best bet is to pick randomly from any number of good strategies, but after the game is over both sides can identify the one strategy that could have beaten their opponent.

A wonderful paper published fairly recently hypothesized that American elections might be modeled as a Blotto game, with a couple of other parameters tossed in as well. A Blotto game is fairly similar to the electoral college system, they thought, because each side has a fixed amount of money to distribute across 51 electoral college races, with the winner being whoever gets at least 270 electoral college votes.

First, though, they had to consider how much strategic variables, which they chose to be polling numbers ten weeks before the election and campaign spending ratios, impacted the vote (if at all). They came up with three game possibilities: either the election could be modeled as a Lotto, Blotto, or Frontrunner game (which is, coincidentally, the name of the paper). In a Lotto game, knowing your opponent's strategy couldn't help you - much like in a lottery, you can't identify any strategy that could lead to victory beforehand, even though afterwards you could see where you went wrong. In a Frontrunner game, there is an identifiable connection between strategic variables and victory, but one side has such an insurmountable advantage that they cannot lose.

The paper analyzed the 1996 and 2000 American presidential elections, and determined that campaign spending per state did have a strong impact on winning or losing that state. They further determined that the 1996 election was a Frontrunner game - Clinton had so much money and such favorable polls that it was easy for him to pick a winning strategy, and even if Dole had known that strategy he still couldn't have won.

The results from the 2008 election. In general you'd expect states up and to the right to be won by Democrats (due either to higher spending or higher poll numbers). This is indeed the case.

Much more interestingly, though, the 2000 election was in fact determined to be a Blotto game - in other words, Al Gore could have re-allocated his money in such a way that Bush would have lost the election. The authors of the paper estimated that he only had about a 4% chance of choosing the strategy at random, but it was still a possibility.

Fast-forward to 2012, and these results have an interesting impact on the upcoming presidential election. A glance at the current polling numbers shows that this election is nearly as tight as the 2000 election, suggesting that a Blotto-like model may turn out to be valid. Considering that Obama is currently leading the expected vote count as well as fundraising, though, it seems more likely than ever that Romney has a steep uphill climb ahead of him.

Monday, September 24, 2012

Summer Weather

Weather forecasting is insane.

As a career I couldn't even imagine how unrewarding it is - you could pour hours and hours into developing new algorithms that only get tiny increases in accuracy, due simply to the massive complexity of the system you're trying to model. When you're right people take you for granted, and when you're wrong you take a lot of blame.

That being said, a while ago I noticed that different weather forecasters will sometimes predict radically different weather for the same day, given the same data. I also noticed that the weekend forecast given on Monday could be substantially different from the one given on Friday, right before the weekend. These are all fair differences - tweaks to models can cause differences of opinion between meteorologists, and the closer a prediction is made to the day it covers, the more accurate we'd hope it would be.

I was curious as to how much of a change there would be, though, which is why I decided to keep track of it. Since the beginning of June I've kept track of the six-day forecasts for High temperature, Low temperature, and Probability of Precipitation from five different forecasting stations: timeanddate.com, Environment Canada, Global Weather, the Weather Network, and the Weather Channel. Environment Canada, Global, and the Weather Network were chosen as the sites visited most frequently by myself and my friends; the Weather Channel was chosen because it is the basis of Yahoo! weather and, subsequently, the commonly-used Apple weather app; and timeanddate.com was chosen because it's a large multinational site. All forecasts were taken for the Edmonton downtown location, not the international airport, and prediction data was collected between 11 am and 12 pm for consistency in comparison.

Now that summer's over, I have some preliminary results. And the winner (by a hair) is the Weather Network!

Score (out of 100):
  • Weather Network: 66.92
  • Global Weather: 66.02
  • Weather Channel: 63.99
  • Environment Canada: 55.00
  • TimeandDate.com: 54.25
The score is based on a weighted average that was more or less arbitrarily decided by me: each subsequent day in the future was weighted less (so that a prediction for tomorrow's weather is worth more than a prediction for next week's), and POP was worth more than the High prediction, which was in turn weighted more than the Low prediction.
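As a rough illustration of what that weighting looks like, here's a sketch - the specific weights below are placeholders I made up for this example, not my actual numbers:

```python
# Hypothetical weights, for illustration only: nearer days count more,
# and POP > High > Low within each day.
day_weights = [6, 5, 4, 3, 2, 1]              # day 1 (tomorrow) .. day 6
metric_weights = {"pop": 3, "high": 2, "low": 1}

def overall_score(scores):
    """scores[day][metric] is a 0-100 score for that day-ahead and metric."""
    total, weight_sum = 0.0, 0.0
    for day_w, day_scores in zip(day_weights, scores):
        for metric, metric_w in metric_weights.items():
            w = day_w * metric_w
            total += w * day_scores[metric]
            weight_sum += w
    return total / weight_sum

example = [{"pop": 75, "high": 90, "low": 85}] * 6   # a station with flat scores
print(round(overall_score(example), 2))              # 81.67
```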

Some fun facts!

Best High temperature prediction: Weather Channel 1-day prediction (96.79% within 3 degrees)
Best Low temperature prediction: Environment Canada 2-day prediction (96.07% within 3 degrees)
Best POP: Global 4-day prediction (p-value 0.346)

Worst High temperature prediction: TimeandDate 6-day prediction (55.20% within 3 degrees)
Worst Low temperature prediction: Global 6-day prediction (68.57% within 3 degrees)
Worst POP: TimeandDate 3-day prediction (p-value 0.038)

Some graphs!

Temperature score was based on the percentage of predictions that were within 3 degrees of the actual temperature. In general there was a very strong downward trend for the high temperature predictions - almost all stations had better than 95% accuracy at predicting tomorrow's weather, and they were all about 70% accurate at the weather a week from now. There was less of a trend noted for the low predictions, however those are typically less useful apart from determining the likelihood of frost.

The score for POP is based on the p-value for each category of prediction. In essence, I checked the number of days that a given station predicted a POP of, say, 10%, and compared it to the fraction of those days on which it actually did rain. This doesn't translate directly into an accuracy percentage, which is why I call them 'scores' instead (though if every category had precisely the incidence of rain predicted, it would end up with a score of 100).


So there you go! Hopefully this helps you the next time you're planning a picnic (or whatever people check the weather for...).

Saturday, September 8, 2012

Higher Learning


Pick a professional career. Almost any will do. Now really take a moment to visualize how their job is done today - their tools, their projects, all sorts of the stuff that's required by their fields.

Now think about that same profession 300 years ago. Any chance it's changed?

Doctors have progressed from prescribing urine baths and bloodletting to minimally-invasive laparoscopic surgery. Engineers have moved from catapults to Mars rovers, and law enforcement has gone from corrupt court systems to advanced forensics (in most countries, at least). These and other similar advances have been undeniably amazing, and are largely responsible for our current quality of life.

Sadly, though, one particularly glaring profession has been dragging its heels against this rapid change. Despite the enormous advances of the last several hundred years, university education has remained largely unchanged. For some reason, an overwhelming majority of university classes are still taught by packing vast numbers of students into a theater and talking at them for an hour. Tweed jackets have come and gone, and the use of PowerPoint may have sped things up, but by and large the methods used to teach are virtually unchanged.

Courses are still taught largely by talking at students, assigning readings and homework, and then giving grades based on exams. Exams themselves are still mostly just forcing students to cram material for a couple days beforehand, then stuffing students in a room for two hours and making them answer random questions about the previous forty hours of lecture.

Why is this still the standard? How likely is it that, of all professions, how we teach people more or less peaked three hundred years ago? Why is it that the most common way of judging how well someone has learned is to cram them into a room and force them to recite things, and why on earth should the grading that results from that two hour test be worth up to 70% of their grade? The case has been made before that universities should get students focusing on learning how to learn, instead of what often seems to be the focus of getting students learning how to write exams, and I totally agree.

There have been some pretty exciting developments in expanding education options recently, though. For example, the Khan Academy has more than 3,300 video lessons and interactive exercises covering math all the way from preschool arithmetic to first-year calculus, all free for anyone to take. The Academy also covers basic sciences, humanities, and finance.

If that isn't what you're looking for, why not learn a language? The BBC offers free courses on all the major European languages, and essential phrases for 40 languages. If you're interested in something less structured, free lecture videos on hundreds of topics can be found anywhere online to anyone who's really interested.

What's particularly cool, though, are the opportunities that are becoming available for a more formalized education. Recently, three of the biggest names in education (MIT, Berkeley, and Harvard) joined together to offer advanced university courses for topics ranging from solid state chemistry to artificial intelligence. These even offer 'certificates of completion' - certainly worth putting on a resumé, even if they don't have quite the same weight as an official transcript. Registration for the courses is open right now, and I strongly suggest you take a look at what's being offered.

The advantage of this new-found variety in fairly high-level education options is that it may (hopefully) end up pushing the envelope for education options in universities. Many post-secondary programs are starting to allow more open-ended education, such as the option to learn by correspondence or online, and with so much knowledge so freely available we may yet see a change in the 'classical' approach to lectures.

And for those of you who truly are here to learn how to learn, I strongly suggest taking a look at some of the links - I can guarantee there's something out there you'll find fascinating.

Monday, August 20, 2012

Scientific Illiteracy

In 1998 British physician Andrew Wakefield was offered £400,000 to publish an article claiming the vaccine against Measles, Mumps, and Rubella could lead to autism. The article was funded by lawyers preparing an anti-vaccine lawsuit, and despite the fact that his results could not be reproduced and were dismissed by an overwhelming majority of researchers, a massive and ultimately deadly panic swept across the world. Vaccination rates dropped across Europe, opening the door to completely preventable outbreaks; children were killed or permanently disabled, and an estimated $25 million in avoidable hospital bills accumulated from MMR outbreaks that never needed to happen.

It ultimately took a long and unnecessary campaign by researchers and health professionals to re-prove the safety of the MMR vaccine, but the scientific agreement hasn't quite been translated to the public - in fact today about 48% of Americans either don't trust or are unsure about the safety of vaccines, largely based on that one fraudulent article.

Vaccines are a dramatic example of the consequences of a public that doesn't trust or doesn't understand science, but they are absolutely crucial to public safety. Not getting vaccinated is not only a hazard to yourself but also to everyone around you. Sadly modern outbreaks of completely preventable diseases, such as the recent Whooping Cough outbreak, are reminders of the impact of ignoring science.

Studies have shown, though, that in general the public has a very poor understanding of some fairly basic concepts. Take for example these survey results on scientific literacy:
  • 6% of Americans don't believe smoking can cause lung cancer
  • 13% don't know that plants produce the oxygen we breathe
  • 20% aren't aware the center of the earth is very hot (and are presumably very confused by volcanoes)
  • 25% still think the sun goes around the earth
  • 46% don't know it takes the earth a year to orbit the sun (but were probably still stumped by the previous question), and
  • 52% fell hard for the Flintstones and believe dinosaurs and humans coexisted
Admittedly none of these specific misconceptions of science are likely to be dangerous to an individual, apart from perhaps lung cancer and smoking. What's instead frightening is that these are all concepts that are taught during or before high school, and suggest a public that is largely ignorant or apathetic to some of the most fundamental concepts we rely on. Even more horrifying is that these percentages have tended to only get worse between 2001 and 2010.

Regardless, though, of whether the misunderstanding of science is harmful on an individual basis or not, this attitude towards science of either apathy or automatic distrust is very dangerous for society. Distortion of science for personal or political benefit is very common, and has even been used explicitly to cause harm.

Two particular government exploitations or distortions of scientific understanding come to mind. The eugenics movement in the early 20th century claimed that human breeding needed to be controlled in order to advance our evolution, and was pushed ferociously by political groups and individuals around the world. Even countries like Canada and the United States got caught up in the movement, with individuals like Tommy Douglas and Alexander Graham Bell advocating for restrictions on who could marry and have children, and certain provinces and states forcibly sterilizing individuals who were considered unfit to breed. The movement ultimately led to the rationalization of murders in the name of cleansing in Nazi Germany. It took the combination of the end of World War II and a better understanding of genetics to bring about the end of the vast majority of eugenics-based programs, but not before massive personal and societal loss.

On the other hand, Soviet Russia took control of science and dismissed genetics entirely as a "bourgeois pseudoscience", instead adopting the practices of Lysenkoism for agricultural development. This explicit adoption of absolutely useless techniques held back Russian understanding of genetics for decades, and resulted in the firing, imprisonment, and execution of legitimate Russian scientists.

But misunderstanding of science still harms us daily. Despite court findings of fraud, millions of dollars' worth of "ion bracelets" are still sold every year on the pretense of being scientific. Fictionalized versions of polygraph tests have led to the belief that they're foolproof - and private polygraph examiners have likely been responsible for propagating actual lies - even though psychologists have determined that they really aren't any better than guessing. Some people reach for homeopathy at the expense of medically-proven treatments, even though it's been consistently debunked by doctors.

Fortunately it's not all bad. The recent public acknowledgement of Canadian science journalists like Jay Ingram and Bob McDonald with the Order of Canada was an important step for supporting the field, which is undeniably important for keeping people informed, and televised outreach through the Discovery Channel and shows like Mythbusters has done a lot to increase interest in science topics and critical thinking. Hopefully, as science continues to advance, we will see fewer opportunities for people to take advantage of scientific misunderstanding, and more opportunities to get people interested and engaged. We definitely need it.

This was also posted on TheWandererOnline with graphics by Michelle Weremczuk. Check it out!