Thursday, November 1, 2012

Sneaky Statistics

Statistics are cool. Statistics are your friend. If you treat them right, they'll love you forever and never lie to you.

The problem with statistics is that sometimes they're confusing, and people very frequently think they understand them better than they do. Because of this, it's really easy sometimes for people to lie by tricking you with stats. Meanies...

Here's a cool example from Wikipedia of statistics playing tricks with you. Pretend that you're a doctor and you're trying to figure out which treatment is better for curing kidney stones. You use both for a while and this is what you get:

               Treatment A       Treatment B
Small Stones   81/87 = 93%       234/270 = 87%
Large Stones   192/263 = 73%     55/80 = 69%
Total          273/350 = 78%     289/350 = 83%

At first glance this test may seem pretty fair - both treatments were used 350 times, so we can compare them, right? And it looks like Treatment B was better than Treatment A. Maybe we should use it? Sounds good!

But wait. When we break it down into small stones versus large stones, the story changes. For small stones, Treatment A is 6 percentage points better than Treatment B, and for large stones it's 4 points better. That's crazy though - how can A be better at treating small stones and better at treating large stones, yet worse at treating both combined? Clearly evil forces are at work here.

Around this point it wouldn't be horrible for you to be confused about which treatment is actually better, and it turns out that this study was, in fact, not fair. Large stones had a lower rate of successful curing, and Treatment A was used more than three times more often for these stones. Similarly, the easier smaller stones were more often given Treatment B. This creates such an unbalanced weighting between the treatments and stones that when it's all added up Treatment B looks better.

This highlights two cool concepts in statistics. The first is Simpson's paradox, where the correlation observed in two separate groups is reversed when they are combined together. Obviously this could offer juicy opportunities to people with an agenda - a drug company representing either Treatment A or B could make a case that their drug is better, simply based on how they add the numbers up in the study.
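If you want to check the arithmetic yourself, here's a minimal Python sketch that just re-adds the numbers from the table above - each per-group rate favours Treatment A, but the unequal mix of stone sizes flips the totals.

```python
# Successes and attempts, straight from the kidney stone table above.
data = {
    "A": {"small": (81, 87),   "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

for treatment, groups in data.items():
    for size, (cured, total) in groups.items():
        print(f"Treatment {treatment}, {size} stones: {cured}/{total} = {cured / total:.0%}")
    cured_all = sum(cured for cured, _ in groups.values())
    total_all = sum(total for _, total in groups.values())
    # A wins within each group, but B wins once everything is pooled.
    print(f"Treatment {treatment}, overall: {cured_all}/{total_all} = {cured_all / total_all:.0%}")
```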

The second is the confounding (or lurking) variable - a variable that wasn't originally accounted for that has an effect on both the dependent and independent variables in the study. A good example is as follows: a statistician could do a week-by-week analysis of human behavior on a beach, keeping track of both drownings and slurpee consumption. They might make the observation that in weeks with high slurpee consumption, more people drown, and someone could then declare that drinking slurpees increases the chance of drowning. 

Boy, that would suck. As a researcher, you could probably even justify this a little - perhaps drinking slurpees fills you up or makes you lethargic, increasing your chance of drowning. However, a more likely explanation is to take something new into account: the season. People just plain drink more slurpees in the summer than the winter (unless they're me). People also go swimming at beaches more during the summer, increasing the chance of drowning. In this example, the season would be a lurking variable - it correlates with both previously-considered variables, and explains the phenomenon.

Similarly, in our kidney stone example a lurking variable could be the size of the stone. Doctors disproportionately used Treatment A more for large stones, and Treatment B more for small stones - at the same time, small stones were easier to cure than large stones. By not taking into account the effect of the stone size on the treatment distribution, we arrive at the paradox from before.

Funnily enough, Simpson's paradox occurs fairly frequently - in fact, statisticians have estimated that for tables like the one in the kidney stone example, roughly 1 in 60 would be expected to show some version of the paradox.

One famous example involved a sex discrimination lawsuit at Berkeley in 1973. The admission results from the six largest departments looked something like this:

Department   Men               Women
A            512/825 = 62%     89/108 = 82%
B            353/560 = 63%     17/25 = 68%
C            120/325 = 37%     202/593 = 34%
D            138/417 = 33%     131/375 = 35%
E            53/191 = 28%      94/393 = 24%
F            16/272 = 6%       24/341 = 7%

 When the total data was added up across all departments, though, the distribution was as follows:

         Applicants   Admitted
Men      8442         44%
Women    4321         35%

At first glance, it looks like a case of gender discrimination - men were admitted at a rate nearly 10 percentage points higher than women across the board, and some people who felt cheated took it to court.


Looking at those six departments in the first table, though, shows something interesting - Departments A, B, and D were the most popular with the men, and the least popular with the women. In these, women were consistently more likely to be admitted than men. On the other hand, Departments C and E were the most popular with the women, and there they lost to the men. Unfortunately, the departments most popular with women also had admission rates roughly half those of the departments the men favoured.


In fact, a study of these results suggested that there was a "small but statistically significant bias in favor of women" in the admission process when examining all departments in question, and the lawsuit failed. The lurking variable in this case was the character of the departments themselves - men tended to go into studies that were more math-intense (engineering, science, etc.), which happened to have more room to accept students.


It's really important to keep concepts like this in mind when examining statistics. For instance, one has to be extremely careful performing direct comparisons of male versus female earnings to account for factors such as preference in employment - it's much better to compare across identical jobs than comparing aggregate numbers. Aggregate statistics in that case are only really good for highlighting disparity in employment distributions, not earning statistics. Similarly, the Berkeley sex bias case, while not showing a bias against admitting women into studies, highlighted a lack of female participation in programs involving math that was more indicative of early societal pressures than active denial.


One final word of caution regarding Simpson's paradox: because it crops up so readily, it's entirely possible to make it appear as though it is taking place when in fact it isn't. Breaking up applicants by department makes sense because each department's admissions process is hypothetically independent of the others, but one could just as easily break the applicants into groups based on eye colour, height, birth place, beer preference, favorite hockey team, or blog readership. Chances are that in any given group of people, there's a way of breaking the data up nearly arbitrarily that could produce such a paradox. So if you ever see crazy differences between aggregate results and group results, make sure to keep an eye out for any funny business!

Monday, October 29, 2012

I ain't afraid of no ghosts


What do the Loch Ness monster, crop circles, and spiritualism have in common? Apart from being more or less fringe beliefs that have been known to spark intense Facebook battles, they all actually started off as known hoaxes, yet continue to thrive to this day.

The famous 'Surgeon's Photograph' of the Loch Ness monster, for instance, was produced in 1934 as supposedly conclusive proof of the monster, but was revealed by the photographer sixty years later to be a toy submarine. Crop circles were started in 1978 by two British men using only simple tools; they were only exposed in 1991, after their wives grew suspicious about why the pair were spending so much time together at night and driving so far. Fortunately, apart from a small amount of ruined crops, neither of these hoaxes has caused any real damage - in fact I'm sure the tourism industry of Northern Scotland doesn't mind the Nessie myth at all.

Like the other two examples, the entire practice of spiritualism was also popularized by a massive hoax. Sadly, it has caused real damage.

Spiritualism largely began in the 1840s when the three Fox sisters began touring the United States claiming to be able to communicate with ghosts through mysterious 'rapping' noises. They performed séances for hundreds of people at a time and started the entire practice of communicating with lost loved ones (for a fee, of course). By the time they came clean in 1888 with signed confessions explaining how they achieved their effects, it was too late - the spiritualism movement had taken off and imitators were performing across the world.

Nowadays we live in a world where a third of Americans believe in ghosts, and more than 20% of Americans believe that mediums can talk to the dead. And if that wasn't bad enough, the worst aspect of this is the massive industry that has sprung up based entirely around exploiting people who've lost loved ones.

Though it is perhaps impossible to ever prove that what a medium does is fake, wouldn't a good place to start be to show that anything they can do could also be done by someone who claims no such powers? You bet it is. Fortunately, that's already been done.

Two of the most common principles put forward as explanations are known as hot reading and cold reading. Hot reading is essentially straight-up cheating - the medium knows something about a deceased individual beforehand, tells the bereaved about it, and moves on. This is often accomplished through prior interactions with the individuals, planting repeat customers in large audiences, or, in one famous faith-healing example, having individuals fill out information cards beforehand that were then read to the presenter through an earpiece.

Cold reading is much more subtle, and often involves a bit of psychology and manipulation. Commonly an on-stage medium will start with exceptionally vague statements such as "I'm getting a George. Who's George?" and will toss it out to an audience. Chances are that in an audience of a couple hundred people, either someone there will be named George, or will know a dead person named George, and if they have already paid to be there and want to contact their relative, they'd think this 'spirit' is for them. If that member of the audience were to stand up and say "My dad was named George!" they have begun to supply the medium with information and have stepped into a trap.

Once the medium has a target, there are a great number of ways they can convince the victim that they're talking to a dead person. Based simply on looking at people and reading their body language, a medium can make a series of educated guesses - if someone's in their forties to sixties, for instance, they've likely lost a parent recently and almost certainly don't have a living grandparent. Guesses are often stated vaguely enough that they can be recovered from if they turn out to be wrong - in this video, for instance, a medium states that a man's recently-deceased mother comes across as nervous. When she gets no response from him, she recovers by saying "... which is very unusual for her" and continues on about how extroverted she was. There's no way the medium could lose that battle.

Often mediums will make statements that could apply to almost anyone but seem less general than they are - for instance "this individual had an accident involving water" or "this person often wouldn't get stuff done because they are frustrated by the idea of mediocrity and wearied by the idea of starting over." If neither of those could potentially apply to you or someone you know then you're in a very slim minority. Statements like this are known as Barnum Statements - they are often general statements true about most people, but in a certain context can seem very personal.

A willing participant in a medium reading will take a mix of hits and misses from both cold and hot reading and will often end up dropping the misses and exaggerating the hits. When they later tell their friends about the experience, the medium will come across as having amazing powers. A general statement at the beginning of "Who's George?" gets transformed into the medium knowing that your dad was named George, even if you were the one who supplied that information. This is part of a common feature of our brains known as the confirmation bias where we tend to forget evidence that counters our preconceived notions and only remember what supports the conclusions we've already formed.

How about televised showings of live readings? Witness testimony of these suggests that often hours of material will get edited down to a half-hour show in order to boost the apparent success rate of the medium. Cheating? Definitely.

A combination of confirmation bias and Barnum statements is behind lots of other creepy phenomena like psychics, horoscopes, palm-reading, and Tarot cards. Nobody minds being told that they're creative and enthusiastic, but also patient and reliable, even though those are from different Zodiac signs (so you couldn't possibly actually be both!). 

Another favorite method of talking to the dead is the use of Ouija boards. Like crop circles, they too started off as one thing and have since taken off to ridiculous levels of popularity. Ouija boards were first patented and sold as a toy in 1890. That doesn't stop modern-day mediums from using them to contact the dead, though. Ostensibly the spirits take over people's hands and use them to spell out enigmatic messages 'from beyond'.

Sound spooky? Good thing science can explain it. The leading explanation is known as the ideomotor effect, which is a fancy term for 'subconscious movement'. A great example of this is if you were to close your eyes and really vividly imagine tying your shoes. For a large proportion of the population, your fingers will start to twitch (ask a friend to watch - your eyes are supposed to be closed here). In the case of Ouija boards, people tend to move the cursor on the board to the letters they expect to be revealed, but can be completely unaware of the fact that they were actively moving it.

So though it can't be necessarily disproved, there truly are completely rational explanations for the effects that mediums use to convince people they can talk with the dead. So for this Hallowe'en, things turn out to not be so spooky after all.

Friday, October 5, 2012

Summer Weather: Part 2

A quick update to my post from last time!

Last week I posted the summer numbers from my weather station analysis. At that time, the scores were (out of 100):
  • Weather Network: 66.92
  • Global Weather: 66.02
  • Weather Channel: 63.99
  • Environment Canada: 55.00
  • TimeandDate.com: 54.25
Based on the scoring system, a station could have gotten 100 if all of their temperature predictions were within three degrees of the actual weather, and the fraction of days with rain accurately matched the POP forecasts for every POP value (in increments of 10). A station could have gotten 0 only if their POP values were wildly inaccurate.

A better benchmark, though, is how well my system would have scored someone just guessing. That would potentially better demonstrate the effectiveness of weather forecasters.

Using historical data, I was able to create a "dummy" weather station that used previous years' averages to "forecast" the weather on a month-to-month basis. For example, every day in July was predicted to have a high of 23, a low of 12, and a POP of 60%.
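In code, that dummy station is about as simple as a forecaster can get - something like the sketch below. Only July's numbers come from this post; the other months would be filled in the same way from historical averages.

```python
# A "dummy" forecaster: ignore the actual day and just return the month's
# historical averages. Only July's values are quoted above; the other months
# are placeholders to be filled in from historical data.
monthly_climatology = {
    7: {"high": 23, "low": 12, "pop": 60},  # July: high 23, low 12, POP 60%
    # 6: {...}, 8: {...}, and so on for the rest of the year
}

def dummy_forecast(month, day=None):
    """Same forecast for every day of the month, no matter how far ahead."""
    return monthly_climatology[month]

print(dummy_forecast(7, 15))   # {'high': 23, 'low': 12, 'pop': 60}
```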

The score obtained using this method? 38.12. In fact, the average-based temperature predictions were less accurate than every forecast in my model so far (just over 50% within three degrees), and the POP predictions were only better than half of the other stations' 5- or 6-day predictions*.

That's certainly encouraging! A weather station's forecasts even six days in the future are significantly better than the best educated guess you could make from historical data. So there you go - next time you criticize the meteorologist for being inaccurate, remember that they're actually nearly twice as good as you.

*: The method I use for scoring POP forecasts is perhaps objectively fair, but not very accommodating to different weather stations' reporting methods. Stations that give increments of 10% between 0 and 30 will necessarily do better than those who don't, even though a 10% POP forecast is more-or-less useless. I'm looking into ways to be a little bit more fair with this.

Wednesday, October 3, 2012

Mathematic Party Games

Go grab a calendar. Seriously, this will be way cooler with one. Bonus points if it’s for 2012. Got one? Trust me, you’ll want one for this. Still no? Fine, take this one. Open it up and follow along.

Riddle me this: What do 4-4, 6-6, 8-8, 10-10, and 12-12 have in common? Apart from being pairs of the same even number, that is. This is where your calendar comes in. Take a look at 4-4 (that is, April 4th). It was a Wednesday in 2012. How about 6-6 (June 6th)? Also a Wednesday. The same is true for August 8th, October 10th, and December 12th. If you don’t believe me you probably didn’t open up that calendar.

It turns out that these dates (4-4, 6-6, 8-8, 10-10, and 12-12) will always fall on the same key day within a year. In 2012 they are all Wednesdays, and in 2013 they will be Thursdays. This paves the way to a really cool party trick that I like to call “pretending to memorize a calendar”, where all you need to know is the key day for a given year. Have a friend pick a date – say the date the Mayans never said the world will end (December 21) – and in seconds you can tell them the day. In this case we know December is the 12th month, so 12-12 is a Wednesday. A week later is the 19th, the 20th is a Thursday, so the 21st is a Friday. It’s that easy!

But wait, there’s more! Those were just five easy-to-remember months – any chance there’s a similar pattern for the others? It turns out there is. The ninth of the fifth month and the fifth of the ninth month (May 9 and September 5) are also Wednesdays. That’s pretty easy to remember. Also, the eleventh of the seventh month and the seventh of the eleventh month (July 11 and November 7) are also Wednesdays. For those of you who like mnemonics, all it takes to remember that is the sentence: “I work nine to five at 7-11.”

That’s nine months of the year covered. Unfortunately March doesn’t have anything quite as easy to remember for it, but I happen to know that the nerdiest day of the year (Pi day – March 14) also falls on the same day as all these other key dates.

January and February are the only variable ones, because they have that leap day between them and other months. Fortunately enough, no matter what year it is the last day of February will also always be the same day of the week as our other key days, and we can work backwards from there. The only really tricky one is January, and even it has a pretty simple rule: three years out of four (non leap years), the 3rd of January is our key day (so a Thursday in 2013), and on the fourth year out of four (a leap year) it’s the 4th of January (Wednesday in 2012).

You are now literally seconds away from knowing the day of the week of any date for a given year – all you need to know is the one key day! Now go forth and impress people!

Quick recap for key days (within every year these dates will all fall on the same day of the week, no matter what):
January: Either the third (non-leap year) or fourth (leap year)
February: Last day of the month
March: Pi day! (fourteenth)
April: Fourth (even month)
May: Ninth (Nine to five mnemonic)
June: Sixth (even month)
July: Eleventh (7-11 mnemonic)
August: Eighth (even month)
September: Fifth
October: Tenth (even month)
November: Seventh
December: Twelfth (even month)
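For anyone who'd rather let a machine do the counting, here's a small Python sketch of the whole trick, hard-coding the key dates from the recap above. You still supply the year's key day (Wednesday for 2012, Thursday for 2013), and it counts forward or backward in weeks, just like the December 21 example.

```python
def key_date(year, month):
    """The 'key day' date for each month, straight from the recap above."""
    leap = (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
    return {1: 4 if leap else 3,               # January: 3rd or 4th
            2: 29 if leap else 28,             # February: last day
            3: 14,                             # March: Pi day
            4: 4, 6: 6, 8: 8, 10: 10, 12: 12,  # even months
            5: 9, 9: 5,                        # "nine to five..."
            7: 11, 11: 7}[month]               # "...at 7-11"

def day_of_week(year, month, day, key_weekday):
    """key_weekday is the year's key day: 0=Monday ... 6=Sunday (2 for 2012)."""
    return (key_weekday + day - key_date(year, month)) % 7

names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
print(names[day_of_week(2012, 12, 21, 2)])   # Friday, matching the example above
```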

Thursday, September 27, 2012

Election Arithmetic

Pretend for a minute that you're a general of a vast army, and that you are in charge of defending your city from the ruthless Colonel Blotto. You know that you both have 1,000 troops, and there are ten battlefields between your two armies.

Assuming that whoever sends the most troops to a battlefield wins that battle, and whoever wins the most battles wins the war, what is the optimal distribution of troops in order to maximize your chances of winning? Think about it for a minute.

Certainly there are some very poor possible choices - for instance, allocating all 1,000 troops on one battlefield means that the best you can do is tie if your opponent does the exact same, and you lose under any other configuration. But is there a best strategy?

As you may have guessed, this is a classic and well-established game theory game (they're called games but are often not fun) known as the Blotto game. However, unlike many games, there is no perfect strategy to this one. No matter what you do, your opponent can beat you so long as they know your strategy. For instance, a strategy of spreading your troops evenly, 166 onto each of the first six battlefields and none on the remaining four, gives you a very high chance of winning six of the fields, but is very easily countered, with troops to spare, by an opponent who knows your plan.
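Here's a quick Python sketch of that counter-attack: once the 166-per-field plan is known, an opponent can concede one of those six fields, outbid on the other five, and sweep the four empty ones.

```python
def fields_won(mine, theirs):
    """Battlefields where my troops strictly outnumber the opponent's."""
    return sum(m > t for m, t in zip(mine, theirs))

# The "obvious" plan: 166 troops on each of the first six fields (4 troops spare).
naive = [166] * 6 + [0] * 4

# A counter built with full knowledge of that plan: outbid on five of the six
# defended fields, concede the sixth, and take the empty fields with one troop each.
counter = [328, 167, 167, 167, 167, 0, 1, 1, 1, 1]

assert sum(naive) <= 1000 and sum(counter) == 1000
print(fields_won(counter, naive), "to", fields_won(naive, counter))   # 9 to 1
```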

This has led to a classification of a large series of games as "Blotto" games - there is an optimal strategy, but it is only identifiable after the fact (once you know your opponent's strategy). Another example of this type of game is rock-paper-scissors - your best bet is to pick randomly from any number of good strategies, but after the game is over both sides can identify the one strategy that could have beaten their opponent.

A wonderful paper published fairly recently hypothesized that American elections might be modeled as a Blotto game, but with a couple of other parameters tossed in as well. A Blotto game is fairly similar to the electoral college system, they thought, because each side has a fixed amount of money to distribute across the 51 electoral college races (the 50 states plus Washington, D.C.), with the winner being whoever can get at least 270 electoral college votes.

First, though, they had to consider how much their strategic variables - which they chose to be polling numbers ten weeks before the election and campaign spending ratios - impacted the vote (if at all). They came up with three possibilities: the election could be modeled as a Lotto, Blotto, or Frontrunner game (which is, coincidentally, the name of the paper). In a Lotto game, knowing your opponent's strategy couldn't help you - much like in a lottery, you can't identify any strategy that could lead to victory beforehand, even though afterwards you could see where you went wrong. In a Frontrunner game, there is an identifiable connection between strategic variables and victory, but one side has such an insurmountable advantage that they cannot lose.

The paper analyzed the 1996 and 2000 American presidential elections, and determined that campaign spending per state did have a strong impact on winning or losing that state. They further determined that the 1996 election was a Frontrunner game - Clinton had so much money and such favorable polls that it was easy for him to pick a winning strategy, and even if Dole had known that strategy he still couldn't have won.
[Figure: the results from the 2008 election. In general you'd expect states up and to the right to be won by Democrats (due either to higher spending or higher poll numbers), and this is indeed the case.]

Much more interestingly, though, the 2000 election was in fact determined to be a Blotto game - in other words, Al Gore could have re-allocated his money in such a way that Bush would have lost the election. The authors of the paper estimated that he only had about a 4% chance of choosing the strategy at random, but it was still a possibility.

Fast-forward to 2012, and these results have an interesting impact on the upcoming presidential election. A glance at the current polling numbers shows that this election is nearly as tight as the 2000 election, suggesting that a Blotto-like model may turn out to be valid. Considering that Obama is currently leading the expected vote count as well as fundraising, though, it seems more likely than ever that Romney has a steep uphill climb ahead of him.

Monday, September 24, 2012

Summer Weather

Weather forecasting is insane.

As a career I can't even imagine how unrewarding it must be - you could pour hours and hours into developing new algorithms that yield only tiny increases in accuracy, due simply to the massive complexity of the system you're trying to model. When you're right people take you for granted, and when you're wrong you take a lot of blame.

That being said, a while ago I noticed that sometimes different forecasters will predict radically different weather for the same day, given the same data. I also noticed that Monday's forecast for the weekend could be substantially different from Friday's. These are all fair differences - tweaks to models could cause differences of opinion between meteorologists, and the closer a forecast is made to the day it predicts, the more accurate we'd hope it would be.

I was curious as to how much of a change there would be, though, which is why I decided to keep track of it. Since the beginning of June I've kept track of the six-day forecasts for high temperature, low temperature, and probability of precipitation (POP) from five different forecasting stations: timeanddate.com, Environment Canada, Global Weather, the Weather Network, and the Weather Channel. Environment Canada, Global, and the Weather Network were chosen as the sites visited most frequently by myself and my friends; the Weather Channel was chosen because it is the basis of Yahoo! weather and, by extension, the commonly-used Apple weather app; and timeanddate.com was chosen because it's a large multinational site. All forecasts were for the Edmonton downtown location, not the international airport, and prediction data was collected between 11 am and noon each day for consistency in comparison.

Now that summer's over, I have some preliminary results. And the winner (by a hair) is the Weather Network!

Score (out of 100):
  • Weather Network: 66.92
  • Global Weather: 66.02
  • Weather Channel: 63.99
  • Environment Canada: 55.00
  • TimeandDate.com: 54.25
The score is based on a weighted average that was more or less arbitrarily decided by me: each subsequent day in the future was weighted less (so that a prediction for tomorrow's weather is worth more than a prediction for next week's), and POP was worth more than the High prediction, which was in turn weighted more than the Low prediction.
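To make that concrete, here's roughly the shape of the calculation in Python. The weights below are simplified stand-ins rather than the exact values I used, but they follow the same rules: nearer days count more, and POP counts more than the High, which counts more than the Low.

```python
# Stand-in weights -- the actual ones are arbitrary and not listed here.
day_weights = {1: 6, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1}    # days ahead of time
part_weights = {"pop": 0.5, "high": 0.3, "low": 0.2}   # POP > High > Low

def station_score(scores):
    """scores[(days_ahead, part)] is a 0-100 score for that prediction category,
    e.g. the percentage of 2-day High forecasts within 3 degrees."""
    total = weight = 0.0
    for days_ahead, dw in day_weights.items():
        for part, pw in part_weights.items():
            total += dw * pw * scores[(days_ahead, part)]
            weight += dw * pw
    return total / weight   # weighted average, still out of 100
```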

Some fun facts!

Best High temperature prediction: Weather Channel 1-day prediction (96.79% within 3 degrees)
Best Low temperature prediction: Environment Canada 2-day prediction (96.07% within 3 degrees)
Best POP: Global 4-day prediction (p-value 0.346)

Worst High temperature prediction: TimeandDate 6-day prediction (55.20% within 3 degrees)
Worst Low temperature prediction: Global 6-day prediction (68.57% within 3 degrees)
Worst POP: TimeandDate 3-day prediction (p-value 0.038)

Some graphs!

Temperature score was based on the percentage of predictions that were within 3 degrees of the actual temperature. In general there was a very strong downward trend for the high temperature predictions - almost all stations had better than 95% accuracy at predicting tomorrow's high, but they were only about 70% accurate for the weather a week from now. There was less of a trend for the low predictions; however, lows are typically less useful anyway, apart from determining the likelihood of frost.

The score for POP is based on the p-value for each category of prediction. In essence, I checked the number of days that a given station predicted a POP of, say, 10%, and compared it to the fraction of those days on which it actually did rain. This doesn't translate directly into an accuracy percentage, which is why I call them 'scores' instead (though if every category had precisely the incidence of rain as predicted, it would end up with a score of 100).
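A binomial test is one reasonable way to turn that comparison into a p-value (take it as an illustration of the idea rather than the exact test behind the scores above):

```python
# Compare how often it actually rained in a POP bucket against the forecast
# probability (scipy >= 1.7; older versions have scipy.stats.binom_test instead).
from scipy.stats import binomtest

def pop_bucket_pvalue(forecast_pop, rainy_days, days_issued):
    """e.g. a 10% POP issued on 30 days, 4 of which turned out rainy."""
    return binomtest(rainy_days, days_issued, forecast_pop / 100).pvalue

print(pop_bucket_pvalue(10, 4, 30))   # a high p-value means consistent with the forecast
```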


So there you go! Hopefully this helps you the next time you're planning a picnic (or whatever people check the weather for...).

Saturday, September 8, 2012

Higher Learning


Pick a professional career. Almost any will do. Now really take a moment to visualize how their job is done today - their tools, their projects, all sorts of the stuff that's required by their fields.

Now think about that same profession 300 years ago. Any chance it's changed?

Doctors have progressed from prescribing urine baths and bloodletting to minimally-invasive laparoscopic surgery. Engineers have moved from catapults to Mars rovers, and law enforcement has gone from corrupt court systems to advanced forensics (in most countries, at least). These and other similar advances have been undeniably amazing, and are largely responsible for our current quality of life.

Sadly, though, one particularly glaring profession has been dragging its heels against this rapid change. Despite the enormous advances of the last few hundred years, university education has remained largely unchanged. For some reason, the overwhelming majority of university classes are still taught by packing vast numbers of students into a theater and talking at them for an hour. Tweed jackets have come and gone, and the use of PowerPoint may have sped things up, but by and large the methods used to teach are virtually unchanged.

Courses are still taught largely by talking at students, assigning readings and homework, and then giving grades based on exams. Exams themselves are still mostly just forcing students to cram material for a couple days beforehand, then stuffing students in a room for two hours and making them answer random questions about the previous forty hours of lecture.

Why is this still the standard? How likely is it that, of all professions, the way we teach people more or less peaked three hundred years ago? Why is it that the most common way of judging how well someone has learned is to cram them into a room and force them to recite things, and why on earth should the grade from that two-hour test be worth up to 70% of their final mark? The case has been made before that universities should get students focusing on learning how to learn, instead of what often seems to be the actual focus - learning how to write exams - and I totally agree.

There have been some pretty exciting developments in expanding education options recently, though. For example, the Khan Academy has more than 3,300 video lessons and interactive exercises covering math all the way from preschool arithmetic to first-year calculus, all free for anyone to take. The Academy also covers basic sciences, humanities, and finance.

If that isn't what you're looking for, why not learn a language? The BBC offers free courses on all the major European languages, and essential phrases for 40 languages. If you're interested in something less structured, free lecture videos on hundreds of topics can be found anywhere online to anyone who's really interested.

What's particularly cool, though, are the opportunities that are becoming available for a more formalized education. Recently, three of the biggest names in education (MIT, Berkeley, and Harvard) joined together to offer advanced university courses for topics ranging from solid state chemistry to artificial intelligence. These even offer 'certificates of completion' - certainly worth putting on a resumé, even if they don't have quite the same weight as an official transcript. Registration for the courses is open right now, and I strongly suggest you take a look at what's being offered.

The advantage of this new-found variety in fairly high-level education options is that it may (hopefully) end up pushing the envelope for education options in universities. Many post-secondary programs are starting to allow more open-ended education, such as the option to learn by correspondence or online, and with so much knowledge so freely available we may yet see a change in the 'classical' approach to lectures.

And for those of you who truly are here to learn how to learn, I strongly suggest taking a look at some of the links - I can guarantee there's something out there you'll find fascinating.