Thursday, February 27, 2014

SU Elections: Presidential Grammar

With one very tiny exception at the end, I'm not going to talk about the platforms of any candidates in this year's SU elections. I'm not a student anymore, and it's probably time that I leave things be.

That being said, I was reading some of the platforms for the presidential candidates, and I found the grammar too much to bear. For instance, this is a page from one candidate's platform I copied and commented on:

And here's one from another candidate:

Come on, guys. Apostrophes are taught to children. Capitalization is usually for proper nouns. "High jacked" sounds like an adjective shopping list for bros at an Amsterdam gym.

Grammar aside, I have to take massive exception to this graph in one candidate's platform:

If I looked at that, and not the numbers, I'd think "Wow, international tuition is WAY higher than domestic tuition!"

(Aside: the domestic tuition in the source is actually only $5,269.20. That's sort of irrelevant though.)

What is going on in this graph? A quick mental calculation tells me that 19 thousand dollars is only about 3-4 times 5 thousand dollars (precise ratio: 3.55). That appears to be the ratio of the heights of the circles in this graph. In other words, this graph could and ought to be presented like this:

Sure, this still looks bad, but not NEARLY as bad as the previous graph, because we're not implicitly pretending that the area of each shape is what's being compared. The original graph massively skews the comparison and subtly suggests that international tuition is about 13 times domestic tuition by using circle areas instead of bars. This is a technique covered in Chapter 6 of "How to Lie with Statistics", which is a wonderful read if you're into that kind of thing. If we were to be truly honest with this graph, it could look something like this:

This is admittedly far less alarming, but also less likely to mislead people.
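For anyone who wants to check where the "about 13 times" figure comes from, here is a minimal Python sketch. It only uses the 3.55 ratio quoted above and assumes, as I guessed from the picture, that the circle heights were scaled by that ratio. The variable names are just mine for illustration.

```python
import math

# Ratio of international to domestic tuition, as quoted in the post.
height_ratio = 3.55

# Assumption: the circles' heights (diameters) were scaled by the tuition ratio.
# A circle's area grows with the square of its diameter, so the eye sees this:
area_ratio = height_ratio ** 2

# To make the *areas* reflect the tuition ratio, the diameters should only
# differ by the square root of the ratio.
honest_diameter_ratio = math.sqrt(height_ratio)

print(f"implied area ratio: {area_ratio:.1f}")
print(f"diameter ratio for honest areas: {honest_diameter_ratio:.2f}")
```

In other words, scaling the height by 3.55 makes the big circle look roughly 12.6 times larger, while an honest circle would be less than twice as tall as the small one.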

I've said my bit. Now go have a fun campaign, and I'll hopefully get back to you with my model predictions next week!

Tuesday, February 25, 2014

Winter Olympics Predictions

The Winter Olympics are over, which means that my productivity is back on the rise and my sense of nationalism has returned to normal levels.

One of the things I enjoy trying to do from time to time is developing predictions of sporting events, such as the NHL Playoffs. So when I heard that people were trying to predict the medal counts for the 2014 Sochi Olympics, naturally I became intrigued and tracked some of their results.

I found four different published predictions:
  • Infostrada Sports: These guys used results from "Olympics, World Championships, and World Cups (or equivalent)" since the 2010 Vancouver Olympics to develop a likely scenario for who would win in each event. Their model had different weights for the results, time since the event, and nature of the event. They only ranked the top 15 countries on their medal table, and it was last updated three days before the opening ceremonies.
  • Wall Street Journal: The prestigious journal interviewed experts and rated recent performances, and assigned probabilities to certain outcomes. They claim to have been accurate to "within a few medals" in the last two Olympics, but were actually just alright for the 2012 London games, and only good at predicting a few countries in Vancouver in 2010.
  • SportsMyriad: I think this is a blog? Either way it's a fun website if you like sports stats. No real idea where the stats came from (apart from the disclaimer "It'll change from injuries, form, whims, etc.").
  • Andreff & Andreff (2014): A working paper from the International Association of Sports Economists, also posted to the Freakonomics blog. The authors correlated factors such as population, per-capita income, political regime, average snowfall, and number of ski resorts to try to determine the number of medals. This sort of approach has been used for summer games before (probably not with ski resorts as a major factor...), but apparently not for winter Olympics. These were the only guys to include upper and lower bounds on their predictions.
How did they all turn out? Sort of alright, I guess. Sort of.

The best prediction was by the Wall Street Journal, with a coefficient of determination of 0.77 for total medals, and 0.63 for golds (1 being perfect).

Notable exceptions were the Netherlands (getting double the expected medals - whoops) and South Korea (getting half their prediction), but otherwise things were pretty decent for the Wall Street Journal.
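For the curious, a coefficient of determination can be computed along these lines. This is only a sketch under assumptions: it uses the common 1 - SS_res/SS_tot definition (the squared correlation is another convention, and the two can give different numbers), the function name `r_squared` is made up for illustration, and the medal counts below are invented rather than taken from any of the four predictions.

```python
def r_squared(actual, predicted):
    """Coefficient of determination, using the 1 - SS_res / SS_tot definition."""
    mean_actual = sum(actual) / len(actual)
    # Squared error between what happened and what was predicted.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared spread of the actual results around their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up medal counts for three hypothetical countries (NOT real data).
actual_medals = [10, 20, 30]
predicted_medals = [12, 18, 33]

print(round(r_squared(actual_medals, predicted_medals), 3))
```

A perfect prediction scores 1, and a prediction no better than guessing the average medal count for every country scores 0.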

Next best was the SportsMyriad site, which came in only slightly behind at 0.75 for total medals, but less close at 0.58 for golds.

Andreff and Andreff were next up, with a coefficient of determination of 0.68 for total medals (their model didn't break it down into colours). They were the only group to include upper and lower bounds, which proved to be a bit silly since only 35% of countries fell within the bounds given to them. These guys were the most wrong about the Netherlands (they very confidently predicted 5-7 medals; the Dutch actually won 24).
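The "share of countries inside their bounds" check is simple to sketch. In the snippet below, only the first entry (a predicted 5-7 against an actual 24, the Netherlands case mentioned above) comes from this post. The other three rows are invented placeholders, and `coverage` is just a name I picked.

```python
def coverage(actual, bounds):
    """Fraction of results that landed inside their predicted (low, high) range."""
    hits = sum(1 for a, (low, high) in zip(actual, bounds) if low <= a <= high)
    return hits / len(actual)


# First entry: the Netherlands figures quoted in the post.
# Remaining entries: made-up placeholders, NOT real predictions or results.
actual_medals = [24, 10, 3, 15]
predicted_bounds = [(5, 7), (8, 12), (4, 6), (9, 14)]

print(coverage(actual_medals, predicted_bounds))
```

With bounds that mean anything, you'd hope for a number well above the 35% these authors managed.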

Infostrada was the furthest off, with a coefficient of determination of 0.22 for total medal count. This is a bit of an unfair direct comparison, though, as they only listed their top 15 countries, and the addition of 10 lower-performing countries would likely have bumped that up. Even on a comparison of their top 15 across all models, though, they still came last.

In general, the Olympics are tough to predict, for loads of reasons. Even the best in a sport don't win every event they compete in, and trying to predict the result of a single mogul run or figure skating performance is an exercise in futility. Team sports predictions are rough since the full teams only rarely play each other with the exact line-ups between Olympics, and occasionally Olympic berths are won by teams or athletes who don't even end up competing. Using socio-economic data is probably OK for getting a general picture of a country's winter abilities, but ignores the fact that sometimes people are just good at something despite their surroundings.

That being said, I admire the effort by these would-be predictors, and look forward to seeing how they do next time around!