Wednesday, August 27, 2014

Population of Canada by Longitude

A couple of months ago a friend of mine on Facebook shared a post with me that graphed the population of Canada by latitude. They also challenged me to come up with a similar graph of Canada's population, but by longitude.

And I promptly forgot. Until now!

As the original post mentions, finding geographical data that matches up with population data from Statistics Canada is quite tough, largely because postal code data is the intellectual property of Canada Post, and they don't much like sharing. I managed to find the 2011 Census data sorted by Forward Sortation Area (the first three characters of your postal code) and the geographical data for all postal codes (an unreasonably large file), and combined the two to get a fairly precise view of the data. To make sure what I had was close enough to the original graph, I redid the original work by latitude first:

Close enough. Things get a bit wacky up north because FSAs there cover so much area and we likely used different ways to approximate the center of each FSA, but I'm still reasonably satisfied with the result.
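For anyone wanting to try this at home, here's a minimal sketch of the join. It assumes the postal code file boils down to (postal code, latitude, longitude) rows and the census file to a population per FSA. Approximating each FSA's center as the plain mean of its postal codes' coordinates is an assumption for illustration, not necessarily what either of us did. The sample coordinates and populations are made up.

```python
from collections import defaultdict


def fsa_centroids(postal_codes):
    """Approximate each FSA's center as the mean latitude/longitude of its postal codes.

    postal_codes: iterable of (postal_code, latitude, longitude) tuples (hypothetical format).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # FSA -> [latitude total, longitude total, count]
    for code, lat, lon in postal_codes:
        fsa = code.replace(" ", "").upper()[:3]  # FSA = first three characters
        entry = sums[fsa]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {fsa: (lat_sum / n, lon_sum / n) for fsa, (lat_sum, lon_sum, n) in sums.items()}


def join_population(centroids, fsa_population):
    """Pair each FSA center with its census population, dropping FSAs missing from either side."""
    return [
        (fsa, lat, lon, fsa_population[fsa])
        for fsa, (lat, lon) in centroids.items()
        if fsa in fsa_population
    ]


# Made-up sample rows, for illustration only.
postal_codes = [
    ("T6G 2R3", 53.52, -113.53),
    ("T6G 1Z2", 53.50, -113.51),
    ("H3A 0G4", 45.50, -73.58),
    ("X0A 0H0", 63.75, -68.52),
]
fsa_population = {"T6G": 20000, "H3A": 5000}
rows = join_population(fsa_centroids(postal_codes), fsa_population)
print(rows)
```

A simple mean of coordinates works tolerably for small urban FSAs. It gets shakier for the huge northern ones, which is one reason the far-north ends of the graphs can differ.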

It's a fun graph, and deservedly the original got a nice amount of HuffPo press. It's pretty weird to think that about half of Canada lives below the northern suburbs of Montreal, and only 31% of the country lives above the 49th parallel section of the border.
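Figures like "about half" and "31%" come straight from cumulative population sorted by latitude. The sketch below assumes a list of (latitude, population) pairs, one per FSA, and the sample numbers in it are invented:

```python
def share_north_of(rows, latitude):
    """Fraction of the total population living north of `latitude`.

    rows: list of (latitude, population) pairs, e.g. one per FSA center.
    """
    total = sum(pop for _, pop in rows)
    return sum(pop for lat, pop in rows if lat > latitude) / total


def median_latitude(rows):
    """Latitude by which, counting south to north, half of the population has been reached."""
    total = sum(pop for _, pop in rows)
    running = 0
    for lat, pop in sorted(rows):
        running += pop
        if running >= total / 2:
            return lat
    return None  # only reached if rows is empty


# Invented sample: (latitude, population)
rows = [(43.7, 500), (45.5, 300), (49.3, 150), (53.5, 50)]
print(share_north_of(rows, 49.0))  # 0.2
print(median_latitude(rows))       # 43.7
```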

Sure, Canada's tall, but let's talk about how wide it is. It's really wide. It stretches from 52°37'W at Cape Spear to 141°0'W at Boundary Peak, which covers nearly a quarter of the longitudinal values on earth. Yay us.
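For the curious, the "nearly a quarter" checks out. Both endpoints are west longitudes, so the span is just their difference:

```python
def dms_to_degrees(degrees, minutes=0, seconds=0):
    """Convert degrees/minutes/seconds to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


cape_spear = dms_to_degrees(52, 37)     # 52°37'W
boundary_peak = dms_to_degrees(141, 0)  # 141°0'W
span = boundary_peak - cape_spear       # both are west of Greenwich, so subtract
fraction = span / 360
print(f"{span:.2f} degrees of longitude, {fraction:.1%} of the full 360")
# prints "88.38 degrees of longitude, 24.6% of the full 360"
```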

If we do the same analysis as the previous graph, but for longitude, we get the following (you can click on the image to zoom and enhance, spy-movie style!):

So really, nothing too surprising. The majority of people live somewhere between Toronto and Québec City, and in both British Columbia and Alberta the major cities fall more or less along the same north-south line.
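The longitude graph is essentially a histogram of population. The sketch below shows the binning step. It assumes (longitude, population) pairs with west longitudes as negative numbers. The 1-degree bin width is a choice made for illustration.

```python
import math
from collections import Counter


def population_by_longitude(rows, width=1.0):
    """Sum population into longitude bins of `width` degrees.

    rows: list of (longitude, population); west longitudes are negative.
    Each key is the western edge of its bin, so -114.0 covers [-114, -113).
    """
    bins = Counter()
    for lon, pop in rows:
        bins[math.floor(lon / width) * width] += pop
    return dict(sorted(bins.items()))


rows = [(-113.49, 100), (-113.9, 50), (-79.38, 300)]  # invented sample
print(population_by_longitude(rows))  # {-114.0: 150, -80.0: 300}
```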

I was planning on combining both maps into a generic heatmap for Canada, but then I stumbled on this, and it's way cooler than anything I'd have been able to do, so I'll just share it with you instead. Try not to get too mesmerized...

Friday, August 15, 2014

The Great Oven Mitt Review of 2014

So I've got some cool friends.

A little background: when I first moved into my apartment (one year ago today exactly!), I only remembered to buy oven mitts at the last minute. I grabbed the quickest and cheapest one I could, which was a single oven glove appropriately called the "Ove' Glove." The night I moved in, we made pizzas and the Ove' Glove was the source of much entertainment and many complaints, as it appeared to be a very good conductor of heat instead of an insulator. Whoops.

To remedy this, two weeks ago at my birthday party my friend Cassandra decided it would be a great gag gift idea for everyone to bring oven mitts for me. As a result, I am now in possession of 16 oven mitts, ranging in colours, sizes, and materials.

So on this, the anniversary of me moving in, I've decided to work with these oven mitts the way that I know best: test them and write a report.

My set-up was pretty basic - I made a system to hold the oven mitt a fixed distance away from a medium-heat element, stuck a meat thermometer inside, and took temperature readings for up to ten minutes. A check without any oven mitt showed that this setup subjected the mitts to a temperature of approximately 70 degrees Celsius.

Test Setup (highly technical)

So without further ado, I present to you my rankings for oven mitts from worst to best. Starting with:

#9 Blue Mitts of Death

  • Brand: Dollar Store
  • Value: $3
I suppose at this point it's worth pointing out exactly how I'm ranking them. First and foremost, I'm looking at how long it takes the mitts to actually burn you. According to this source, 55 degrees is hot enough to give second-degree burns after 17 seconds. With the mitt held near the 70 degree heat source, the temperature inside this one passed 55 degrees in less than 5 minutes, which means you'd have burned your hand by then. Sure, that's not how normal people use oven mitts, but hey - you gotta compare them somehow. Since these have the highest potential for burning, I rate them the worst.
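Because readings came at intervals, the "time to burn" figures need a little interpolation. The sketch below assumes the readings are (seconds, degrees Celsius) pairs and that the temperature rises in a straight line between them. The readings themselves are made up.

```python
def time_to_threshold(readings, threshold=55.0):
    """First time (in seconds) the temperature reaches `threshold`, or None if it never does.

    readings: list of (seconds, temperature) pairs sorted by time.
    Between two readings the temperature is assumed to rise linearly.
    """
    if readings and readings[0][1] >= threshold:
        return readings[0][0]
    for (t0, temp0), (t1, temp1) in zip(readings, readings[1:]):
        if temp1 >= threshold:
            # temp0 is still below the threshold here, so temp1 - temp0 > 0
            return t0 + (threshold - temp0) * (t1 - t0) / (temp1 - temp0)
    return None


readings = [(0, 22.0), (120, 40.0), (240, 50.0), (300, 56.0)]  # made-up data
seconds = time_to_threshold(readings)
print(f"{int(seconds // 60)}:{int(seconds % 60):02d}")  # 4:50
```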

#8 Pink, Flowery, and Painful

  • Brand: Dollar Store
  • Value: $2.50
Though these are by far the prettiest, they're also quite deadly. If my hand had been in them for the experiment, I would have gotten a burn about 5:20 into the test. Not nice. It's also worth pointing out that throughout all tests, these oven mitts got the closest to 70 degrees (67.6 after 10 minutes). Yikes!

#7 Black Cuisinart

  • Brand: Cuisinart
  • Value: $15.95
Hilariously, I got these as a Christmas present from my parents before any of these birthday shenanigans went down. Sadly, they're also apparently the type of oven mitt that likes to burn your hand off. Their redeeming factor is that they only increased in temperature 1.2 degrees within the first 30 seconds of the test, which is more than enough time for most oven extractions. They would likely have burned my hand about 5:40 into the test.

#6 Green Silicone

  • Brand: Ming Wo (?)
  • Value: $9.99
Man, these silicone ones look so fancy, but they really like burning your hands to a crisp. This is very similar to #7, in that it had one of the lowest rates of heating at first, but by 7 minutes into the test it would have made you very unhappy. I'm sure there's some materials science point to be made here, but that would involve actual science.

#5 Languages of Pain

  • Brand: Dollar Store
  • Value: $2.50
Awesomely enough, I got two pairs of these for my birthday. These were pretty decent for a language lesson, but woulda burned your hands at about 7:20 into the test. An excellent example of Dollar Store quality oven mitts holding their own against their expensive counterparts though...

Here are pretty graphs of the worst five oven mitts:

Again, the pink and blue oven mitts both had high initial rates of heat pickup and ended up with the highest final temperatures (the ranking order is a bit different from the graph because I tested the pink one on a colder day. I know, terribly unscientific of me...). The silicone mitts did much better for the first two minutes, but then took on heat at a similar rate to everyone else. Tsk tsk.

The rest of the mitts happily never hit 55 degrees within their tests, so I'll rank them based on their average rate of temperature gain over the 10 minutes:

#4 The Ove' Glove

  • Brand: No clue
  • Value: $18.99
In a stunning come-from-behind near-podium finish, the Ove' Glove turns out to be a contender! And if you don't believe me, check out this totally awesome super cool consumer video (sarcasm). The Ove' Glove gained heat at an average rate of 2.95 degrees per minute - not shabby!

#3 The Alien

  • Brand: Dollar Store
  • Value: $2
Put this sucker on your hand and you've got a great alien chestburster puppet! Alternatively, use it to take hot things out of an oven without burning yourself. By far the best bang for the buck, it somehow combines silicone and fabric into a decent oven mitt, gaining an average of 2.82 degrees per minute.

#2 Better Barbeque

  • Brand: CTG Brands
  • Value: Weight in gold?
Wowza. This one is hefty, basically goes up to my elbow, and can hold its heat, only gaining an average of 2.66 degrees per minute. Very nice. These also won the contest for lowest heat pick-up in the first minute, and didn't even register a temperature change until 30 seconds into the test. 

#1 President's Choice

  • Brand: PC
  • Value: 7 unicorn hairs?
These guys were the bomb, only gaining 2.61 degrees per minute. They're also flexible enough to use regularly, unlike the silicone ones. 
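For the four mitts that never reached 55 degrees, the ranking comes down to average rate of temperature gain. One straightforward way to compute it is the final reading minus the initial reading, divided by the elapsed minutes. That formula is assumed here, and the readings are invented rather than taken from my measurements.

```python
def average_rate(readings):
    """Average temperature gain in degrees per minute, from first to last reading.

    readings: list of (seconds, temperature) pairs sorted by time.
    """
    (t_start, temp_start), (t_end, temp_end) = readings[0], readings[-1]
    return (temp_end - temp_start) / ((t_end - t_start) / 60)


# Invented ten-minute readings for two hypothetical mitts.
mitts = {
    "Mitt A": [(0, 21.0), (300, 36.0), (600, 50.0)],
    "Mitt B": [(0, 22.0), (300, 34.0), (600, 47.0)],
}
ranking = sorted(mitts, key=lambda name: average_rate(mitts[name]))  # slowest heating (best) first
print(ranking)  # ['Mitt B', 'Mitt A']
```

This uses only the endpoints. A least-squares slope through every reading would be less sensitive to a single odd measurement.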
Graph of the top 4 oven mitts:

Again, some very smooth curves here. The CTG oven mitt had by far the steadiest heat increase, but lost out to the PC mitt over the full length of the test. I know that my ranking has been more-or-less arbitrary this whole time, but I'm comfortable with declaring the CTG mitt to be my favorite (because really, who uses a mitt for 10 minutes at a time?).

Thanks again to everyone for pitching in on the oven mitt present. I hope I've used them in an appropriate manner! 

Wednesday, August 13, 2014

Mud Heroes Aren't Normal

Last weekend I went ahead and did something I never thought I'd do: the Mud Hero race down in Red Deer.

Mud Hero, for those of you who are blissfully unaware, is a crazy obstacle course/race/endurance sport/mud bath and spa/general day of chaos that follows in the ever-growing trend of mud runs for the athletically-inclined. There are dozens of similar events around Canada each summer, and the Mud Hero appears to be one of the most popular, with over twelve thousand participants over the three days of heroing in Red Deer last weekend.

The event attracted people from all backgrounds and fitness levels, and has likely inspired wonderful stories of perseverance, raising money for charity, and teamwork through adversity. To all this, I say nonsense. The most interesting part of the Mud Hero is the statistics and, as readers have likely figured out already, the inescapable conclusion that Mud Heroes just aren't normal.

In an attempt to look as much like a legitimate race as possible, all participants were given timing chips to track their racing, and all results are posted online for people to show their friends and family and brag about just how slowly they trudged through the muck. Since this involves thousands of numbers, it's pretty easy to salivate over the possible statistics. So I did. Here's a graph of everyone who ran on the last day of the Alberta Mud Hero:

Right off the bat that may look a lot like a normally-distributed bell curve - there's certainly a lovely peak right around the middle, and it tapers off at either end. The reported average time for the course was 1:25:22 (85 minutes), which sits reasonably close to the middle of the peak.

It isn't enough to just assume that's a normal distribution, though - a normal distribution is a precisely defined curve, and not every bell-like shape is one. The results from Sunday's Mud Hero had a mean of 85.37 minutes and a standard deviation of 29.90 minutes. Since these are the only two parameters you need to define a normal distribution curve, we can compare Sunday's results to the normal assumption and get the following:

That's not really all that close at all. These are two bell curves that have the same mean and standard deviation, but are not identical, leading to the fun conclusion that Mud Hero runners are not normal (well, normally-distributed at least). Mud Heroes tend to be positively skewed (the mean is higher than the median), and have shorter and bounded tails.
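One quick numerical check for skew is the moment coefficient of skewness, which is positive when the right tail is longer. In that case the mean usually sits above the median, though not always. The sketch below uses made-up finishing times, not the real results.

```python
from statistics import mean, median, pstdev


def skewness(xs):
    """Moment coefficient of skewness (population form); positive means a longer right tail."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


times = [55, 62, 68, 71, 75, 78, 82, 90, 104, 140]  # made-up times in minutes
print(mean(times), median(times))  # 82.5 76.5
print(skewness(times) > 0)         # True
```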

This isn't really all that surprising - in fact, a normal distribution would have been surprising, as there are necessarily cut-offs to the data (nobody can finish the race in less than 0 minutes, for instance), and it was a relatively short race. People often treat every bell-shaped curve as normally distributed, even though there is an incredible number and variety of probability distributions out there.
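The cut-off at zero is easy to see with the reported Sunday figures. Python's `statistics.NormalDist` can build the matching normal curve and report how many finishers it would expect in each time bin. The helper function and its sample times are illustrative only, not the actual results.

```python
from statistics import NormalDist, mean, stdev

# Normal curve with Sunday's reported mean and standard deviation (minutes).
sunday = NormalDist(mu=85.37, sigma=29.90)
# Probability the normal model assigns to a finishing time below 0 minutes,
# which is impossible for a real runner.
p_negative = sunday.cdf(0)


def observed_vs_normal(times, bin_edges):
    """For each bin [lo, hi), return (lo, hi, observed count, count expected under a
    normal distribution with the sample's own mean and standard deviation)."""
    dist = NormalDist(mean(times), stdev(times))
    n = len(times)
    return [
        (lo, hi, sum(lo <= t < hi for t in times), n * (dist.cdf(hi) - dist.cdf(lo)))
        for lo, hi in zip(bin_edges, bin_edges[1:])
    ]


times = [60, 70, 80, 85, 90, 100, 130]  # made-up times in minutes
for lo, hi, observed, expected in observed_vs_normal(times, [0, 60, 90, 200]):
    print(f"{lo}-{hi} min: {observed} observed, {expected:.1f} expected")
```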

So Mud Heroes aren't normal. What else can we learn from the data? Fortunately the results are broken down into genders, ages, and hometowns, so let's look at those!

First of all, gender:

Fascinatingly enough (for an event whose purpose is explicitly to get dirty), women outnumbered men 2 to 1! That's pretty awesome. A quick Analysis of Variance test shows that the men were statistically significantly faster than the women were this time around though, which I suppose is the trend in races like this. Shame...
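Under the hood, the ANOVA compares how far apart the group averages are with how spread out the times are within each group. Here's a bare-bones sketch of the F statistic, using tiny made-up groups. It doesn't compute the p-value. A library routine such as SciPy's `scipy.stats.f_oneway` returns both.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    ss_within = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


men = [1, 2, 3]    # made-up times
women = [4, 5, 6]  # made-up times
print(one_way_anova_f(men, women))  # 13.5
```

With only two groups, F is the square of the pooled two-sample t statistic, so this is equivalent to a t-test.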

Bam. Age graph. I'm not entirely sure why the men aren't as consistent as they age, but then again, who is? (marriage joke)

And finally, home city:

Turns out there's no statistically significant difference between participants from Red Deer, Calgary, and Edmonton. The box plots of their results suggest they have almost identical distributions for time, and an ANOVA test suggests they can all be considered to be drawn from the same population. So really, even though the average time for Calgarian heroes was two minutes faster than for Edmontonian heroes, the difference isn't significant enough for them to brag. So ha!
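Box plots are drawn from a five-number summary: minimum, first quartile, median, third quartile, and maximum. Here's a minimal sketch using Python's `statistics.quantiles`. Plotting tools differ in how they compute quartiles, and they often draw whiskers at 1.5 times the interquartile range rather than at the min and max. Treat this as an approximation of what a given chart shows.

```python
from statistics import quantiles


def five_number_summary(xs):
    """(min, Q1, median, Q3, max) using statistics.quantiles' default 'exclusive' method."""
    q1, q2, q3 = quantiles(xs, n=4)
    return min(xs), q1, q2, q3, max(xs)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))  # (1, 2.25, 4.5, 6.75, 8)
```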

All in all though, Mud Hero was definitely a fun experience. If you're looking for a good excuse to get tired and muddy, I'd highly recommend it for next year!

Friday, July 4, 2014

Optimizing your Coffee Fix

Canadians like their coffee. In fact, the average Canadian drinks 55% more coffee per day than the average American, and Canadians are ranked 9th in the world for overall coffee consumption. Because of this, it's hardly a surprise that coffee shops are literally all over the place in Canadian cities.

Pretend for a moment that you're out and about in Edmonton one day, and absolutely need your coffee fix. So badly, in fact, that you're only willing to travel the absolute shortest possible distance to your nearest Tim Hortons or Starbucks (Canada's two most popular coffee shop chains). If you made a map of the city based on where the nearest Tim Hortons is, it would look something like this:

Similarly, a map based on where the nearest Starbucks is would look like this:

(If your browser doesn't like Google maps, check out images of the maps for Tim Hortons and Starbucks. Please note that the maps are only as accurate as Google's knowledge of the world is.)

These are called Voronoi diagrams, which split up the city based on where the closest relevant coffee shop is. Each region corresponds to a single coffee shop, and everywhere within that region is closer to that coffee shop than any other.
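You don't need the full diagram to find your own cell, because the cell you're in belongs to whichever shop is closest. Here's a minimal sketch that assumes straight-line (great-circle) distance rather than actual walking or driving distance. The shop names and coordinates are hypothetical. If you want to draw the cell boundaries themselves, a library such as SciPy's `scipy.spatial.Voronoi` handles the geometry.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_shop(point, shops):
    """Name of the closest shop; all points sharing an answer form that shop's Voronoi cell."""
    return min(shops, key=lambda name: haversine_km(point, shops[name]))


# Hypothetical shop locations, not real store addresses.
shops = {
    "Shop A": (53.5444, -113.4909),
    "Shop B": (53.5232, -113.5263),
    "Shop C": (53.4668, -113.5152),
}
print(nearest_shop((53.5400, -113.4950), shops))  # Shop A
```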

It turns out that Edmonton has about one Tim Hortons for every 10,000 people, and about half as many Starbucks. Not too surprisingly, coffee chains tend to be clustered quite a bit downtown and near the U of A campus, leaving the industrial parks to the east rather desolate and missing out on a good brew. The extremely even distribution of Tim Hortons locations in Sherwood Park seems a bit too good to be true, though.

Apart from helping you out with your coffee purchasing optimization, Voronoi diagrams actually have plenty of real uses. Diagrams like this were famously used in 1854 to show that residents who lived closest to one particular well were dying of cholera, which led to the discovery that diseases can be spread through contaminated water.

In the case of coffee shops in Edmonton, they can be helpful in city planning or for businesses choosing where to establish new franchises. And, of course, if your caffeine priorities are straight, they can help you get the fastest fix.

Monday, May 19, 2014

Which Game Should You Win?

The other day I was thinking, “What game in a 7 game playoff series is most closely correlated to winning the series?” Fans obviously cheer hard for their team every game in the playoffs, but if they knew that winning a particular game gave their team the best chance to win the series, perhaps they’d pull out all the stops. So, with this interesting question in hand, I went to see what the data said.

Hockey Reference is a treasure trove of hockey statistics and historical data. In order to get some answers to my questions, I pulled all of the playoff series going back to 1943 (the first year where each round was decided by a 7 game series). This got me a total data-set of 598 series, and a total of 3363 games (so much hockey!).

Now it was time to crunch the numbers. Since I had this really cool data-set, I decided to calculate some interesting tidbits before answering my original question. The first of these tidbits was the percentage of series that ended in 4, 5, 6, or 7 games:

Another interesting tidbit is the idea of home ice advantage. Teams play 82 grueling hockey games throughout the regular season, and once you’ve met the “make the playoffs” bar, the only other advantage to doing well in the regular season is home ice advantage. You would therefore hope that it actually is an advantage on the ice, not just an advantage to your team’s owner for getting to host an extra game in his arena. Since 1943, the home team (defined as the team that hosted Game 1) has won 64.5% of all playoff series. In addition, out of the 108 four-game sweeps there have been, the home team won 81 (75%) of them. So I think it’s fair to say that home ice is an advantage.
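To put a rough number on "fair to say", one option is to ask how likely 81 or more home wins out of 108 sweeps would be if home ice made no difference at all. The sketch below does a one-sided exact binomial calculation for that question. Treating each sweep as an independent 50/50 coin flip is a simplifying assumption.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


home_sweeps, sweeps = 81, 108
share = home_sweeps / sweeps                  # 0.75, the 75% quoted above
p_value = binomial_tail(home_sweeps, sweeps)  # chance of a record at least this lopsided under 50/50
print(f"{share:.0%} home wins, p = {p_value:.1e}")
```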

Alright, enough beating around the proverbial bush - time to answer the question that took me down this rabbit hole. As I said, I wanted to know which game in a 7-game series has the highest correlation with winning the series. The obvious answer is Game 7, considering 100% of the teams that won Game 7 won the series. This isn't very interesting, though, so let’s dive in. The following chart shows the percentage of teams that won the given game who also went on to win the series:

Well, this is awkward. That chart isn’t very interesting at all. Essentially the first 4 games are a toss-up (with Game 2 having a slight edge), and then the percentages go up for the elimination games, as you would expect. Clearly if your team has a chance to end the series in a game, you should cheer as hard as possible for that to happen. I did a little more analysis, however, and found something interesting. Later in the series, the team that won Game 5 in a series that went past Game 5 won the series 60.6% of the time. But the team that won Game 6 in a series that went to Game 7 won the series only 47.9% of the time. So much for all that “momentum” talk.
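Both the chart and the Game 5/Game 6 follow-up boil down to one conditional percentage. The sketch below assumes each series is stored as the list of game winners in order, and the team labels in it are hypothetical. A best-of-seven ends the moment someone clinches, so the winner of the last game is always the series winner.

```python
def win_game_win_series(series, game, min_length=None):
    """Share of series in which the winner of `game` (1-based) also won the series.

    series: list of series, each a list of game winners in order, e.g. ["A", "B", "A", ...].
    Only series that reached `game` are counted; `min_length` optionally restricts the
    count further to series that lasted at least that many games.
    Returns None if no series qualify.
    """
    relevant = [
        s for s in series
        if len(s) >= game and (min_length is None or len(s) >= min_length)
    ]
    if not relevant:
        return None
    wins = sum(s[game - 1] == s[-1] for s in relevant)
    return wins / len(relevant)


# Hypothetical series, for illustration only.
series = [
    ["A", "A", "A", "A"],
    ["A", "B", "B", "B", "B"],
    ["B", "A", "A", "B", "A", "B", "B"],
]
print(win_game_win_series(series, 1))                # Game 1 winners, all series
print(win_game_win_series(series, 6, min_length=7))  # Game 6 winners, series that went to Game 7
```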

In the end it is the humble opinion of this writer that winning any game in the playoffs is probably your best bet. However, if you’re looking for the most statistically advantageous games to win, it appears that Game 2 and Game 5 are the ones to win. 

More analysis to come so stay tuned...

Thursday, April 24, 2014

Open Letter to the Canadian Centre for Policy Alternatives

To: Ms. Kate McInturff
Senior Researcher, Canadian Centre for Policy Alternatives
Dear Ms. McInturff,
It is with concern that I read your most recent publication from the CCPA, "The Best and Worst Places to be a Woman in Canada". Your analysis looked at five categories for Canada's top 20 cities, normalized and ranked them, then averaged the rank for each to form a list for gender inequality across Canada.
Two weeks ago, I did an analysis where I looked at six factors for Canada's top 20 cities, normalized them, averaged the score for each, and ranked each city for zombie preparedness. My piece was intended as a joke, albeit one using real statistics, and I am extremely discouraged that your analysis appears to apply a similar level of statistical rigour to mine. Though I'm sure I agree in general with the principles behind your analysis, I have several specific concerns:
First of all, I would like to express some confusion as to how your final ranking was determined. You mention that:

The scores for the indicators in each category (i.e. health, education) are averaged to produce a final score for that category. Each indicator is given equal statistical weight in the calculation of the score for each category. The cities are then ranked according to their score. The overall ranking of the cities is produced by averaging their ranks in each category.
At first glance, this seems pretty straightforward. When I took a look at Appendix B, though, this didn't quite add up. Québec City is still very much in the lead (ranked 6th, 2nd, 3rd, 7th, and 8th; average: 5.2), but since a lower average rank is better, I'm confused about why Montréal (ranked 11th, 9th, 11th, 6th, and 7th; average: 8.8) is ranked above Sherbrooke (ranked 7th, 10th, 10th, 11th, and 2nd; average: 8.0). In fact, it looks like most of the cities ranked 3-9 are somewhat shuffled:
This is pretty minor in the long run, but I am curious about how you got this final result.
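To illustrate, here is the method as described in the report, applied to Québec City, Montréal, and Sherbrooke using their category ranks from Appendix B. Averaging the ranks and sorting them, with lower being better, puts Sherbrooke ahead of Montréal:

```python
# Category ranks for each city, as listed in Appendix B of the report.
category_ranks = {
    "Québec City": [6, 2, 3, 7, 8],
    "Montréal": [11, 9, 11, 6, 7],
    "Sherbrooke": [7, 10, 10, 11, 2],
}
average_rank = {city: sum(ranks) / len(ranks) for city, ranks in category_ranks.items()}
overall = sorted(average_rank, key=average_rank.get)  # lowest average rank first
print(average_rank)  # {'Québec City': 5.2, 'Montréal': 8.8, 'Sherbrooke': 8.0}
print(overall)       # ['Québec City', 'Sherbrooke', 'Montréal']
```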
Secondly, though the concept of using similar indicators as the Gender Inequality Index and the Global Gender Gap Report certainly seems valid, weighting seventeen indicators in five categories equally before ranking the categories and averaging them ignores much of the analytics required to do a study like this justice. (As an aside, stating that something is "well-supported by medical research" without citing any research makes it tough to follow up on...)
An easy example of even weighting potentially causing issues is in the Education category. The female to male completion ratios for High School, Apprenticeships, College, and University are calculated from the National Household Survey, then all four are averaged together to come up with an inequality score, which is then ranked for the full category.
Of the four indicators, apprenticeship rates are consistently the lowest for females. However, the fraction of all people that go into the trades at all is also very low, and varies more city-to-city than the female enrollment rate (with a coefficient of variation 1.75 times higher). As a result, weighting all four forms of education equally penalizes cities with low numbers of tradespeople in general by effectively giving the apprenticeship figure twice the influence it should.
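For reference, the coefficient of variation is just the standard deviation divided by the mean. That makes it possible to compare how much two quantities of very different sizes vary from city to city. The sketch below uses invented numbers, not the National Household Survey figures.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return pstdev(values) / mean(values)


# Invented city-level figures, in percent.
share_in_trades = [2, 4, 6]          # share of all people with an apprenticeship
female_share_of_trades = [20, 25, 30]
ratio = coefficient_of_variation(share_in_trades) / coefficient_of_variation(female_share_of_trades)
print(round(ratio, 2))  # 2.5
```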
If we account for this and weight each indicator by the size of the population it covers, it turns out that no city has fewer women than men in total educational attainment. The cities most helped by weighting all forms of education the same were the cities with the highest number of people in the trades: Montreal, Quebec, and Sherbrooke. These cities drop to ranks 17, 18, and 13 respectively for Education after accounting for this, a change that would likely affect the conclusions drawn about Quebec as a province.
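Here is the difference between the two weighting schemes in miniature. The ratios and population counts below are invented for illustration. They are chosen so that apprenticeships have the lowest ratio but also the fewest people.

```python
def equal_weight_score(ratios):
    """Plain average of female-to-male completion ratios, one per type of education."""
    return sum(ratios.values()) / len(ratios)


def population_weighted_score(ratios, populations):
    """Average of the ratios, weighted by how many people completed each type of education."""
    total = sum(populations.values())
    return sum(ratios[kind] * populations[kind] for kind in ratios) / total


# Invented example figures for one hypothetical city.
ratios = {"high school": 1.05, "apprenticeship": 0.40, "college": 1.30, "university": 1.10}
populations = {"high school": 50000, "apprenticeship": 5000, "college": 25000, "university": 20000}
print(round(equal_weight_score(ratios), 4))                      # 0.9625
print(round(population_weighted_score(ratios, populations), 4))  # 1.09
```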
Thirdly, the system of averaging rankings, though fun in a zombie analysis sort of way, both treats all five categories as exactly equal and throws away the variation between cities within each category. Even without accounting for any changes in the Education category, it is a category with equal or nearly-equal participation between men and women. If all categories are treated equally and the ranking is the variable under consideration, then we are essentially saying that a city like Vancouver, which was ranked 15th in Education with a perfectly equal score of 1.00, is just as bad as a city like Oshawa, which was ranked 15th in Leadership with a score of 0.27.
Finally, I find the name of your paper to be alarming and misleading. Your report specifically (and in italics) reinforces that it examines "the gap between men and women, rather than overall levels of well-being." Saying that Edmonton is the worst place in Canada to be a woman sounds like a comment on well-being, especially after mentioning that it's a great place for median income, just not quite as awesome for women as it is for men. On the other hand, cities with dangerous crime rates might be considered great for women, as long as the crime is equally distributed. I guess "Gender Inequality Index of Canadian Cities" wasn't catchy enough.
It's time that gender equality statistics were taken more seriously than zombie statistics. I hope that future studies reflect this.
Michael Ross

Monday, April 7, 2014

Canadian Cities Most and Least Likely to Survive the Zombie Apocalypse

Last week I found this blog post, which ranks the US states based on how likely they are to survive a zombie apocalypse. As the post mentions, seeing as the zombie apocalypse is clearly unavoidable, it's important to plan ahead and learn where to be when it hits.

Canadian provinces are way too huge, and there aren't quite enough of them, to do the same sort of analysis north of the border. On the other hand, Canada still has plenty of cities, and seeing as two thirds of the country live in one of the 20 biggest cities, that could be a pretty good way of looking at things.

Instead of looking at 11 factors (ranging from number of veterans to number of triathletes), I looked at the following 6 factors:

Distance to Closest Military Base: Let's face it, when zombies come to get ya, you'll be hoping the military is close-by to help take care of things. Fortunately Canada has a ton of army, navy, and air force bases dotted around the country, but the cities closest to the bases are definitely more likely to handle their undead uprising.

Average Temperature: I'm not an expert, but I imagine if you're dead and frozen solid, you're less likely to be a threat than if you're dead and flexible. Fortunately, Canadian cities have fairly low average daily high temperatures!

Population Density: Zombie math is pretty simple: too many people + too small a space = brains. If you're trapped and surrounded by a lot of future-zombies you've got way worse chances than if you've got some space around ya.

Obesity Rate: This one's pretty straightforward - obese people make easy zombie targets. It's related to (though strangely not strongly correlated to):

Physical Activity: Rule #1 in Zombieland is "Cardio" for a good reason. More people who can escape zombies make for fewer zombies, which really is just better for everyone else.

Gun Ownership: Zombies don't like guns for exactly the same reason zombie apocalypse survivors love guns. Gun ownership data is unfortunately only available on a province-by-province basis, but it's hard to argue that the more guns that are around in a province the better equipped people are to handle the undead. [Edit: I had previously presented this number as guns per population - I actually used licenses per population.]
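The scoring itself is simple. Each factor goes on a 0-to-1 scale, the factors where lower is better (distance to a base, temperature, density, obesity) get flipped, and the results are averaged. The sketch below assumes min-max scaling, which is one common way to normalize and is used here for illustration, and the cities and numbers in it are hypothetical.

```python
def min_max(values, higher_is_better=True):
    """Rescale values to 0..1, flipped if needed so that 1 is always the zombie-proof end."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread: treat every city the same
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def zombie_scores(cities, factors):
    """Average normalized score per city, out of 1.0.

    cities: list of city names.
    factors: {factor name: (one value per city, higher_is_better)}.
    """
    columns = [min_max(values, better) for values, better in factors.values()]
    return {city: sum(col[i] for col in columns) / len(columns) for i, city in enumerate(cities)}


# Hypothetical cities and values, for illustration only.
cities = ["City X", "City Y", "City Z"]
factors = {
    "km to nearest base": ([10, 50, 100], False),
    "% physically active": ([60, 50, 40], True),
}
print(zombie_scores(cities, factors))
```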

With all that said, here's the ranking of the best and worst Canadian cities to be in during a zombie apocalypse (overall score is out of 1.0):

Moral of the story:

  • Don't live in southern Ontario - it's a zombie playground. Ontarians don't have a lot of guns, southern Ontario is relatively warm, and there's really nothing special going on in terms of physical activity and obesity.
  • Do live in a provincial capital. They tend to have military bases, and more often than not are large with relatively low density (suburbia is way better for zombie defense than downtown, of course).
  • I'm proud of Edmonton. Good job, us.
  • Newfoundland has a lot of guns. This is probably worth following up on.
[Edit #2: Updated Toronto temperature data - I mistakenly used daily mean instead of daily high for Toronto.]