Monday, April 7, 2014

Canadian Cities Most and Least Likely to Survive the Zombie Apocalypse

Last week I found this blog post, which ranks the US states based on how likely they are to survive a zombie apocalypse. As the post mentions, seeing as the zombie apocalypse is clearly unavoidable, it's important to plan ahead and learn where to be when it hits.

Canadian provinces are way too huge, and there aren't enough of them, to do quite the same sort of analysis north of the border. On the other hand, Canada still has plenty of cities, and seeing as two-thirds of the country lives in one of the 20 biggest cities, that could be a pretty good way of looking at things.

Instead of looking at 11 factors (ranging from number of veterans to number of triathletes), I looked at the following 6 factors:

Distance to Closest Military Base: Let's face it, when zombies come to get ya, you'll be hoping the military is close by to help take care of things. Fortunately Canada has a ton of army, navy, and air force bases dotted around the country, but the cities closest to the bases are definitely more likely to handle their undead uprising.

Average Temperature: I'm not an expert, but I imagine if you're dead and frozen solid, you're less likely to be a threat than if you're dead and flexible. Fortunately, Canadian cities have fairly low average daily high temperatures!

Population Density: Zombie math is pretty simple: too many people + too small a space = brains. If you're trapped and surrounded by a lot of future-zombies you've got way worse chances than if you've got some space around ya.

Obesity Rate: This one's pretty straightforward - obese people make easy zombie targets. It's related to (though strangely not strongly correlated to):

Physical Activity: Rule #1 in Zombieland is "Cardio" for a good reason. More people who can escape zombies make for fewer zombies, which really is just better for everyone else.

Gun Ownership: Zombies don't like guns for exactly the same reason zombie apocalypse survivors love guns. Gun ownership data is unfortunately only available on a province-by-province basis, but it's hard to argue that the more guns that are around in a province the better equipped people are to handle the undead. [Edit: I had previously presented this number as guns per population - I actually used licenses per population.]
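For anyone wanting to try something similar, here's a minimal sketch of one way six factors like these could be rolled into a single score out of 1.0. The post doesn't say how the scores were actually combined, so the min-max scaling, the equal weighting, the column names, and the three made-up cities below are all assumptions for illustration only.

```python
def min_max(values, higher_is_better):
    """Scale values to 0..1 so that 1 is always the zombie-safest end."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread, so everyone ties
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def overall_scores(cities, factors):
    """Average the scaled factors with equal weights (an assumption)."""
    names = list(cities)
    totals = {name: 0.0 for name in names}
    for factor, higher_is_better in factors.items():
        column = [cities[name][factor] for name in names]
        for name, score in zip(names, min_max(column, higher_is_better)):
            totals[name] += score / len(factors)
    return totals


# Direction of each factor follows the reasoning in the list above.
FACTORS = {
    "base_km": False,     # closer to a military base is better
    "avg_high_c": False,  # colder is better
    "density": False,     # more elbow room is better
    "obesity": False,     # lower obesity rate is better
    "active": True,       # more physically active people is better
    "licences": True,     # more gun licences per person is better
}

# Entirely made-up numbers for three hypothetical cities.
CITIES = {
    "City A": {"base_km": 5, "avg_high_c": 8, "density": 1000,
               "obesity": 0.20, "active": 0.60, "licences": 0.10},
    "City B": {"base_km": 50, "avg_high_c": 12, "density": 3000,
               "obesity": 0.25, "active": 0.50, "licences": 0.05},
    "City C": {"base_km": 100, "avg_high_c": 14, "density": 4000,
               "obesity": 0.30, "active": 0.45, "licences": 0.03},
}

scores = overall_scores(CITIES, FACTORS)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.2f}")
```

Equal weights keep things simple, but any weighting (say, counting guns double) would slot into the same structure.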

With all that said, here's the ranking of the best and worst Canadian cities to be in during a zombie apocalypse (overall score is out of 1.0):

Moral of the story:

  • Don't live in southern Ontario - it's a zombie playground. Ontarians don't have a lot of guns, southern Ontario is relatively warm, and there's really nothing special going on in terms of physical activity and obesity.
  • Do live in a provincial capital. They tend to have military bases, and more often than not are large with relatively low density (suburbia is way better for zombie defense than downtown, of course).
  • I'm proud of Edmonton. Good job, us.
  • Newfoundland has a lot of guns. This is probably worth following up on.
[Edit #2: Updated Toronto temperature data - I mistakenly used daily mean instead of daily high for Toronto.]

Thursday, February 27, 2014

SU Elections: Presidential Grammar

With one very tiny exception at the end, I'm not going to talk about the platforms of any candidates in this year's SU elections. I'm not a student anymore, and it's probably time that I leave things be.

That being said, I was reading some of the platforms for the presidential candidates, and I found the grammar too much to bear. For instance, this is a page from one candidate's platform I copied and commented on (click to zoom):

And here's one from another candidate:

Come on, guys. Apostrophes are taught to children. Capitalization is usually for proper nouns. "High jacked" sounds like an adjective shopping list for bros at an Amsterdam gym.

Grammar aside, I have to take massive exception to this graph in one candidate's platform:

If I looked at that, and not the numbers, I'd think "Wow, international tuition is WAY higher than domestic tuition!"

(Aside: the domestic tuition in the source is actually only $5,269.20. That's sort of irrelevant though.)

What is going on in this graph? A quick math guess tells me that 19 thousand dollars is only about 3-4 times 5 thousand dollars (precise ratio: 3.55). This appears to be the heights of the circles in this graph. In other words, this graph could and ought to be presented like this:

Sure, this still looks bad, but not NEARLY as bad as the previous graph because we're not implicitly pretending that the area of the section is what's being compared. The original graph massively skews axes and subtly suggests that international tuition is about 13 times domestic tuition by using circle areas instead of bars. This is a technique covered in Chapter 6 of "How to Lie with Statistics", which is a wonderful read if you're into that kind of thing. If we were to be truly honest with this graph, it could look something like this:

This is admittedly far less alarming, but also less likely to mislead people.
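The "about 13 times" figure is just the tuition ratio squared: scale a circle's height by some factor and its area grows by that factor squared. Here's a quick check using the 3.55 ratio quoted above (the function names are only for illustration):

```python
import math


def implied_area_ratio(height_ratio):
    """Area ratio a reader sees when circle heights are scaled by height_ratio."""
    return height_ratio ** 2


def honest_height_ratio(value_ratio):
    """Height ratio needed so that circle areas match the real value ratio."""
    return math.sqrt(value_ratio)


tuition_ratio = 3.55  # international / domestic, from the paragraph above

print(f"Area ratio implied by the original graph: {implied_area_ratio(tuition_ratio):.1f}")
print(f"Height ratio for honest circles: {honest_height_ratio(tuition_ratio):.2f}")
```

So if circles really had to be used, the international one should be a bit less than twice as tall as the domestic one, not three and a half times.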

I've said my bit. Now go have a fun campaign, and I'll hopefully get back to you with my model predictions next week!

Tuesday, February 25, 2014

Winter Olympics Predictions

The winter Olympics are over, which means that my productivity is back on the rise and my sense of nationalism has returned to normal levels.

One of the things I enjoy trying to do from time to time is developing predictions of sporting events, such as the NHL Playoffs. So when I heard that people were trying to predict the medal counts for the 2014 Sochi Olympics, naturally I became intrigued and tracked some of their results.

I found four different published predictions:
  • Infostrada Sports: These guys used results from "Olympics, World Championships, and World Cups (or equivalent)" since the 2010 Vancouver Olympics to develop a likely scenario for who would win in each event. Their model had different weights for the results, time since the event, and nature of the event. They only ranked the top 15 countries on their medal table, and it was last updated three days before the opening ceremonies.
  • Wall Street Journal: The prestigious journal interviewed experts and rated recent performances, and assigned probabilities to certain outcomes. They claim to have been accurate to "within a few medals" in the last two Olympics, but were actually just alright for the 2012 London games, and only good at predicting a few countries in Vancouver in 2010.
  • SportsMyriad: I think this is a blog? Either way it's a fun website if you like sports stats. No real idea where the stats came from (apart from the disclaimer "It'll change from injuries, form, whims, etc.").
  • Andreff & Andreff (2014): In a working paper from the International Association of Sports Economists (also posted to the Freakonomics blog), the authors correlated factors such as population, per-capita income, political regime, average snowfall, and number of ski resorts to try to determine the number of medals. This sort of approach has been used for summer games before (probably not with ski resorts as a major factor...), but apparently not for winter Olympics. These were the only guys to include upper and lower bounds on their predictions.
How did they all turn out? Sort of alright, I guess. Sort of.

The best prediction was by the Wall Street Journal, with a coefficient of determination of 0.77 for total medals, and 0.63 for golds (1 being perfect).

Notable exceptions were the Netherlands (getting double the expected medals - whoops) and South Korea (getting half their prediction), but otherwise things were pretty decent for the Wall Street Journal.

Next best was the SportsMyriad site, which came in only slightly behind at 0.75 for total medals, but less close at 0.58 for golds.

Andreff and Andreff were next up, with a coefficient of determination of 0.68 for total medals (their model didn't break it down into colours). They were the only group to include upper and lower bounds, which proved to be a bit silly since only 35% of countries fell within the bounds given to them. These guys were the most wrong about the Netherlands (they predicted very confidently they'd get 5-7 medals, instead they got 24).

Infostrada was the furthest off, with a coefficient of determination of 0.22 for total medal count. This is a bit of an unfair direct comparison, though, as they only listed their top 15 countries, and the addition of 10 lower-performing countries would likely have bumped that up. Even on a comparison of their top 15 across all models, though, they still came last.
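For reference, here's a minimal sketch of the two yardsticks used above. It assumes the usual 1 - SS_res/SS_tot definition of the coefficient of determination, comparing predicted medal counts directly against actual ones; the post doesn't spell out which variant was used (a squared correlation would give somewhat different numbers). The three-country medal lists are made up, while the bounds example reuses the Netherlands figures quoted above.

```python
def r_squared(predicted, actual):
    """Coefficient of determination: 1 means every prediction was spot on."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def share_within_bounds(bounds, actual):
    """Fraction of countries whose actual count fell inside (low, high)."""
    hits = sum(low <= a <= high for (low, high), a in zip(bounds, actual))
    return hits / len(actual)


# Made-up medal counts for three hypothetical countries.
predicted = [10, 20, 30]
actual = [12, 18, 30]
print(f"R^2 = {r_squared(predicted, actual):.2f}")

# Netherlands: predicted 5-7, actually 24. The second country is made up.
print(share_within_bounds([(5, 7), (10, 20)], [24, 15]))
```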

In general, the Olympics are tough to predict, for loads of reasons. Even the best in a sport don't win every event they compete in, and trying to predict the result of a single mogul run or figure skating performance is an exercise in futility. Team sports predictions are rough since the full teams only rarely play each other with the exact line-ups between Olympics, and occasionally Olympic berths are won by teams or athletes who don't even end up competing. Using socio-economic data is probably ok for getting a general picture of a country's winter abilities, but ignores the fact that sometimes people are just good at something despite their surroundings.

That being said, I admire the effort by these would-be predictors, and look forward to seeing how they do next time around!

Wednesday, January 22, 2014

LRT Station Names are Silly

When City Council added "Fort Edmonton Park" to the South Campus LRT station name, many people were baffled.

The distance between Fort Edmonton Park and South Campus is about three kilometers. This may not seem too extreme, until you consider what else is within 3 km of South Campus station. For instance:

  • Four other LRT stations (Southgate up to University Stations)
  • Snow Valley
  • Half of Hawrelak Park
  • Most of the fun parts of Whyte Ave
  • Arch-rivals Harry Ainlay AND Strathcona High School
  • The Zoo

Everything within 3 km of South Campus

This is obviously ridiculous. The name of an LRT station should be based on what you would reasonably expect is by that LRT station, not something that you could connect to three kilometers away. If we opened the rest of the LRT station names up to standards like this, we'd have this chunk of the city with potential naming rights:

Which is getting right up to 20-25% of the city. Ludicrous.

However, this gave me an idea for some possible LRT station name changes, if we're allowed to be within 3 km of any of them. How would you like to go to the:

  • Century Park/IKEA Station
  • Southgate/Derrick Golf Course Station
  • South Campus/Snow Valley Station
  • McKernan/Belgravia/Hawrelak Park Station
  • Health Sciences/Jubilee/Oliver Village Station
  • University/Valley Zoo Station
  • Grandin/Campus St Jean Station
  • Corona/NAIT Station
  • Bay/Enterprise Square/Lister Residence Station
  • Central/Bonnie Doon Station
  • Churchill/Mill Creek Station
  • Stadium/Grant MacEwan Station
  • Coliseum/Kingsway Mall Station
  • Belvedere/Concordia Station
  • Clareview/Londonderry Mall Station
Any other fun ones I might have missed? Let me know!

Thursday, January 16, 2014

Just your friendly neighborhood stats!

The city of Edmonton is very statistician-friendly. As well as having the marvelous open-data catalogue, they also have detailed data sheets on every single one of Edmonton's neighborhoods.

Instead of having a big preamble about how much I like Sim City, I figured I'd just jump right into what I ended up doing with Edmonton's stats. Each of the following maps is based on one of the stats collected, and shows some pretty cool patterns. (Note: for all maps apart from road conditions, red indicates high values and blue indicates low values.)

1. Household Income
 Income Map for Edmonton

Household income ranges from $22,000 to $146,000 average per neighborhood, and the highest income tends to be mostly focused in the southwest of the city. In the southwest it spans across both sides of the river fairly evenly, though the above-average wealth extends farther down into Riverbend and Ellerslie. On the northeast end of the city, though, you can almost trace the river by the precipitous drop in incomes from south to north. Average incomes don't pick up again until about 153rd ave.

2. Property Assessment Value

Property Value Map for Edmonton
There is definitely a correlation between household income and property value, but it isn't necessarily always present. This is particularly noticeable around downtown and just south of downtown, where lots of young professionals are making a good income but are renting. Average neighborhood property values range from $201,000 to $836,000.

3. Road and Sidewalk Conditions

Road Conditions Map for Edmonton
Admittedly, I have no idea what scale is used to measure road and sidewalk conditions. It seems to peak at around 20, and my guess is that higher numbers are good, but that's about as far as I can figure out. (Note: here good roads are blue and bad roads are red.)

The map for road conditions shows a non-surprising trend where the roads in the outside of the city (the newest roads) are in the best condition, and the roads on the inside are quite a bit worse. Every now and then, though, an inside neighborhood has particularly good road conditions, likely the result of recent work to fix a problematic area. Note: these values are from 2010 data, so if you feel your neighborhood isn't quite as shown in the picture, it's not my fault.

4. Hospitalizations

Hospitalization Rate Map for Edmonton
My best guess for the meaning behind these numbers is that they represent the number of hospitalizations per 1,000 people per year. In Edmonton, they range from 35 to 180.

Unlike the previous maps, the rate of hospitalizations in Edmonton isn't quite so territorial. Most of the city is sitting comfortably around the average, with a notable exception being immediately north-east of downtown (and one bad neighborhood in Mill Woods).

5. People Older than 20 without Grade 9 Education
Missing Grade 9 Education Map for Edmonton
This stat was actually alarming. Likely because I've spent a lot of time on university campuses recently, I've forgotten that some people don't graduate high school. In Edmonton, this value ranges from 4.3% to 41.3%.

This stat correlates very highly with property values and average household income, which makes sense in a way. It's very interesting to see it laid out like this on a map, and I'll leave you to your own conclusions about what the ties between wealth and education are.

6. Unemployment
Unemployment Map for Edmonton

2010 Edmonton unemployment was actually fairly impressive. Neighborhood data ranged from 0% to 7.46%, but with an average of 2.69%.

Certainly a lot of the rich areas have tremendously low unemployment, but the rest of the city appears to vary from the average only in certain neighborhoods downtown, with poorer neighborhoods alternating sometimes from high to low unemployment over a distance of only a couple blocks.

I purposefully didn't include statistics that weren't averages or normalized, because though comparing the number of violent crimes between neighborhoods would also have made for a cool map, the differences in size and population would have made straight comparison a bit more difficult. I also excluded rent costs because those were based on 2006 census data, and they were so low compared to today they almost made me cry.

Neighborhood Awards!
Best place to live: Donsdale
Worst place to live: McCauley
The average Edmonton experience: Keheewin
Silliest name: Gariepy (Gary-Epi? Gary-Pee? Silly Gary...)
Least original name: Anything with "West" in it.

Friday, January 10, 2014

Fun with Evolution

Darwin's theory of evolution suggests that random variation can lead to non-random change in species, where individual organisms that are marginally better suited to their environment have a better chance of surviving and passing their winning genes on to the next generation. From this extremely simple idea, repeated countless times, we get the massive diversity and complexity of life.

While it's pretty cool and poetic and stuff, what's particularly fascinating is how it's been adopted into computing science. An entire branch of problem-solving algorithms exists that attempts to recreate evolution, and when reproduced on a large scale these algorithms can actually be effective at solving incredibly complex problems.

Evolutionary algorithms are particularly well suited to problems where the connections between variables aren't well understood, and computing shortcuts can't be taken, but where programmers know in general what they're looking for. They are also prone to several issues: they're slow, and they tend to latch onto solutions that are pretty good, but not the best (also known as local maximums).

In the simplest terms, computer scientists will create a "gene pool" full of randomly-determined organisms, with the random genes corresponding to variables to be optimized. If the problem is well-defined, each organism can be evaluated and ranked based on how good of a solution they are, and then (similar to real life), the fittest will get together and have baby organisms, with genes from both parents. The process can be repeated as long as desired until a suitable solution is found.

Like real evolution, the non-random selection of the fittest, with random variation in their genes and starting condition, is expected to eventually produce organisms that are well-suited to their environment, but in this case being well-suited means they are an optimized solution to a problem.

In the real world, designers have used evolutionary algorithms to optimize extremely complex designs like car engines or wind turbine blades, but for fun I decided to do a fairly simple test on Excel as a proof of concept to myself. Instead of doing anything useful or fancy, I decided to try to get Excel to draw for me. Specifically, I wanted it to say "HI" to me (hey, I yell at it enough, figured it deserved a chance to yell back).

In order to do this, I politely asked Excel to create 100 random organisms, with each 'organism' defined as a 24-'gene' series of random values. This resulted in 100 random assortments of 6 rectangles. These 'organisms' could look something like this:

And I wanted to give them the goal of overlapping to trace this:

In order to do that, each of my original 100 'organisms' was given a fitness value based on whether it covered the target areas, but lost points based on how much it covered non-target areas. Then they all got a chance to breed like rabbits, but the fittest ones (with the highest scores) had a better chance of breeding than the others.

The actual breeding process is indistinguishable from zookeepers breeding uncooperative endangered animals - I stuck two randomly (but not equitably) chosen organisms in a room and made them watch videos of other bits of data 'doing it' until something popped out. In Excel terms, though, each of the 24 genes was examined in turn, and each child gene had an even chance of coming from either parent. In order to freshen up the gene pool, each gene also had a 5% chance of mutating.

Breeding continued until I got 100 new little babies, at which point I killed off the old stock and started over. In each generation, the fittest have a disproportionate chance of passing on their genes to the next generation, with the idea being that hopefully each generation is stronger than the previous until the goal is met.
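For anyone who'd rather not do this in Excel, the loop described above can be sketched in a few dozen lines of Python. This is not the spreadsheet's actual macro: the grid size, the blocky "HI" target, the two-corners gene encoding, the +1/-1 scoring, the size of a mutation nudge, and the way scores are shifted into breeding weights are all assumptions. The population of 100, the 24 genes, the 50/50 gene inheritance, and the 5% mutation chance come from the description.

```python
import random

GRID_W, GRID_H = 9, 7   # assumed canvas size (not given in the post)
N_GENES = 24            # 6 rectangles x 4 genes, as in the post
POP_SIZE = 100          # organisms per generation, as in the post
MUTATION_RATE = 0.05    # 5% chance per gene, as in the post
MUTATION_STEP = 0.1     # assumed size of a mutation nudge

# Assumed target picture: '#' cells spell a blocky "HI".
TARGET_ROWS = [
    "#...#...#",
    "#...#...#",
    "#...#...#",
    "#####...#",
    "#...#...#",
    "#...#...#",
    "#...#...#",
]
TARGET = {(x, y) for y, row in enumerate(TARGET_ROWS)
          for x, ch in enumerate(row) if ch == "#"}


def covered_cells(genes):
    """Decode genes into rectangles (two corners each); return covered cells."""
    cells = set()
    for i in range(0, N_GENES, 4):
        xa, ya, xb, yb = genes[i:i + 4]
        x_lo, x_hi = sorted(min(int(v * GRID_W), GRID_W - 1) for v in (xa, xb))
        y_lo, y_hi = sorted(min(int(v * GRID_H), GRID_H - 1) for v in (ya, yb))
        cells.update((x, y) for x in range(x_lo, x_hi + 1)
                     for y in range(y_lo, y_hi + 1))
    return cells


def fitness(genes):
    """+1 per target cell covered, -1 per non-target cell covered (assumed)."""
    cells = covered_cells(genes)
    return len(cells & TARGET) - len(cells - TARGET)


def breed(mum, dad, rng):
    """Each child gene comes from either parent, then may mutate slightly."""
    child = []
    for g_mum, g_dad in zip(mum, dad):
        gene = g_mum if rng.random() < 0.5 else g_dad
        if rng.random() < MUTATION_RATE:
            nudge = rng.uniform(-MUTATION_STEP, MUTATION_STEP)
            gene = min(1.0, max(0.0, gene + nudge))
        child.append(gene)
    return child


def next_generation(population, rng):
    """Replace the whole population; fitter organisms breed more often."""
    scores = [fitness(organism) for organism in population]
    lowest = min(scores)
    weights = [s - lowest + 1 for s in scores]  # shift so weights are positive
    children = []
    for _ in range(POP_SIZE):
        mum, dad = rng.choices(population, weights=weights, k=2)
        children.append(breed(mum, dad, rng))
    return children


def evolve(generations, seed=0):
    """Run the loop from a random start and return the fittest survivor."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(N_GENES)]
                  for _ in range(POP_SIZE)]
    for _ in range(generations):
        population = next_generation(population, rng)
    return max(population, key=fitness)


best = evolve(generations=50)
print("best score:", fitness(best), "out of a possible", len(TARGET))
```

Because the whole population is replaced each generation, the best organism can occasionally be lost; keeping a few top scorers unchanged (elitism) is a common tweak, though nothing in the description suggests the spreadsheet did that.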

These are the results of the first test, after 1,000 generations:

To use technical terminology, this is extremely unimpressive. Two of the rectangles form the I on the right, the lower right leg of the H is there, but a bunch of dead space in the H is taken up on the top, and one rectangle has wandered off completely (there were supposed to be 6...). Don't even get me started on the green rectangle, I think he's shy.

What was encouraging about this, though, was that the average scores for the population did tend to grow each generation, though they ended up plateau-ing after about 500 generations:

This is what I meant before by 'local maximum' solutions. In order for the score to improve, the purple rectangle would have to change by an amount that's more than what I've allowed mutations alone to cover. Also, during the required changes, it's likely that the scores would decrease (as the purple rectangle abandons those upper H legs), which is resisted from an evolutionary point of view. This actually parallels evolution quite well in that once animals are adapted to a situation they stop changing rapidly (even though they may not be perfectly optimized), until external pressures force them to need to adapt again.

The best way to improve solutions from evolutionary algorithms is to increase the sample size (get more genes in there!) and number of generations, but since those both involved forcing my computer to make funny noises, I decided instead to try a brand new set of 100 random organisms. After 1,000 generations, I got:

That's... sort of better, actually. At least all six rectangles made it onto the screen, and all three vertical lines are definitely there. Again, though, in order to get this one to perfectly spell the word, the little purple rectangle would have required a tremendously lucky mutation, which was discouraging enough that instead I decided to try one last new group of 100. Here's their final result:

Actually, that's not bad. I have no idea what the little purple guy (why's it always purple?) is doing, but he's pretty much out of the way, and all the major parts of the word "HI" are definitely covered. Not bad for random numbers in Excel, eh?

In case you really wanted to play around with this spreadsheet, I've put a version of it here. It's already been seeded with random numbers, but if you open it as a macro-enabled spreadsheet, every time you hit "Ctrl+q", a new generation will form (Ctrl+w for 10 new generations, Ctrl+e for 100, but that'll take a bit of time...). Enjoy!

Wednesday, December 18, 2013

Why I Love the Henday

When I was choosing where to get my apartment, my major concerns were cost, decent neighborhood, and access to the LRT for work. That was pretty much it, and as a result I ended up in (what I consider to be) a pretty great location down by the Century Park LRT station.

I soon realized that, while LRT access was great for getting downtown and to sports games, being as far south as I was ended up being fairly inconvenient for getting around to the rest of town, and I was using the Henday ring road a lot more than I had been at my old home, even just to get to other places within Edmonton.

So I decided to take a look at just how efficient the road system is in Edmonton, and how much the Henday played a role in my life. First of all, here's a map showing travel times for someone who lives downtown:

Living smack-dab in the middle of the city definitely has its advantages in terms of minimizing driving times (note: this is assuming no traffic, which is fairly unreasonable for a lot of the time downtown...). Pretty much anywhere between the Whitemud and the Yellowhead is accessible within 15 minutes by car, and 54% of the city's area is accessible within 20 minutes. Sherwood Park freeway and the Whitemud really open the city out to the east, too.

On the other hand, here is what a similar map looks like for me:

Though it's still a comparable net transit time (53% of the city area is still accessible within 20 minutes), the covered area is very different. This is hardly surprising, of course - sticking someone out at the end of a city ought to increase travel times. What's really cool, though, is that it takes less time to get to the exact opposite side of town than it does to get downtown, even though it's twice as far away. You can even see the effect of the Henday around St. Albert, where a thin band of green colouring hugs the highway.

The real benefit of the Henday is revealed when I plot the same map, but instead avoiding the use of the Henday if at all possible:

Yikes. Pretty much the only easily-traveled areas of the city are anything south of or connected to the Whitemud. Now only 39% of the city can be accessed within 20 minutes, with some areas taking up to 45, and the Cameron Heights neighborhood is pretty much completely lost to me, even though it's fairly close (as the Henday was the closest bridge to it).
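The with-and-without comparison boils down to shortest travel times on two versions of the same road network. Here's a toy sketch of that idea using Dijkstra's algorithm. The four-node network, the drive times, and the ring-road flags are entirely made up, and this isn't how the maps above were produced; it only illustrates how dropping a ring road changes the fastest route.

```python
import heapq

# (from, to, minutes, is_ring_road) - a made-up toy network.
ROADS = [
    ("home", "downtown", 30, False),
    ("downtown", "north", 20, False),
    ("downtown", "west", 20, False),
    ("home", "west", 12, True),
    ("west", "north", 13, True),
]


def build_graph(roads, use_ring):
    """Two-way adjacency map, optionally leaving out ring-road segments."""
    graph = {}
    for a, b, minutes, is_ring in roads:
        if is_ring and not use_ring:
            continue
        graph.setdefault(a, {})[b] = minutes
        graph.setdefault(b, {})[a] = minutes
    return graph


def travel_times(graph, start):
    """Dijkstra's algorithm: fastest time from start to every reachable node."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbour, minutes in graph.get(node, {}).items():
            candidate = elapsed + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


with_ring = travel_times(build_graph(ROADS, use_ring=True), "home")
without_ring = travel_times(build_graph(ROADS, use_ring=False), "home")
for place in ("downtown", "west", "north"):
    saved = 1 - with_ring[place] / without_ring[place]
    print(f"{place}: {with_ring[place]} min vs {without_ring[place]} min "
          f"({saved:.0%} saved)")
```

In this toy version the far side of town ends up quicker to reach than downtown once the ring road is in play, which is the same flavour of effect as on the real maps.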

Taking the Henday can reduce travel times for me by up to 35%. That's why I love the Henday.