Wednesday, January 22, 2014

LRT Station Names are Silly

When City Council added "Fort Edmonton Park" to the South Campus LRT station name, many people were baffled.

The distance between Fort Edmonton Park and South Campus is about three kilometers. This may not seem too extreme, until you consider what else is within 3 km of South Campus station. For instance:

  • Four other LRT stations (Southgate up to University Stations)
  • Snow Valley
  • Half of Hawrelak Park
  • Most of the fun parts of Whyte Ave
  • Arch-rivals Harry Ainlay AND Strathcona High School
  • The Zoo

Everything within 3 km of South Campus

This is obviously ridiculous. The name of an LRT station should be based on what you would reasonably expect is by that LRT station, not something that you could connect to three kilometers away. If we opened the rest of the LRT station names up to standards like this, we'd have this chunk of the city with potential naming rights:

Which is getting right up to 20-25% of the city. Ludicrous.

However, this gave me an idea for some possible LRT station name changes, if we're allowed to be within 3 km of any of them. How would you like to go to the:

  • Century Park/IKEA Station
  • Southgate/Derrick Golf Course Station
  • South Campus/Snow Valley Station
  • McKernan/Belgravia/Hawrelak Park Station
  • Health Sciences/Jubilee/Oliver Village Station
  • University/Valley Zoo Station
  • Grandin/Campus St Jean Station
  • Corona/NAIT Station
  • Bay/Enterprise Square/Lister Residence Station
  • Central/Bonnie Doon Station
  • Churchill/Mill Creek Station
  • Stadium/Grant MacEwan Station
  • Coliseum/Kingsway Mall Station
  • Belvedere/Concordia Station
  • Clareview/Londonderry Mall Station

Any other fun ones I might have missed? Let me know!

Thursday, January 16, 2014

Just your friendly neighborhood stats!

The city of Edmonton is very statistician-friendly. As well as having the marvelous open-data catalogue, it also has detailed data sheets on every single one of Edmonton's neighborhoods.

Instead of having a big preamble about how much I like Sim City, I figured I'd just jump right into what I ended up doing with Edmonton's stats. Each of the following maps is based on one of the stats collected, and together they show some pretty cool patterns. (Note: for all maps apart from road conditions, red indicates high values and blue indicates low values.)

1. Household Income
 Income Map for Edmonton

Average household income ranges from $22,000 to $146,000 per neighborhood, and the highest incomes tend to be concentrated in the southwest of the city. In the southwest, the wealth spans both sides of the river fairly evenly, though the above-average incomes extend farther down into Riverbend and Ellerslie. On the northeast end of the city, though, you can almost trace the river by the precipitous drop in incomes from south to north. Average incomes don't pick up again until about 153rd Ave.

2. Property Assessment Value

Property Value Map for Edmonton
There is definitely a correlation between household income and property value, but it isn't necessarily always present. This is particularly noticeable around downtown and just south of downtown, where lots of young professionals are making a good income but are renting. Average neighborhood property values range from $201,000 to $836,000.

3. Road and Sidewalk Conditions

Road Conditions Map for Edmonton
Admittedly, I have no idea what scale is used to measure road and sidewalk conditions. It seems to peak at around 20, and my guess is that higher numbers are good, but that's about as far as I can figure out. (Note: here good roads are blue and bad roads are red.)

The map for road conditions shows an unsurprising trend: the roads on the outskirts of the city (the newest roads) are in the best condition, and the roads closer to the centre are quite a bit worse. Every now and then, though, an inner neighborhood has particularly good road conditions, likely the result of recent work to fix a problematic area. Note: these values are from 2010 data, so if you feel your neighborhood isn't quite as shown in the picture, it's not my fault.

4. Hospitalizations

Hospitalization Rate Map for Edmonton
My best guess for the meaning behind these numbers is that they represent the number of hospitalizations per 1,000 people per year. In Edmonton, they range from 35 to 180.

Unlike the previous maps, the rate of hospitalizations in Edmonton isn't quite so territorial. Most of the city is sitting comfortably around the average, with a notable exception being immediately north-east of downtown (and one bad neighborhood in Mill Woods).

5. People Older than 20 without Grade 9 Education
Missing Grade 9 Education Map for Edmonton
This stat was actually alarming. Likely because I've spent a lot of time on university campuses recently, I've forgotten that some people don't graduate high school. In Edmonton, this value ranges from 4.3% to 41.3%.

This stat correlates very highly with property values and average household income, which makes sense in a way. It's very interesting to see it laid out like this on a map, and I'll leave you to draw your own conclusions about what the ties between wealth and education are.

6. Unemployment
Unemployment Map for Edmonton

2010 Edmonton unemployment was actually fairly impressive. Neighborhood data ranged from 0% to 7.46%, with an average of 2.69%.

Certainly a lot of the rich areas have tremendously low unemployment, but the rest of the city appears to vary from the average only in certain neighborhoods downtown, with poorer neighborhoods alternating sometimes from high to low unemployment over a distance of only a couple blocks.

I purposefully didn't include statistics that weren't averages or normalized, because though comparing the number of violent crimes between neighborhoods would also have made for a cool map, the differences in size and population would have made straight comparison a bit more difficult. I also excluded rent costs because those were based on 2006 census data, and they were so low compared to today they almost made me cry.
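To show why raw counts mislead, here's a tiny Python sketch of the kind of normalization I mean. The two neighborhoods and all of their numbers are made up purely for illustration; none of this comes from the city's data sheets.

```python
def rate_per_1000(count, population):
    """Convert a raw count into a rate per 1,000 residents."""
    return 1000 * count / population

# Made-up neighborhoods (hypothetical names and numbers, not real Edmonton data).
neighborhoods = {
    "Big": {"incidents": 120, "population": 12000},
    "Small": {"incidents": 40, "population": 2000},
}

# Normalize each raw count by the neighborhood's population.
rates = {
    name: rate_per_1000(data["incidents"], data["population"])
    for name, data in neighborhoods.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 1,000 residents")
```

The "Big" neighborhood has three times the raw count, but once you divide by population the "Small" one has twice the rate, which is exactly the sort of thing a map of raw counts would hide.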

Neighborhood Awards!
Best place to live: Donsdale
Worst place to live: McCauley
The average Edmonton experience: Keheewin
Silliest name: Gariepy (Gary-Epi? Gary-Pee? Silly Gary...)
Least original name: Anything with "West" in it.

Friday, January 10, 2014

Fun with Evolution

Darwin's theory of evolution suggests that random variation can lead to non-random change in species, where individual organisms that are marginally better suited to their environment have a better chance of surviving and passing their winning genes on to the next generation. From this extremely simple idea, repeated countless times, we get the massive diversity and complexity of life.

While it's pretty cool and poetic and stuff, what's particularly fascinating is how it's been adopted into computing science. An entire branch of problem-solving algorithms exists that attempts to recreate evolution, and when reproduced on a large scale these algorithms can actually be effective at solving incredibly complex problems.

Evolutionary algorithms are particularly well suited to problems where the connections between variables aren't well understood, and computing shortcuts can't be taken, but where programmers know in general what they're looking for. They are also prone to several issues: they're slow, and they tend to latch onto solutions that are pretty good, but not the best (also known as local maxima).

In the simplest terms, computer scientists will create a "gene pool" full of randomly-determined organisms, with the random genes corresponding to variables to be optimized. If the problem is well-defined, each organism can be evaluated and ranked based on how good a solution it is, and then (similar to real life), the fittest will get together and have baby organisms, with genes from both parents. The process can be repeated as long as desired until a suitable solution is found.

Like real evolution, the non-random selection of the fittest, with random variation in their genes and starting condition, is expected to eventually produce organisms that are well-suited to their environment, but in this case being well-suited means they are an optimized solution to a problem.

In the real world, designers have used evolutionary algorithms to optimize extremely complex designs like car engines or wind turbine blades, but for fun I decided to do a fairly simple test on Excel as a proof of concept to myself. Instead of doing anything useful or fancy, I decided to try to get Excel to draw for me. Specifically, I wanted it to say "HI" to me (hey, I yell at it enough, figured it deserved a chance to yell back).

In order to do this, I politely asked Excel to create 100 random organisms, with each 'organism' defined as a 24-'gene' series of random values. At four genes per rectangle, this resulted in 100 random assortments of 6 rectangles. These 'organisms' could look something like this:

And I wanted to give them the goal of overlapping to trace this:

In order to do that, each of my original 100 'organisms' was given a fitness value based on how much of the target area it covered, and lost points based on how much non-target area it covered. Then they all got a chance to breed like rabbits, but the fittest ones (with the highest scores) had a better chance of breeding than the others.
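For anyone who thinks better in code than in spreadsheets, here's a rough Python sketch of that scoring and parent-picking step. It is not a translation of the spreadsheet: the 10x10 grid, the shape of the "HI" target, the (row, column, height, width) gene layout, the one-point-per-cell scoring, and the way scores become breeding odds are all assumptions made for illustration.

```python
import random

GRID = 10  # assumption: a 10x10 canvas of cells


def make_target():
    """Return the set of (row, col) cells a perfect drawing covers: a crude 'HI'."""
    cells = set()
    for row in range(1, 9):
        cells.add((row, 1))  # left leg of the H
        cells.add((row, 4))  # right leg of the H
        cells.add((row, 7))  # the I
    for col in range(1, 5):
        cells.add((4, col))  # crossbar of the H
    return cells


def covered_cells(genes):
    """Decode genes four at a time (row, col, height, width) into covered cells."""
    cells = set()
    for i in range(0, len(genes), 4):
        row, col, height, width = genes[i:i + 4]
        # Rectangles are clipped at the edge of the canvas.
        for r in range(row, min(row + height, GRID)):
            for c in range(col, min(col + width, GRID)):
                cells.add((r, c))
    return cells


def fitness(genes, target):
    """One point per target cell covered, minus one per non-target cell covered."""
    covered = covered_cells(genes)
    return len(covered & target) - len(covered - target)


def pick_parent(population, scores, rng=random):
    """Fitness-weighted pick; scores are shifted so even the worst has a small chance."""
    lowest = min(scores)
    weights = [score - lowest + 1 for score in scores]
    return rng.choices(population, weights=weights, k=1)[0]
```

With this layout, an organism that traces the three vertical strokes and the crossbar exactly scores 26 (one point per target cell), while a single rectangle blanketing the whole canvas scores 26 - 74 = -48, so covering everything is punished rather than rewarded.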

The actual breeding process is indistinguishable from zookeepers breeding uncooperative endangered animals - I stuck two randomly (but not equitably) chosen organisms in a room and made them watch videos of other bits of data 'doing it' until something popped out. In Excel terms, though, each of the 24 genes was examined in turn, and each child gene had an even chance of coming from either parent. In order to freshen up the gene pool, each gene also had a 5% chance of mutating.
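In code, that breeding step might look something like the Python sketch below. The coin flip per gene and the 5% mutation chance come from the description above; the gene range (integers from 0 to 9) and the idea that a mutation nudges a gene by at most two steps are assumptions, since the exact mutation rule isn't spelled out here.

```python
import random

GENE_MAX = 9   # assumption: genes are integers from 0 to GENE_MAX
MAX_STEP = 2   # assumption: a mutation nudges a gene by at most this much


def breed(mother, father, mutation_rate=0.05, rng=random):
    """Build one child: each gene comes from either parent with equal odds,
    then has a small chance of being nudged by a bounded random amount."""
    child = []
    for gene_m, gene_f in zip(mother, father):
        # Uniform crossover: a fair coin decides which parent supplies this gene.
        gene = gene_m if rng.random() < 0.5 else gene_f
        # Mutation: occasionally shift the gene, keeping it inside the legal range.
        if rng.random() < mutation_rate:
            gene += rng.randint(-MAX_STEP, MAX_STEP)
            gene = max(0, min(GENE_MAX, gene))
        child.append(gene)
    return child
```

Mixing genes one at a time like this (usually called uniform crossover) means a child can inherit, say, one rectangle's position from one parent and its size from the other.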

Breeding continued until I got 100 new little babies, at which point I killed off the old stock and started over. In each generation, the fittest have a disproportionate chance of passing on their genes to the next generation, with the idea being that hopefully each generation is stronger than the previous until the goal is met.
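Put together, the whole generation-by-generation loop can be sketched in Python roughly as follows. This is a generic sketch rather than the spreadsheet's macro: `evolve` and its parameter names are made up, the gene range and bounded mutation step are assumptions, and the fitness function is passed in so the loop can be tried on any scoring rule.

```python
import random


def evolve(fitness, gene_count=24, pop_size=100, generations=1000,
           gene_max=9, mutation_rate=0.05, max_step=2, seed=None):
    """Run a simple generational evolutionary algorithm.

    Returns the best organism of the final population and the average
    score recorded at the start of each generation.
    """
    rng = random.Random(seed)
    # Start from a completely random gene pool.
    population = [[rng.randint(0, gene_max) for _ in range(gene_count)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(organism) for organism in population]
        history.append(sum(scores) / pop_size)
        # Shift scores so every weight is positive; fitter means likelier to breed.
        lowest = min(scores)
        weights = [score - lowest + 1 for score in scores]
        children = []
        while len(children) < pop_size:
            # Assumption: the same organism may be drawn as both parents.
            mother, father = rng.choices(population, weights=weights, k=2)
            child = []
            for gene_m, gene_f in zip(mother, father):
                gene = gene_m if rng.random() < 0.5 else gene_f
                if rng.random() < mutation_rate:
                    gene += rng.randint(-max_step, max_step)
                    gene = max(0, min(gene_max, gene))
                child.append(gene)
            children.append(child)
        # The old stock is discarded wholesale; only the children carry on.
        population = children
    best = max(population, key=fitness)
    return best, history


if __name__ == "__main__":
    # Toy goal (not the "HI" drawing): make the genes add up to as much as possible.
    best, history = evolve(sum, generations=50, seed=1)
    print(f"average score: first generation {history[0]:.1f}, last {history[-1]:.1f}")
```

The `history` list is the sort of thing you would plot to watch the average score climb and then flatten out.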

These are the results of the first test, after 1,000 generations:

To use technical terminology, this is extremely unimpressive. Two of the rectangles form the I on the right, the lower right leg of the H is there, but a bunch of dead space in the H is taken up on the top, and one rectangle has wandered off completely (there were supposed to be 6...). Don't even get me started on the green rectangle, I think he's shy.

What was encouraging about this, though, was that the average scores for the population did tend to grow each generation, though they ended up plateau-ing after about 500 generations:

This is what I meant before by 'local maximum' solutions. In order for the score to improve, the purple rectangle would have to change by an amount that's more than what I've allowed mutations alone to cover. Also, during the required changes, it's likely that the scores would decrease (as the purple rectangle abandons those upper H legs), which is resisted from an evolutionary point of view. This actually parallels evolution quite well in that once animals are adapted to a situation they stop changing rapidly (even though they may not be perfectly optimized), until external pressures force them to need to adapt again.
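The same trap shows up in a much simpler setting. The Python sketch below is a bare-bones hill climber on a made-up one-dimensional landscape, not an evolutionary algorithm and not the spreadsheet: the `score` function, the two peaks, and the step sizes are all invented for illustration. It only ever keeps a change that improves the score, and its changes are limited in size, which are the two ingredients described above.

```python
import random


def score(x):
    """Toy landscape: a modest peak at x=20, a taller one at x=80, flat zero between."""
    return max(5 - 0.5 * abs(x - 20), 10 - 0.5 * abs(x - 80), 0)


def climb(start, max_step, tries, seed=0):
    """Propose bounded random nudges, keeping one only if the score improves."""
    rng = random.Random(seed)
    x = start
    for _ in range(tries):
        candidate = x + rng.randint(-max_step, max_step)
        if score(candidate) > score(x):
            x = candidate
    return x


if __name__ == "__main__":
    small = climb(start=20, max_step=3, tries=1000)
    large = climb(start=20, max_step=60, tries=10000)
    print("small steps end at", small, "with score", score(small))
    print("large steps end at", large, "with score", score(large))
```

Starting on the smaller peak with nudges of at most 3, every possible move lowers the score, so the climber never leaves x=20. Only when a single nudge is big enough to leap clear across the valley does it have a chance of finding the taller peak.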

The best way to improve solutions from evolutionary algorithms is to increase the sample size (get more genes in there!) and number of generations, but since those both involved forcing my computer to make funny noises, I decided instead to try a brand new set of 100 random organisms. After 1,000 generations, I got:

That's... sort of better, actually. At least all six rectangles made it onto the screen, and all three vertical lines are definitely there. Again, though, in order to get this one to perfectly spell the word, the little purple rectangle would have required a tremendously lucky mutation, which was discouraging enough that instead I decided to try one last new group of 100. Here's their final result:

Actually, that's not bad. I have no idea what the little purple guy (why's it always purple?) is doing, but he's pretty much out of the way, and all the major parts of the word "HI" are definitely covered. Not bad for random numbers in Excel, eh?

In case you really wanted to play around with this spreadsheet, I've put a version of it here. It's already been seeded with random numbers, but if you open it as a macro-enabled spreadsheet, every time you hit "Ctrl+q", a new generation will form (Ctrl+w for 10 new generations, Ctrl+e for 100, but that'll take a bit of time...). Enjoy!