Tuesday, April 21, 2015

Election Tours of Alberta

As of today, we are officially halfway through the 2015 Alberta provincial election! Has it been as much fun for you as it has been for me yet?

At the same time as we pick our Alberta government, voters in the United Kingdom are going through a general election of their own. That one has gotten a fair bit more attention than ours, including a fun post from FiveThirtyEight that works out the best campaign route for the leaders of the various parties.

It was such a fun post, in fact, that I figured I'd try to do something similar for Alberta!

As the second half of the provincial election unfolds in Alberta, the various party leaders are going to be scrambling to get to as many events as they can in what they consider to be key constituencies across the province. But what's the best route they can take through the most key constituencies, in order to minimize driving time and get the most bang for their buck?

This is a version of the well-known travelling salesman problem, which I tackled once before when designing an Edmonton pub crawl. That time, I only looked at a maximum of 10 locations, which allows for a time-consuming but 100% accurate solution: with 10 locations, there are 3,628,800 possible orderings to examine.

I've decided to up the ante this time, and look at 20 constituencies for each of the four parties that elected MLAs in the last election. With 20 locations, there are 2,432,902,008,176,640,000 possible orderings that would need to be checked to guarantee the absolute shortest route between them all. That's not a task I'm willing to participate in...
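Both of those counts are just factorials: n stops can be visited in n! different orders. Here's a quick check in Python (my choice of language for illustration, not something from the original analysis). The `route_counts` helper is hypothetical.

```python
import math


def route_counts(n):
    """Return (orderings, distinct open routes) for n stops.

    n! counts every possible visiting order. An open route driven in
    reverse has the same total distance, so only n! / 2 of them are
    genuinely different (for n >= 2).
    """
    orderings = math.factorial(n)
    return orderings, orderings // 2


for n in (10, 20):
    orderings, distinct = route_counts(n)
    print(f"{n} stops: {orderings:,} orderings, {distinct:,} distinct routes")
```

Ignoring direction halves the work, but at 20 stops that still leaves more than 1.2 quintillion routes.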

Instead, I've fiddled around with a fun trick called simulated annealing. Essentially, you start with a random travel plan, and on each iteration you compare it to a proposed plan that's slightly different. If the new one is better, you swap it in for the old one; if the new one is worse, you still swap with some probability. That probability depends on the annealing 'temperature', which decreases as you iterate the procedure.

The advantage of simulated annealing is that by occasionally allowing worse solutions, you give the system the ability to work itself out of locally optimized solutions it may have found, in order to hopefully end up finding the actual best solution.
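Here's one way to turn that description into code. It's a minimal sketch, not the code behind the maps below, and it rests on several assumptions. It takes a precomputed matrix `dist` of driving times between stops (hypothetical; in practice this would come from a mapping service) and treats the tour as an open route with no return to the start. The "slightly different" plan is made by reversing a random stretch of the route. The acceptance rule exp(-delta / T) is the standard Metropolis rule, and the geometric cooling schedule uses made-up default parameters.

```python
import math
import random


def route_length(route, dist):
    """Total driving time of an open route (the tour ends at the last stop)."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def anneal(dist, start_temp=10.0, cooling=0.999, steps=20000, seed=0):
    """Search for a short route through every stop with simulated annealing.

    dist is a square matrix of travel times between stops. Returns the best
    route seen (a list of stop indices) and its length.
    """
    rng = random.Random(seed)
    n = len(dist)
    current = list(range(n))
    rng.shuffle(current)  # start from a random travel plan
    current_len = route_length(current, dist)
    best, best_len = current[:], current_len
    if n < 2:
        return best, best_len  # nothing to rearrange
    temp = start_temp
    for _ in range(steps):
        # Proposed plan: reverse a random stretch of the current route.
        i, j = sorted(rng.sample(range(n), 2))
        candidate = current[:i] + current[i:j + 1][::-1] + current[j + 1:]
        cand_len = route_length(candidate, dist)
        delta = cand_len - current_len
        # Always take a better (or equal) plan; take a worse one with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_len = candidate, cand_len
            if current_len < best_len:
                best, best_len = current[:], current_len
        temp = max(temp * cooling, 1e-9)  # cool down, but never reach zero
    return best, best_len
```

For a quick experiment, `dist = [[math.dist(p, q) for q in pts] for p in pts]` builds a matrix from a list of (x, y) points. Because the result depends on the seed and the cooling schedule, it's worth running a few seeds and keeping the shortest route. With 20 stops, there's no cheap way to confirm it's the true optimum.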

But enough math - back to politics. For each of the four parties I looked at, I tried to find the shortest route through the 20 constituencies that each party came closest to winning or losing during the 2012 election campaign. This way, each leader would hopefully visit a mix of constituencies the party barely won and barely lost, where an appearance by the leader over the next two weeks might conceivably make the most difference.
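Picking those 20 constituencies comes down to sorting ridings by how close each race was. The sketch below assumes a hypothetical `results` layout (riding mapped to a dict of party vote shares). It also defines a party's margin as its share minus the best other party's share, which is positive for a win and negative for a loss. The post doesn't spell out its exact margin definition.

```python
def closest_races(results, party, n=20):
    """Return the n ridings where `party` won or lost by the smallest margin.

    results: dict mapping riding name -> dict of party -> vote share (%).
    """
    margins = {}
    for riding, shares in results.items():
        if party not in shares:
            continue  # the party didn't run here
        best_other = max((v for p, v in shares.items() if p != party), default=0.0)
        margins[riding] = shares[party] - best_other
    # Sort by how close the race was, regardless of who won it.
    return sorted(margins, key=lambda riding: abs(margins[riding]))[:n]
```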

Let's start with the NDP:

The closest races for the NDP were, maybe not surprisingly, mostly in Edmonton. Rachel Notley's trip would start up in Edmonton-Manning, run through 13 of Edmonton's closest races, and continue on south to Lethbridge with the occasional stop in Red Deer and Calgary. A pretty easy urban whirlwind tour for her, really.

Total time: 7 hours, 57 minutes.

Next, the Liberals:

David Swann's journey isn't all that different from Rachel Notley's, though it's split more evenly between Calgary and Edmonton, with less in between. Unlike the NDP, the Liberals didn't win any races by a comfortable enough margin to go without some desperately needed help. The quick trip out to Canmore to deal with Banff-Cochrane ought to make for some great sightseeing!

Total time: 9 hours, 59 minutes.

The Wildrose:

Brian Jean is in for quite a different ride. Starting up in Dunvegan-Central Peace-Notley, he only barely glances at Edmonton on his way down to the juicy urban Calgary core the Wildrose stands to gain. Then it's off to Medicine Hat before winding his way back north to Fort McMurray.

Total time: 1 day, 1 hour, 32 minutes.

Lastly, the PCs:

The PC map looks quite similar to the Wildrose, largely because the closest contests in rural Alberta were directly between the two parties. Major differences include the four stops in north Edmonton, and the changed focus on south Calgary.

Total time: 1 day, 1 hour, 57 minutes.

So there you go! If you happen to see the campaign buses on the highway and they're not going the right way, make sure to let them know. Best of luck to all in the last two weeks of the election!

Monday, April 20, 2015

Edmonton's NHL Draft Lottery Luck

This weekend, the hockey world lit up with the news that, for the fourth time in six years, Edmonton got the first overall pick in the NHL draft lottery. This year was extra special, as the projected first overall pick, Connor McDavid, is supposed to be the chosen one who will lead us from our years of darkness (...or something).

The question I was faced with is just how unlikely it is that Edmonton came first four times in the last six years. After all, it is a lottery. Randomly determining which team gets the first overall pick each year is a good thing: it hopefully reduces the temptation for a team to tank on purpose to finish last in the league, and it keeps games interesting for fans.

Over the last six years, a few different odds distributions were used for the 14 lottery teams that didn't make the playoffs. Until 2012, only the five worst teams had a chance of getting the first overall pick (the absolute worst team had a 48.2% chance), but since 2013 all 14 non-playoff teams have had at least some chance.
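That 48.2% comes from combining two ways the worst team could end up picking first. It could win the lottery outright, or the lottery could be won by a team too far down the order to jump all the way to first. The sketch below reproduces the figure. It assumes the weighted odds in use before 2013 (25.0% for the worst team down to 0.5% for the fourteenth), which aren't listed in the post. The four-spot cap on moving up comes from the explanation further down. `first_overall_chances` is a hypothetical helper.

```python
# Assumed pre-2013 lottery odds (percent) for the 14 non-playoff teams, worst team first.
PRE_2013_ODDS = [25.0, 18.8, 14.2, 10.7, 8.1, 6.2, 4.7, 3.6, 2.7, 2.1, 1.5, 1.1, 0.8, 0.5]


def first_overall_chances(odds, max_jump=4):
    """Chance (percent) that each team ends up with the first overall pick.

    The lottery winner moves up at most max_jump places. If that isn't enough
    to reach first, the worst team keeps the first pick.
    """
    chances = [0.0] * len(odds)
    for position, p in enumerate(odds):
        if position <= max_jump:
            chances[position] += p  # winner can jump all the way to first
        else:
            chances[0] += p  # winner is capped, so the worst team picks first
    return chances
```

Under these odds, the worst team's 25.0% lottery chance plus the 23.2% combined chance of teams six through fourteen adds up to the 48.2% above, and teams six through fourteen can never pick first.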

Edmonton and Carolina were the only two teams to miss the playoffs in all six of those years, so it stands to reason that they had the best shots at getting the first overall pick at least once or twice in that period, right? This is what happens if you actually crunch the numbers, though:

It turns out Edmonton's chance of getting four first overall picks over the last six years was actually around 1.9%. That's certainly low, but far from impossible.

There are two reasons this may be higher than you'd think. First of all, I was looking at the chances of Edmonton winning any four of the last six draft lotteries, not specifically winning the first three, losing two, and then winning the sixth. The odds of that exact sequence are much lower, but focusing on them is deceptive, since nobody is up in arms about the lotteries Edmonton didn't win. Secondly, Edmonton's chances were so much higher than Carolina's because in the first three years Carolina was in the draft lottery, they weren't in a position to win the first overall pick (before 2013, a team winning the lottery could only move up a maximum of 4 positions).
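Because each year's lottery came with different odds, the number of first overall picks over six years follows a Poisson binomial distribution rather than a plain binomial one. Here's a sketch of that calculation, treating the yearly lotteries as independent. `yearly_probs` is a hypothetical list of one team's first overall chances for each year (the post doesn't list Edmonton's), and the function names are illustrative.

```python
def wins_distribution(yearly_probs):
    """dist[k] = probability of exactly k first overall picks across the years,
    treating each year's lottery as independent."""
    dist = [1.0]  # before any lottery: zero wins, with certainty
    for p in yearly_probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)  # this year's lottery lost
            nxt[k + 1] += q * p    # this year's lottery won
        dist = nxt
    return dist


def chance_of_exact_sequence(yearly_probs, outcomes):
    """Probability of one specific win/loss pattern, e.g. [True, True, True, False, False, True]."""
    prob = 1.0
    for p, won in zip(yearly_probs, outcomes):
        prob *= p if won else 1 - p
    return prob


def expected_wins(yearly_probs):
    """Expected number of first overall picks: by linearity, the sum of the yearly chances."""
    return sum(yearly_probs)
```

`wins_distribution(yearly_probs)[4]` is the "any four of six" figure. It is always at least as large as the exact-sequence figure, since that pattern is only one of the 15 ways to win four lotteries out of six. The expected return quoted below is the `expected_wins` sum.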

All told, this gives Edmonton an expected return of 1.456 first overall picks over the last 6 years, where they actually got 4. To put that in perspective, they were expected to get almost twice as many first overall picks as the team with the next-highest expected total over the same period. Of the 27 teams who made at least one appearance in the draft lottery over the last six years, we have:

Realistically, this means that the luckiest teams in the draft have been Edmonton, Florida, and Colorado, and the unluckiest has probably been Columbus. The four teams at the bottom happened to have their bad seasons in years where they weren't quite bad enough to have a shot at first overall pick (poor guys).

So yes, Edmonton has gotten lucky with draft picks over the last six years, but it's not quite as impossible as it would have otherwise seemed. We were helped out by being the worst team in the league in two years where that meant a nearly 50% chance at the first overall pick, and by generally being terrible in the rest of the years to keep our chances high. We've gotten lucky at the draft, but only by being genuinely terrible over the last six years, and I sincerely hope that trend starts to reverse soon.

Tuesday, April 14, 2015

Edmonton Air Quality

This morning, I read an Edmonton Journal article claiming that Edmonton's air quality was worse than Toronto's, even though Toronto has roughly five times our population. The article's subtitle reads: "Particulate readings 25 per cent higher on some winter days."

I'll admit that my initial reaction to this was skepticism - the language used in the article seemed pretty wishy-washy and I wasn't sure what all the fuss was about. It isn't unusual for some days in one city to be worse than some days in another. Also, if pollution levels are particularly low on certain days, being 25% higher than another city is pretty easy and still reasonably healthy. So I decided to look into the numbers a little bit more.

The article continues, saying "pollution from particulate matter exceeded legal limits of 30 micrograms per cubic metre at two city monitoring stations on several winter days in 2010 through 2012." OK, that sounds pretty bad, but what do these limits correspond to, and how bad is "several," really?

First of all, let's take a look at what makes air unhealthy. The Air Quality Health Index (AQHI) used by Environment Canada looks at three factors: ground-level ozone, particulate matter (PM2.5/PM10), and nitrogen dioxide. Exposure to ozone is linked to asthma, bronchitis, heart attack, and death (fun), nitrogen dioxide is pretty toxic, and particulate matter smaller than 2.5 microns is small enough to pass right through your lungs and play with some of your other organs. These aren't things you really want to be breathing a whole lot of. The AQHI for Edmonton today is a 3 out of 10, considered ideal for outdoor activities, but at a 10 out of 10 level people are pretty much encouraged to stay inside and play board games.

The report in the Journal article referenced PM2.5 only, which is particulate matter smaller than 2.5 microns. The maximum allowed levels for PM2.5 in Alberta are 80 micrograms per cubic metre (ug/m3) in a single hour, or 30 ug/m3 averaged over a day. According to the Journal article, these levels were exceeded "several" times between 2010 and 2012. How many is several?

Data from the Clean Air Strategic Alliance

I don't know about you, but exceeding government safe levels for air quality on one day out of every eleven in 2010 is not what I'd call "several." Between the central and east Edmonton stations, limits were broken on more than a month's worth of days in 2010.
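Counting "how many is several" is a simple tally once you have the hourly readings. The sketch below rests on a few assumptions. It expects a hypothetical `readings` mapping from each date to that day's hourly PM2.5 values for one station. It also assumes the daily limit applies to the 24-hour mean and that a reading exactly at a limit doesn't count as exceeding it. The two limits are the Alberta values quoted above.

```python
HOURLY_LIMIT = 80.0  # ug/m3, one-hour PM2.5 limit quoted above
DAILY_LIMIT = 30.0   # ug/m3, 24-hour PM2.5 limit quoted above


def count_exceedances(readings):
    """readings: dict mapping a date to that day's list of hourly PM2.5 values (ug/m3).

    Returns (days over the daily limit, hours over the hourly limit, share of days over).
    """
    bad_days = 0
    bad_hours = 0
    for hours in readings.values():
        if sum(hours) / len(hours) > DAILY_LIMIT:
            bad_days += 1
        bad_hours += sum(1 for value in hours if value > HOURLY_LIMIT)
    share = bad_days / len(readings) if readings else 0.0
    return bad_days, bad_hours, share
```

A share of about 0.09 for a full year of days corresponds to the "one day out of every eleven" figure.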

I strongly disliked the phrase "25 percent higher on some winter days" due to its vagueness, but the idea of comparing Edmonton to Toronto seemed fun. Based on the CASA values for Edmonton, and the Air Quality Ontario values for Toronto, here's a comparison of the two cities from 2006-2012:

That's... not even close. Edmonton was 25% higher than Toronto for pretty much all of 2012, not just "some winter days." This makes me suspect that the sources referenced in the Journal were using different data, or that I've made a mistake somewhere, but the sources I used are all publicly available and I encourage you to check them out yourself.

But what about the other major air quality indicators? Turns out that, fortunately, exceeding their limits has proven to be much tougher. The maximum one-hour limit for nitrogen dioxide is 0.159 ppm, over 10 times the daily average for both Toronto and Edmonton recently:

Similarly, the one-hour limit for ozone is 0.082 ppm, about four times the recent daily averages:

Again, these levels are much safer than the particulate levels were, and in general Edmonton is about the same or slightly better than Toronto for these indicators.

So all in all, I started out today thinking the article was being alarmist, if vague, and I've ended up thinking that it's well-meaning but presented oddly. Edmonton definitely does seem to have a problem with one of the major indicators of air quality, and if it takes a city-pride fight with Toronto to get that sorted out, so be it.

Monday, March 9, 2015

The 2015 Brier Playoffs were the Most Exciting in Years

Congratulations to Team Canada on winning back-to-back Brier tournaments! With that, the Canadian national tournaments are over for the season, and two teams are off to represent us at the Worlds!

Now, a lot of people don't find curling to be all that exciting. A lot of that comes down to individual taste - some people haven't played curling or aren't from rural Saskatchewan, and without that experience or a vested interest in the teams, some of the intricacies can get lost.

Either way, for the more ardent curling fans, the game is often very exciting. But some games are undeniably more exciting than others, and apart from the final score, is there a numerical way to determine which games are the most riveting?

Sure there is! It's called the Excitement Index.

I first came across the Excitement Index (EI) in the context of the National Football League, where the owners of 'Advanced Football Analytics' have developed a model that predicts each team's chances of winning a football game after each play. They developed an EI based on the sum of the absolute changes in win probability throughout the game. A game where one team storms to an early lead and holds it will have a much lower EI than one that's back and forth all game, and similarly a modestly successful play early in the game will likely have less effect on EI than one that clinches the game in the last minutes.
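That definition translates almost directly into code. Here's a sketch, assuming `win_probs` is a hypothetical list holding one team's estimated win probability at the start of the game and after each checkpoint. Tracking one team is enough, since the other team's win probability moves by the same amount in the opposite direction. Advanced Football Analytics' own implementation may differ in its details.

```python
def excitement_index(win_probs):
    """Sum of the absolute changes in one team's win probability between
    consecutive checkpoints (plays in football, ends in curling)."""
    return sum(abs(after - before) for before, after in zip(win_probs, win_probs[1:]))
```

A steady march like [0.5, 0.7, 0.9, 1.0] comes to 0.5, while a seesaw like [0.5, 0.8, 0.3, 1.0] comes to 1.5, even though both games end with the same winner.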

So I came up with a similar system for curling. Instead of looking at it play-by-play, I came up with a model that predicts winning percentages based on the score after each end. For instance, I now know that over the last 15 years of Brier tournaments, a team that's been up by 1 after 4 ends with the hammer has won 80% of the time.
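One simple way to build that kind of end-by-end model is a lookup table of historical win rates. Here's a sketch, assuming a hypothetical record format where each state is an (end, lead, has_hammer) tuple seen from one team's side. The post doesn't describe how the real model was built, and it may smooth or fit these rates rather than just counting them.

```python
from collections import defaultdict


def build_win_table(games):
    """Estimate win probability for each game state from historical results.

    games: iterable of (states, won) pairs, one per team per game, where
    states lists (end, lead, has_hammer) after each end from that team's
    point of view and won is True if that team won.
    Returns a dict mapping each state to the fraction of games won from it.
    """
    tallies = defaultdict(lambda: [0, 0])  # state -> [wins, appearances]
    for states, won in games:
        for state in states:
            tallies[state][1] += 1
            if won:
                tallies[state][0] += 1
    return {state: wins / seen for state, (wins, seen) in tallies.items()}
```

Each game should be fed in twice, once from each team's side, so that both leading and trailing states get counted. Rare states (like being up 5 after 2) will rest on very few games, which is one reason a real model might smooth the raw rates.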

Going through a game end by end with this model gives the total EI for that game. For instance, a particularly unexciting game could look like this (1/2 page playoff game, 2014 Brier):

Last year, British Columbia got 2 with the hammer right off the bat, then stole one, leaving them up 3 without hammer after two - a very strong opening that historically results in a win 91% of the time. Alberta then only got one with the hammer, but in doing so gave the hammer back to BC for only a one-point trade, resulting in no real change in probabilities. BC then got 3 with the hammer, and the game was essentially wrapped up by that point (6-1 after 4 is virtually insurmountable for teams at the Brier). The sum of changes in probability in this game was only 0.27 after the first end - not a terribly exciting game.

On the other hand, here's the most exciting Brier playoff game I've analyzed: (2004 Final)

Nova Scotia started off not too well, only getting one with the hammer. The teams then traded 2-point ends, which resulted in a lot of back-and-forth in terms of probabilities while slowly inching in favour of Alberta. NS only getting one in the 5th was bad, but not quite as bad as AB getting three in the sixth. At this point it was 8-4 after 7, and being down by 4 after 7 with the hammer only has a win rate of 1.6%. The rest of the game was improbable, to say the least, and made for a very exciting finish. The total EI was 1.87, about 6 times 'more' exciting than the previous game.

It's important to note that a mid-game comeback isn't necessary to have a high EI value. The 'most exciting' game I could find while building my model was in Draw 6 of the 2009 Scotties (note: I have a separate model for men's and women's curling):

With all of the high-scoring ends at the beginning, and winning with a steal at the end, the back-and-forth swing throughout this game led it to a massive EI of 2.76.

All this leads to the fun finding that the 2015 Brier playoffs were the most exciting in recent history! Averaging the EI values for all games in the playoffs gives an average EI of 1.41 for 2015, with the final game (1.19) being the third most exciting final analyzed.

Stay tuned for next week, when I use the historical model I have to look at when it's better to take a point as opposed to blank an end, and compare men's curling to women's curling in a little more depth!

Friday, February 13, 2015

Voting Districts and Gerrymandering

You may have heard of gerrymandering, and if you have, you've most likely heard of it with a negative connotation. That's with good reason - gerrymandering is really bad.

For those of you unfamiliar with it, here's a brief example. Let's say you're the Blue Yuppies, a political party that is really popular with the hipster downtown folk, but not so much with suburban soccer moms. You won the last election in a city with four seats, but have lost popularity recently, and your polling tells you that neighborhoods in your city would vote like this in an upcoming election:

Super idealized city
Oh no! Only 28 out of 64 neighborhoods would vote for you! How can you stay in power in the upcoming election? If this city has to have four seats because of its population, and you're the party in power, maybe you could fiddle with the seat boundaries a bit to maximize your chances? Here are some options:

Intriguing. In all four options, the city is divided evenly into 4 districts. The top left option divides seats more or less into downtown vs suburbia and results in a 50-50 split, which is pretty good considering the Blue Yuppies don't even have 50% of the popular vote. The top right and bottom left options result in losses. Amazingly, though, with just a bit of cleverness, the Blue Yuppies can sacrifice one district to their opponents and just squeak by with three out of four seats - a solid majority - even though they have less than half the popular vote. Crazy!

This is what gerrymandering is. Redistricting is a huge power to have over your opponents between elections, because it can be leveraged quite well to retain power even against the popular vote of an area. And it's not always necessarily about giving yourself the comfortable easy seats - as in the bottom right example above, lumping all your opponent's support together can be a very effective way to secure the rest for yourself.

This often results in very funny looking districts. In the first three examples above, you could maybe justify most of the districts as effectively representing a region of the city. The gerrymandered one, though, has long snaky districts, in order to get the right sorts of people voting together.
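Working out the seat count for any given set of boundaries is a simple tally. Here's a sketch, assuming a hypothetical grid of neighborhood votes ('B' for the Blue Yuppies, anything else for their opponents), a matching grid of district labels, and that a district split evenly counts as a loss.

```python
def seats_won(votes, districts, party="B"):
    """Count the districts in which `party` wins a strict majority of neighborhoods.

    votes[r][c] is the party a neighborhood supports; districts[r][c] is the
    label of the district that neighborhood belongs to.
    """
    tally = {}  # district label -> [neighborhoods for party, total neighborhoods]
    for vote_row, district_row in zip(votes, districts):
        for vote, label in zip(vote_row, district_row):
            counts = tally.setdefault(label, [0, 0])
            counts[0] += vote == party
            counts[1] += 1
    return sum(1 for ours, total in tally.values() if 2 * ours > total)
```

Feeding the same vote grid through different district grids is a quick way to see how much the boundaries alone can swing the result.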

Gerrymandering is alleged to happen a lot in the USA. Take a look at some of these examples and decide for yourself:

Capture ALL the cities!
And my personal favorite:
Because you know that highway is an important part of the district...
After too many shenanigans like this, some states have started insisting on bipartisan committees to come up with new district boundaries, hoping to take the advantage away from the parties in power. The problem with this, though, is that the one thing committee members from different parties tend to agree on is that they don't want to lose the next election, so they may end up carving out districts that mostly serve to protect their own seats, leading to safe and boring elections. Yikes.

Canada, on the other hand, tends to have completely independent commissions in charge of redistricting, with the goal of avoiding gerrymandering. But how well does our system actually work?

One way to check for potential gerrymandering is to check a district for its compactness. The underlying assumption here is that no point in a riding should be too far away from the middle, and that the riding should represent a reasonably consistent geographical area. Shapes like circles or squares are the most compact, shapes like Congressional District 4 up there aren't.

This fun report took a very in-depth look at compactness in voting districts in both the US and Canada. They measured compactness by comparing the perimeter of a district with the square root of its area, scaled so that a perfect square has a compactness value of exactly 1 (higher values mean less compact districts).
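A square with side s has perimeter 4s and √area = s, so a score that equals 1 for a square works out to perimeter / (4√area). The sketch below shows that score, plus a shoelace-formula helper for polygon boundaries. It's my reading of the description, not the report's code. It assumes the vertices are already in a flat, projected coordinate system with equal units on both axes, rather than raw latitude/longitude.

```python
import math


def compactness(perimeter, area):
    """Perimeter relative to the perimeter of a square with the same area.

    A square scores exactly 1; stretched or convoluted shapes score higher.
    """
    return perimeter / (4 * math.sqrt(area))


def polygon_perimeter_area(vertices):
    """Perimeter and area of a simple polygon from its (x, y) vertices in order.

    The area uses the shoelace formula. Coordinates must be in a flat, projected
    system with equal units on both axes (not raw latitude/longitude).
    """
    perimeter = 0.0
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_area += x1 * y2 - x2 * y1
    return perimeter, abs(twice_area) / 2
```

A circle scores √π/2, about 0.886, so on this scale circles actually edge out squares. Ridings made of several pieces (islands, for example) would need their pieces combined, which this sketch doesn't handle.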

So how did Canada compare to the US? The average compactness values for 2006 were:

  • Canada: 1.400
  • USA: 2.103
From the report, the historical comparisons look like this:

Hill, 2009
For reference, if the average national districts were rectangles with the same areas, this is how they'd look:

So on that measure, Canada's pretty much always been better than the USA in terms of compactness (and, by implication, in terms of likely gerrymandering), and the USA's average score has risen markedly (meaning less compact districts) since about the early 1990s.
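Those equivalent rectangles follow from the same score. For a rectangle with sides in the ratio r : 1, the compactness is (r + 1) / (2√r). Solving that for r turns an average score into a shape. Here's a sketch of that calculation (my derivation, not taken from the report).

```python
import math


def rectangle_aspect_ratio(c):
    """Long-side / short-side ratio of a rectangle whose compactness
    perimeter / (4 * sqrt(area)) equals c."""
    if c < 1:
        raise ValueError("no rectangle is more compact than a square (c >= 1)")
    # With t = sqrt(r), c = (t**2 + 1) / (2 * t), so t**2 - 2*c*t + 1 = 0.
    # Take the larger root so that r >= 1.
    t = c + math.sqrt(c * c - 1)
    return t * t
```

By this formula, Canada's 1.400 corresponds to a rectangle roughly 5.7 times as long as it is wide, and the USA's 2.103 to one roughly 15.6 times as long.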

But wait! Canada just did a full round of redistricting! We now have 338 ridings instead of 308. These are exciting times! Just how well did the commission do?

My analysis of the 338 districts using the same methodology as Hill (2009) shows an average compactness of 1.408. In other words, nothing too much to worry about from a gerrymandering point of view! Hooray Canada for mostly transparent electoral systems!

The averages per province ranged from 1.27 for PEI to 1.59 for New Brunswick, and only six federal ridings had a compactness score worse than the average American district.

Saskatchewan was solid (average: 1.31)
Newfoundland and Labrador averaged 1.52, because shores are tricky
The great thing is that Canadian boundaries are still good, even at the provincial legislature level. Of Alberta's 87 provincial electoral districts, the average compactness is still only 1.440, very similar to the federal average and still way lower than the American average. In fact, only one Albertan district is worse than the average American district. Take a look at this super cool interactive thingy I made:

With the exception of Chestermere-Rocky View (which looks like it's giving Calgary a hug), all of these ridings are pretty reasonable, and they're all better than the average American voting district.

Go Canada! Our politics may not always be pretty, but at least there's no sign of anyone actively tinkering with them!

Tuesday, January 27, 2015

Pro Sport Team Mobility

Oh, the Oilers...

As of today, with a little less than half of the 2014-2015 NHL season remaining, the estimated odds of the Oilers getting into the playoffs are approximately 0.008%. They're currently last in their division, and the estimated chance of them staying there is 70.7%. Congratulations on another wonderful season!

The only thing the Oilers have going for them is that the NHL draft system gives a bonus to teams that do badly, with the hope of eventually balancing things out. Of course, the Oilers have supposedly been on the receiving end of this for a few years without success, but maybe this time it'll actually work?

But exactly how well does the draft system work for helping the worst teams out in future seasons? I decided to find out.

The NHL has had 30 teams only since the 2000-2001 season, so I decided to stick with the years since then. I started by looking at the teams that ended up in either the top or bottom quintile (20% of teams), and tracking how likely they were to make it into either category over the following five years:

For example, the above graph shows that a team that finishes in the top 20% of the NHL one year has a ~50% chance to make it back in the top 20% within one year, and an 80% chance to make it within 5 years. On the other hand, the worst teams in the league only have a 7% chance to make it to the top within one year, and only a 50% chance to make it within 5 years. Cool, right?

Reversing the situation looks like this:

This results in somewhat of a different trend. Teams that do poorly have a 2/3 chance of being in the bottom fifth of the NHL again within 3 years, but after that it plateaus and there doesn't seem to be much increased risk of them doing terribly. Also, it's about half as likely for a great team to end up doing terribly at any point within the subsequent 5 seasons as it is for a poor team to do awesomely.
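Both sets of "within k years" curves can be computed the same way. The sketch below assumes a hypothetical `standings` list of seasons, each mapping a team to its league rank (1 = best). It also assumes every team keeps one consistent key across seasons, so a relocated franchise needs a single name. Only seasons with a full `horizon` of later seasons are used as starting points.

```python
def in_quintile(rank, n_teams, which):
    """True if a 1-based rank (1 = best) falls in the top or bottom fifth of the league."""
    size = n_teams // 5
    return rank <= size if which == "top" else rank > n_teams - size


def reach_within(standings, start, target, horizon=5):
    """Fraction of teams starting in the `start` quintile that reach the
    `target` quintile within 1, 2, ..., horizon following seasons.

    standings: list of seasons in order, each a dict mapping team -> rank.
    """
    n_teams = len(standings[0])
    hits = [0] * horizon
    starts = 0
    for s in range(len(standings) - horizon):
        for team, rank in standings[s].items():
            if not in_quintile(rank, n_teams, start):
                continue
            starts += 1
            for k in range(1, horizon + 1):
                if in_quintile(standings[s + k][team], n_teams, target):
                    # Reaching it after k seasons also counts for every longer window.
                    for j in range(k, horizon + 1):
                        hits[j - 1] += 1
                    break
    return [h / starts for h in hits] if starts else []
```

With 30 teams, each quintile is 6 teams. `reach_within(standings, "bottom", "top")` gives the worst-to-best curve, and `reach_within(standings, "top", "bottom")` gives the reversed one.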

These results lead into a discussion of how closely correlated a team's performance is year-to-year. Plotting how successful they are one year against the subsequent year looks like this:

Definitely a correlation, but nothing worth placing a bet on. This says that, in general, the teams that did well one year are likely to do well the next. No huge surprise there. Looking further down the road:

Five years later, and there's almost no correlation at all. This is the sort of thing that ought to make Oilers fans happy, if only it weren't for the fact that they've been in that bottom 20% for four of the last five seasons. Ugh.
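The year-over-year and five-years-later comparisons are the same calculation at different lags: pair each team's result in one season with its result `lag` seasons later, then take the Pearson correlation. The post doesn't say which performance measure it used, so the sketch below assumes something like points percentage stored per season in a hypothetical `seasons` list.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lag_correlation(seasons, lag):
    """Correlate each team's result in season t with its result in season t + lag.

    seasons: list of dicts mapping team -> a performance measure
    (e.g. points percentage), in chronological order.
    """
    xs, ys = [], []
    for s in range(len(seasons) - lag):
        for team, value in seasons[s].items():
            if team in seasons[s + lag]:
                xs.append(value)
                ys.append(seasons[s + lag][team])
    return pearson(xs, ys)
```

Running `lag_correlation` for lags 1 through 5 traces out how quickly the link between seasons fades.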

The change in correlation between seasons is shown pretty well by this:

Which is an excessively pretty and smooth trend between seasons. This apparent regression to the mean for teams on the whole, though, also applies really well to the teams on the extremes (the best and worst of any season):

Suggesting that, surprisingly, the best and worst of the league will be statistically equivalent after only 3-4 seasons.

So does this give us much of a prediction for when the Oilers will finally start showing up to play real hockey? Not really, though historically the trend has been that they ought to have a 50% chance of making it to the top of the league within 5 years, and that they'll be about the same as today's best in the league (e.g., Anaheim?) within about 3 years.

The final question then becomes just how much of this effect is caused by normal luck and regression to the mean, and how much is influenced by player trades and the draft system? Turns out that it's likely quite a bit - comparing four major pro sports leagues on one of the charts above gives the following:

(Chart was made using 13 NHL seasons, 12 NFL seasons, 10 NBA, and 15 MLB for consistent league sizes)

The fact that there's such a variation between sports suggests that there's more to inter-season variation than just random chance, which is certainly promising news for all those coaches and general managers out there. It isn't terribly surprising that the correlation values for the NFL are consistently low, since each NFL team plays drastically fewer games than teams in the other three pro sports leagues. On the other hand, the results for the NBA are rather surprising - teams tend to do almost the exact same one year as they do on the next, but tend to have an inverse correlation five years down the road. If the Oilers had been an NBA team, they wouldn't expect to stay at the bottom for very long.

So while this year looks like another dud for the Oilers, there's always hope. Plenty of teams have broken their slumps before, and it's hopefully only a matter of time before the Oilers have their chance.

Wednesday, December 17, 2014

My Issues with the Broadbent Institute's Inequality Report


Apparently I have it in for think tanks or something. Every few months a think tank somewhere comes out with a report that means well, portrays a message with fundamentals I agree with, but manages to mess up some amount of the data handling in a way that gets me riled up.

This time it's the Broadbent Institute. They released a report on income inequality recently, and presented the data in a virtually identical format to an American video from two years ago. While I agree that income inequality is a big issue in Canada, and I'm sure that the average Canadian isn't clear on just how bad it is, I have a pretty big issue with the statistical rigor in their report.

This is a screenshot from their video:

Along the x-axis, they have different population percentiles in 10% chunks. The chunk on the far right represents the richest 10% of the population, the one to its left is the next 10% richest people, etc.

The problem with this chart is that it shows the 50-60th percentiles as being richer than the 60-70th and 70-80th. In other words, it says the fifth richest group is richer than the fourth and third richest groups.

What!? That doesn't make any sense by definition.

These values appear to come from this table in their report:

Somehow, Canadians apparently consistently think that the middle 20% of the population is supposed to have more money than the second wealthiest 20%. That's not possible, and I can't believe that it made it all the way into the report and the video without anyone hitting the emergency stop button. The income curve shown in the first figure ought to be a version of a Lorenz curve, and necessarily should increase from left to right. Even if that is the actual result from the survey, it shows that either the survey wasn't clear enough in its instructions, or that adequate controls weren't in place in the survey to ensure accurate results.
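A quick sanity check would have caught this. If groups are sorted from poorest to richest, each group's share can't be smaller than the one before it. The resulting Lorenz curve (cumulative share of wealth against cumulative share of population) also has to bend upward. The sketch below checks both, using hypothetical share values and helper names.

```python
def out_of_order_groups(shares):
    """Indices of groups (sorted poorest first) that hold less than the group just below them."""
    return [i for i in range(1, len(shares)) if shares[i] < shares[i - 1]]


def lorenz_points(shares):
    """(cumulative population fraction, cumulative wealth fraction) points, starting at (0, 0)."""
    total = sum(shares)
    n = len(shares)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, share in enumerate(shares, start=1):
        running += share
        points.append((i / n, running / total))
    return points
```

Any index returned by `out_of_order_groups` marks a group that is supposedly richer yet holds less than the group just below it, which is exactly the problem in the chart.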

When I brought this up to them on Twitter, their response was:

Which is... silly. There's a logical distinction between a "strong middle class" and a "middle class that's stronger than the upper middle class." They've clearly decided to ignore this distinction.

Finally, an even more nitpicky point than usual. Take one more look at this graph (where the blue line was the original "ideal" values):

The blue line of ideal values has five data points in it, which is great, but they're nowhere near where they should be! Each point (or kink in the blue line) corresponds with a 20% chunk of data, so they might have shown the mid-point 10%, 30%, 50%, 70%, and 90% marks. But instead they've shown the 0%, 25%, 50%, 75%, and 100% marks. This implies that the 0th percentile Canadian (the absolute poorest person in Canada) ought to have the wealth value that the people surveyed thought belonged to the bottom 20% as a whole.
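The difference between the two placements is easy to see side by side. One puts five quintile values at the middle of their groups; the other spreads them evenly from 0 to 100. Here's a small sketch with hypothetical helper names.

```python
def group_midpoints(n_groups):
    """Percentile at the middle of each of n_groups equal population groups."""
    width = 100 / n_groups
    return [width * (i + 0.5) for i in range(n_groups)]


def group_edges(n_groups):
    """Evenly spaced positions from 0 to 100, as the chart's blue line appears to use."""
    return [100 * i / (n_groups - 1) for i in range(n_groups)]
```

For five quintiles, `group_midpoints(5)` gives the 10, 30, 50, 70, and 90 marks, while `group_edges(5)` gives 0, 25, 50, 75, and 100.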

Anyway, the point of all this is that research into wealth inequality is really important, and doesn't deserve to be handled quite this badly. If you're going to be sharing this report, please do so with a grain of salt.