Tuesday, April 14, 2015

Edmonton Air Quality

This morning, I read an Edmonton Journal article claiming that Edmonton's air quality is worse than Toronto's, even though Edmonton has roughly one-fifth of Toronto's population. The article's subtitle reads: "Particulate readings 25 per cent higher on some winter days."

I'll admit that my initial reaction to this was skepticism - the language used in the article seemed pretty wishy-washy and I wasn't sure what all the fuss was about. It's not unusual for some days in some cities to be worse than some days in other cities. And if pollution levels are particularly low on a given day, a reading 25% higher than another city's is easy to hit and can still be perfectly healthy. So I decided to look into the numbers a little bit more.

The article continues, saying "pollution from particulate matter exceeded legal limits of 30 micrograms per cubic metre at two city monitoring stations on several winter days in 2010 through 2012." Ok, that sounds pretty bad, but what do these limits correspond to, and how bad is "several", really?

First of all, let's take a look at what makes air unhealthy. The Air Quality Health Index used by Environment Canada looks at three factors: Ozone at ground level, Particulate Matter (PM2.5/PM10), and Nitrogen Dioxide. Exposure to Ozone is linked to asthma, bronchitis, heart attack, and death (fun), nitrogen dioxide is pretty toxic, and particulate matter less than 2.5 microns is small enough to pass right through your lungs and play with some of your other organs. These aren't things you really want to be breathing a whole lot of. The AQHI for Edmonton today is a 3 out of 10, considered ideal for outdoor activities, but at a 10 out of 10 level people are pretty much encouraged to stay inside and play board games.

The report in the Journal article referenced PM2.5 only, which is particulate matter that's smaller than 2.5 microns. The maximum allowed levels for PM2.5 in Alberta are 80 micrograms per cubic meter (ug/m3) in a single hour, or 30 ug/m3 over a day. According to the Journal article, these levels were exceeded "several" times between 2010 and 2012. How many is several?

Data from the Clean Air Strategic Alliance

I don't know about you, but exceeding government safe levels for air quality on one day out of every eleven in 2010 is not what I'd call "several." There was over a combined month of air quality limits being broken in 2010 in central and east Edmonton.
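For the curious, counting exceedance days from a table of daily averages is only a few lines of code. Here's a minimal Python sketch of the idea, using made-up readings rather than the actual CASA data:

```python
# Hypothetical daily-average PM2.5 readings (ug/m3) for one station;
# the real values come from the CASA data warehouse.
daily_pm25 = [12.0, 35.5, 8.2, 31.0, 30.0, 42.7, 15.3, 29.9]

DAILY_LIMIT = 30.0  # Alberta daily-average limit, ug/m3

# A day counts as an exceedance when its average is strictly above the limit.
exceedances = [v for v in daily_pm25 if v > DAILY_LIMIT]
print(len(exceedances), "of", len(daily_pm25), "days over the limit")
```

Run that over a full year of station data and you get the exceedance counts shown above.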

I strongly disliked the phrase "25 percent higher on some winter days" due to its vagueness, but the idea of comparing Edmonton to Toronto seemed fun. Based on the CASA values for Edmonton, and the Air Quality Ontario values for Toronto, here's a comparison of the two cities from 2006-2012:


That's... not even close. Edmonton was 25% higher than Toronto for pretty much all of 2012, not "some winter days." This is enough to make me feel like perhaps the sources referenced in the Journal were using different data, or perhaps I'm mistaken, but the sources I used are all publicly available and I encourage you to check them out yourself.

But what about the other major air quality indicators? Fortunately, it turns out their limits are much harder to exceed. The maximum one-hour limit for nitrogen dioxide is 0.159 ppm, more than 10 times the recent daily averages in both Toronto and Edmonton:


Similarly, the one-hour limit for Ozone is 0.082 ppm, about four times the recent daily averages:


Again, these levels are much safer than the particulate levels were, and in general Edmonton is about the same or slightly better than Toronto for these indicators.

So all in all, I started out today thinking the article was being alarmist, if vague, and I've ended up thinking that it's well-meaning but presented oddly. Edmonton definitely does seem to have a problem with one of the major indicators of air quality, and if it takes a city-pride fight with Toronto to get that sorted out, so be it.

Monday, March 9, 2015

The 2015 Brier Playoffs were the Most Exciting in Years

Congratulations to Team Canada on winning back-to-back Brier tournaments! With that, the Canadian national tournaments are over for the season, and two teams are off to represent us at the Worlds!

Now, a lot of people don't find curling to be all that exciting. A lot of that comes down to individual taste - if you've never played curling, or aren't from rural Saskatchewan, or don't have a vested interest in any of the teams, many of the game's intricacies are easy to miss.

Either way, for the more ardent curling fans, the game is often very exciting. But some games are undeniably more exciting than others, and apart from the final score, is there a numerical way to determine which games are the most riveting?

Sure there is! It's called the Excitement Index.

I first came across the Excitement Index (EI) in the context of the National Football League, where the owners of 'Advanced Football Analytics' have developed a model that predicts each team's chances of winning a football game after each play. They then defined an EI based on the sum of the absolute changes in win probability throughout the game. A game where a team storms to an early lead and holds it will have a much lower EI than one that's back and forth all game, and similarly a modestly successful play early in the game will likely contribute less to EI than one that clinches the game in the last minutes.

So I came up with a similar system for curling. Instead of looking at it play-by-play, I came up with a model that predicts winning percentages based on the score after each end. For instance, I now know that over the last 15 years of Brier tournaments, a team that's been up by 1 after 4 ends with the hammer has won 80% of the time.
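The model is essentially a big lookup table of historical win rates, keyed by the end number, the score difference, and who has the hammer. A rough Python sketch of the idea, with made-up game records standing in for the real 15 years of Brier data:

```python
from collections import defaultdict

# Each record: (end, lead of the team with hammer, did the hammer team win).
# These records are invented for illustration; the real model is built from
# 15 years of Brier game results.
records = [
    (4, 1, True), (4, 1, True), (4, 1, False), (4, 1, True), (4, 1, True),
    (4, -2, False), (4, -2, False), (4, -2, True),
]

counts = defaultdict(lambda: [0, 0])  # (end, lead) -> [wins, games]
for end, lead, won in records:
    counts[(end, lead)][1] += 1
    if won:
        counts[(end, lead)][0] += 1

def win_pct(end, lead):
    """Historical win rate for the team with hammer at this game state."""
    wins, games = counts[(end, lead)]
    return wins / games

print(win_pct(4, 1))  # up 1 with hammer after 4 ends -> 0.8
```

With the real data, looking up (4, 1) gives the 80% figure quoted above.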

Going through a game end by end with this model gives the total EI for that game. For instance, a particularly unexciting game could look like this (1 vs 2 Page playoff game, 2014 Brier):


Last year, British Columbia got 2 with the hammer right off the bat, then stole one, leaving them up 3 without hammer after two - a very strong opening that historically results in a win 91% of the time. Alberta then only got one with the hammer, but in doing so gave the hammer back to BC for only a one-point trade, resulting in no real change in probabilities. BC then got 3 with the hammer, and the game was essentially wrapped up by that point (6-1 after 4 is virtually insurmountable for teams at the Brier). The sum of changes in probability in this game was only 0.27 after the first end - not a terribly exciting game.
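The EI calculation itself is just the sum of the absolute swings in win probability from end to end. As a quick sketch (the probabilities below are illustrative, not the actual model output for this game):

```python
# Win probabilities for the eventual winner after each end, as a model like
# the one described might produce them. Illustrative numbers only.
probs = [0.50, 0.77, 0.91, 0.91, 0.97, 0.99, 1.00]

# EI is the sum of the absolute changes in win probability between ends.
ei = sum(abs(b - a) for a, b in zip(probs, probs[1:]))
print(round(ei, 2))
```

A blowout like the one above barely moves the probabilities after the opening ends, so its EI stays small.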

On the other hand, here's the most exciting Brier playoff game I've analyzed: (2004 Final)


Nova Scotia started off not too well, only getting one with the hammer. The teams then traded 2-point ends, which resulted in a lot of back-and-forth in terms of probabilities while slowly inching in favour of Alberta. NS only getting one in the 5th was bad, but not quite as bad as AB getting three in the sixth. At this point it was 8-4 after 7, and being down by 4 after 7 with the hammer only has a win rate of 1.6%. The rest of the game was improbable, to say the least, and made for a very exciting finish. The total EI was 1.87, about 6 times 'more' exciting than the previous game.

It's important to note that a mid-game comeback isn't necessary to have a high EI value. The 'most exciting' game I could find while building my model was in Draw 6 of the 2009 Scotties (note: I have a separate model for men's and women's curling):



With all of the high-scoring ends at the beginning, and winning with a steal at the end, the back-and-forth swing throughout this game led it to a massive EI of 2.76.

All this leads to the fun finding that the 2015 Brier playoffs were the most exciting in recent history! Averaging the EI values for all games in the playoffs gives an average EI of 1.41 for 2015, with the final game (1.19) being the third most exciting final analyzed.


Stay tuned for next week, when I use the historical model I have to look at when it's better to take a point as opposed to blank an end, and compare men's curling to women's curling in a little more depth!

Friday, February 13, 2015

Voting Districts and Gerrymandering

You've maybe heard about gerrymandering, and if you have you've most likely heard of it with somewhat of a negative connotation. That's with good reason - gerrymandering is really bad.

For those of you unfamiliar with it, here's a brief example. Let's say you're the Blue Yuppies, a political party that is really popular with the hipster downtown folk, but not so much with suburban soccer moms. You won the last election in a city with four seats, but have lost popularity recently, and your polling tells you that neighborhoods in your city would vote like this in an upcoming election:

Super idealized city
Oh no! Only 28 out of 64 neighborhoods would vote for you! How can you stay in power in the upcoming election? If this city has to have four seats because of its population, and you're the party in power, maybe you could fiddle with the seat boundaries a bit to maximize your chances? Here are some options:


Intriguing. In all four options, the city is divided evenly into 4 districts. The top left option divides seats more or less into downtown vs suburbia and results in a 50-50 split, which is pretty good considering the Blue Yuppies don't even have 50% of the popular vote. The top right and bottom left options result in losses. Amazingly enough, though, with just a bit of cleverness, the Blue Yuppies can sacrifice one district to their opponents and just squeak by with three of the four seats - a solid majority - even though they have less than half the popular vote. Crazy!
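If you want to play with this yourself, checking who wins under a given district map takes only a few lines of code. Here's a toy Python version with a made-up city grid and district map (not the exact layout in the figures above):

```python
# A toy 4x4 city: 'B' = Blue Yuppie neighbourhood, 'R' = rival party.
city = [
    "BBRR",
    "BBRR",
    "BRRR",
    "BBRB",
]

# A hypothetical district map: each cell is labelled with a district 0-3.
districts = [
    "0011",
    "0011",
    "2233",
    "2233",
]

def seats_won(city, districts, party="B", n_districts=4):
    """Count how many districts `party` carries under this map."""
    tallies = [[0, 0] for _ in range(n_districts)]  # [party, other]
    for city_row, district_row in zip(city, districts):
        for cell, d in zip(city_row, district_row):
            idx = int(d)
            if cell == party:
                tallies[idx][0] += 1
            else:
                tallies[idx][1] += 1
    return sum(1 for us, them in tallies if us > them)

print(seats_won(city, districts))  # 2 seats under this particular map
```

Swapping in different district maps over the same city grid is exactly the game a gerrymanderer plays.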

This is what gerrymandering is. Redistricting is a huge power to have over your opponents between elections, because it can be leveraged quite well to retain power even against the popular vote of an area. And it's not always necessarily about giving yourself the comfortable easy seats - as in the bottom right example above, lumping all your opponent's support together can be a very effective way to secure the rest for yourself.

This often results in very funny looking districts. In the first three examples above, you could maybe justify most of the districts as effectively representing a region of the city. The gerrymandered one, though, has long snaky districts, in order to get the right sorts of people voting together.

Gerrymandering is alleged to happen in the USA a lot. Take a look at some of these examples and decide for yourself:

Capture ALL the cities!
Snaky...
And my personal favorite:
Because you know that highway is an important part of the district...
After too many shenanigans like this going on, some states are starting to insist on bipartisan committees to come up with new district boundaries, hoping to take the advantage away from the parties in power. The problem with this, though, is that the only thing the committee members from different parties tend to agree on is that they don't want to lose the next elections, and they may end up carving out districts that serve mostly to maintain their seats, leading to safe and boring elections. Yikes. 

Canada, on the other hand, tends to have completely independent commissions in charge of redistricting, with the goal of avoiding gerrymandering. But how well does our system actually work?

One way to check for potential gerrymandering is to check a district for its compactness. The underlying assumption here is that no point in a riding should be too far away from the middle, and that the riding should represent a reasonably consistent geographical area. Shapes like circles or squares are the most compact, shapes like Congressional District 4 up there aren't.

This fun report took a very in-depth look at compactness in voting districts in both the US and Canada. They measured compactness by comparing the perimeter of a district with the square root of the area, and then factored it so that a perfect square has a compactness value of exactly 1.
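If I'm reading their method right, that works out to dividing the perimeter by four times the square root of the area, so a 1x1 square (perimeter 4, area 1) scores exactly 1. A quick sketch, assuming that scaling:

```python
import math

def compactness(perimeter, area):
    """Perimeter-to-area compactness, scaled so a perfect square scores 1.
    This scaling is my reading of the report's method, not a quote from it."""
    return perimeter / (4.0 * math.sqrt(area))

# A 1x1 square:
print(compactness(4.0, 1.0))  # 1.0
# A long, thin 10 x 0.1 rectangle (same area, much longer perimeter):
print(round(compactness(20.2, 1.0), 2))
```

The long thin rectangle scores around 5, which is how snaky gerrymandered districts get flagged.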

So how did Canada compare to the US? The average compactness values for 2006 were:

  • Canada: 1.400
  • USA: 2.103
From the report, the historical comparisons look like this:

Hill, 2009
For reference, if the average national districts were rectangles with the same areas, this is how they'd look:

So on that measure, Canada's pretty much always been better than the USA in terms of compactness (and, implied, in terms of likely gerrymandering), and the USA has seen a marked increase since about the early 1990s. 

But wait! Canada just did a full round of redistricting! We now have 338 ridings instead of 308. These are exciting times! Just how well did the commission do?

My analysis of the 338 districts using the same methodology as Hill (2009) shows an average compactness of 1.408. In other words, nothing too much to worry about from a gerrymandering point of view! Hooray Canada for mostly transparent electoral systems!

The averages per province ranged from 1.27 for PEI to 1.59 for New Brunswick, and only six federal ridings had a compactness score worse than the average American district.

Saskatchewan was solid (average: 1.31)
Newfoundland and Labrador averaged 1.52, because shores are tricky
The great thing is that Canadian boundaries are still good, even at the provincial legislature level. Of Alberta's 87 provincial electoral districts, the average compactness is still only 1.440, very similar to the federal average and still way lower than the American average. In fact, only one Albertan district is worse than the average American district. Take a look at this super cool interactive thingy I made:



With the exception of Chestermere-Rocky View (which looks like it's giving Calgary a hug), all of these ridings are pretty reasonable, and they're all better than the average American voting district.

Go Canada! Our politics may not always be pretty, but at least there's no sign of anyone actively tinkering with them!

Tuesday, January 27, 2015

Pro Sport Team Mobility

Oh, the Oilers...

As of today, with a little less than half of the 2014-2015 NHL season remaining, the estimated odds of the Oilers getting into the playoffs are approximately 0.008%. They're currently last in their division, and the estimated chance of them staying there is 70.7%. Congratulations on another wonderful season!

The only thing the Oilers have going for them is that the NHL draft system gives a bonus to teams that do badly, with the hope of eventually balancing things out. Of course, the Oilers have supposedly been on the receiving end of this for a few years without success, but maybe this time it'll actually work?

But exactly how well does the draft system work for helping the worst teams out in future seasons? I decided to find out.

The NHL has only had 30 teams playing since the 2000-2001 season, so I decided to stick with the years since then. I started by looking at the teams that ended up in either the top or bottom quintile (20% of teams), and tracking how likely they were to make it into either category over the following five years:


For example, the above graph shows that a team that finishes in the top 20% of the NHL one year has a ~50% chance to make it back in the top 20% within one year, and an 80% chance to make it within 5 years. On the other hand, the worst teams in the league only have a 7% chance to make it to the top within one year, and only a 50% chance to make it within 5 years. Cool, right?

Reversing the situation looks like this:


This results in somewhat of a different trend. Teams that do poorly have a 2/3 chance of being in the bottom fifth of the NHL again within 3 years, but after that it plateaus and there doesn't seem to be much increased risk of them doing terribly. Also, it's about half as likely for a great team to end up doing terribly at any point within the subsequent 5 seasons as it is for a poor team to do awesomely.

These results lead into a discussion of how closely correlated a team's performance is year-to-year. Plotting how successful they are one year against the subsequent year looks like this:


Definitely a correlation, but nothing worth placing a bet on. This says that, in general, the teams that did well one year are likely to do well the next. No huge surprise there. Looking further down the road:
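For the record, "correlation" here is the ordinary Pearson correlation between one season's results and the next. A from-scratch sketch with made-up point totals:

```python
# Hypothetical season point totals for a few teams in year t and year t+1;
# the real analysis uses NHL standings.
year_t  = [110, 95, 88, 102, 70, 99]
year_t1 = [105, 90, 92, 98, 78, 101]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(round(pearson(year_t, year_t1), 3))
```

Pairing each season with the season N years later, instead of the next one, gives the longer-horizon correlations discussed below.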


Five years later, and there's almost no correlation at all. This is the sort of thing that ought to make Oilers fans happy, if only it weren't for the fact that they've been in that bottom 20% for four of the last five seasons. Ugh.

The change in correlation between seasons is shown pretty well by this:


Which is an excessively pretty and smooth trend between seasons. This apparent regression to the mean for teams on the whole, though, also applies really well to the teams on the extremes (the best and worst of any season):


Suggesting that, surprisingly, the best and worst of the league will be statistically equivalent after only 3-4 seasons.

So does this give us much of a prediction for when the Oilers will finally start showing up to play real hockey? Not really, though historically the trend has been that they ought to have a 50% chance of making it to the top of the league within 5 years, and that they'll be about the same as today's best in the league (eg Anaheim?) within about 3 years.

The final question then becomes just how much of this effect is caused by normal luck and regression to the mean, and how much is influenced by player trades and the draft system? Turns out that it's likely quite a bit - comparing four major pro sports leagues on one of the charts above gives the following:

(Chart was made using 13 NHL seasons, 12 NFL seasons, 10 NBA, and 15 MLB for consistent league sizes)

The fact that there's such a variation between sports suggests that there's more to inter-season variation than just random chance, which is certainly promising news for all those coaches and general managers out there. It isn't terribly surprising that the correlation values for the NFL are consistently low, since each team plays drastically fewer games than in the other three pro leagues. On the other hand, the results for the NBA are rather surprising - teams tend to do almost exactly the same one year as they do the next, but tend to have an inverse correlation five years down the road. If the Oilers had been an NBA team, they wouldn't expect to stay at the bottom for very long.

So while this year looks like another dud for the Oilers, there's always hope. Plenty of teams have broken their slumps before, and it's hopefully only a matter of time before the Oilers have their chance.

Wednesday, December 17, 2014

My Issues with the Broadbent Institute's Inequality Report

Ugh.

Apparently I have it in for think tanks or something. Every few months a think tank somewhere comes out with a report that means well, portrays a message with fundamentals I agree with, but manages to mess up some amount of the data handling in a way that gets me riled up.

This time it's the Broadbent Institute. They released a report on income inequality recently, and presented the data in a virtually identical format to an American video from two years ago. While I agree that income inequality is a big issue in Canada, and I'm sure that the average Canadian isn't clear with just how bad it is, I have a pretty big issue with the statistical rigor in their report.

This is a screenshot from their video:


Along the x-axis, they have different population percentiles in 10% chunks. The chunk on the far right represents the richest 10% of the population, the one to its left is the next 10% richest people, etc.

The problem with this chart is that it shows the 50-60th percentiles as being richer than the 60-70th and 70-80th. They're trying to tell us that the 5th richest group is richer than the fourth and third richest groups. 

What!? That doesn't make any sense by definition.

These values appear to come from this table in their report:


Somehow, Canadians apparently consistently think that the middle 20% of the population is supposed to have more money than the 2nd wealthiest 20%. That's not possible, and I can't believe that it got all the way into the report and the video without anyone hitting the emergency stop button. The income curve shown in the first figure ought to be a version of a Lorenz curve, and necessarily should increase from left to right. Even if that is the actual result from the survey, it shows that either the survey wasn't clear enough in its instructions, or that adequate controls weren't in place in the survey to ensure accurate results. 
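This is exactly the kind of sanity check that takes about three lines of code, which makes it all the more frustrating that it apparently never ran. A sketch, with invented survey numbers that mimic the dip in the report:

```python
# Hypothetical estimated wealth shares per quintile, poorest to richest,
# mimicking the shape of the survey responses (not the report's numbers).
estimated_shares = [5, 10, 25, 20, 40]  # note the dip between groups 3 and 4

def is_valid_distribution(shares):
    """Shares assigned to successively richer groups must never decrease."""
    return all(a <= b for a, b in zip(shares, shares[1:]))

print(is_valid_distribution(estimated_shares))  # False: 25 > 20
```

Any survey response failing this check should have been flagged before making it into a chart.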

When I brought this up to them on twitter, their response was:


Which is... silly. There's a logical distinction between a "strong middle class" and a "middle class that's stronger than the upper middle class." They've clearly decided to ignore this distinction.

Finally, a (more than average) nitpicky point. Take one more look at this graph (where the blue line was the original "ideal" values):


The blue line of ideal values has five data points in it, which is great, but they're nowhere near where they should be! Each point (or kink in the blue line) corresponds with a 20% chunk of data, so they might have shown the mid-point 10%, 30%, 50%, 70%, and 90% marks. But instead they've shown the 0%, 25%, 50%, 75%, and 100% marks. This implies that the 0th percentile Canadian (the absolute poorest person in Canada) ought to have the wealth value that the people surveyed thought belonged to the bottom 20% as a whole.

Anyway, the point of all this is that research into wealth inequality is really important, and doesn't deserve to be handled quite this badly. If you're going to be sharing this report, please do so with a grain of salt.

Monday, December 15, 2014

Winter Tires in Canada

Well there's snow on the ground and the temperature's pretty low, so we can pretty solidly declare that winter is upon us. And with wintry blizzards comes one of the great Canadian traditions: changing over your summer tires for winter tires.


If you're anything like me, you probably waited until just after the first major snowfall to remember to put them on. This often ends up with you driving around dangerously for a week waiting for your appointment, all the while dodging other summer-tire skidders. It's a fairly dangerous and unpredictable way to go about driving.

Recently I tried looking up recommendations for when to put on your tires and came to an interesting discovery: almost every single source recommends to put them on once the temperatures dip below 7 degrees Celsius. Everyone from the tire producers to the Tire and Rubber Association of Canada agrees with this fairly precise temperature recommendation.

Why? Turns out that summer tires are made of a different rubber that gets quite stiff below 7 degrees, and reduces the friction of the tires (the comparison that was used was that they approach the consistency of a hockey puck). Winter tires become more effective below 7C, even on dry clean pavement.

Not to scale. Probably.
If you're looking to drive as safely as possible (which you should, seeing as road injuries are the 9th leading cause of death worldwide), it might not be quite enough to just wait until the forecast predicts a temperature below 7, seeing as it often takes time to book an appointment and by that point it could be a bit late. Fortunately, Environment Canada has the daily temperature for various cities over the past several decades all neatly stored online.

So I decided to take a look. These are the average mean daily temperatures for Edmonton per day for the 30-year span between 1981-2010:


Since each day of the year has a decent variation to them, it's also possible to determine the expected probability that any given day will be below 7 degrees Celsius (using their averages and standard deviations). That might get you something like this:



Once you have this, it's fairly straightforward to choose when to put on your winter tires. If you were willing to accept a 50% risk of being ill-equipped for the weather, you'd be looking to put them on sometime around the beginning of October, and take them off around the beginning of May. That's vastly longer than I typically have mine on for, and I suspect that's the same for many people. In total, an Edmontonian ought to have their winter tires on by October 1st, and leave them on for 210 days (at least seven months of the year!).
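The probability for each day falls out of the normal CDF, which Python's math.erf gives you directly. A sketch with made-up climate normals (the real means and standard deviations come from the Environment Canada records):

```python
import math

def prob_below(threshold, mean, std):
    """P(daily mean temperature < threshold), assuming daily temperatures
    are normally distributed with the given mean and standard deviation."""
    z = (threshold - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical climate normals for one calendar day:
print(round(prob_below(7.0, mean=10.0, std=5.0), 3))
```

Run this for all 365 days and you get the probability curve above; picking your risk tolerance is then just finding where the curve crosses it.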

Of course, a 50% risk of having the wrong tires might seem a bit high for some people. If you were only willing to accept a 10% risk, you'd be looking at 261 days of winter tires starting September 4.

So that's all well and good for Edmonton, but how about the rest of the country? I decided to look at 30 stations' worth of data spanning 1981-2010 (~330,000 data points) to try to develop a map for winter tires in Canada. These stations included all major cities and a few select points to accurately represent the geographical differences. This is what I got:


Unsurprisingly, the northern territories tended to need winter tires more than the southern provinces (quite frankly, it's not worth taking winter tires off if you live in Iqaluit). What might be surprising to some is that even the warmest parts of the country, that hardly ever see snow, ought to have proper winter tires on for at least a third of the year.

Another way to represent the data is to show the probability of being below 7C on any given day like this:


Where green means 0% chance of being below 7C, and red means 100%.

The vast majority of Canadian cities have a high risk of being below 7C sometime in October, and it's important to know when exactly that will be in order to be sure you're driving with the best equipment available. In fact, the above graph can be summarized as follows:


One final thing to note: only the province of Quebec has legal requirements for winter tires, with the exception of some British Columbia highways. These legal requirements fall way outside of the 7 degree recommendation though. It's all well and good to have laws for additional safety when operating motor vehicles, but if they fail to capture the designed temperature ranges of the actual tires, it seems like a bit of a missed opportunity.

Monday, December 8, 2014

2014-2015 ski season

In case you haven't noticed, it's snowed a bit recently in town. And any time it snows in Alberta, I get excited that it's likely been snowing up in the mountains. And that means skiing!

As of December 7, the website OnTheSnow shows that Marmot Basin (the closest ski hill to Edmonton) has a snow depth of 90 cm. That sounds rather decent, and certainly right at the end of November it got a massive dump - but how does that actually compare to normal? I decided to find out.

Here is the cumulative snowfall at Marmot Basin for every ski season since 2007-08:


Alright, so there's quite a bit of variation in there. Maybe a better way of looking at it is like this:


For these graphs, the grey zone represents the maximum and minimum values over the last seven seasons, the light grey line is the average, and the black line is this season so far.

So there's good news and bad news here. The good news is that there's actually quite a bit more snow this year so far than normal! In fact, there's about as much snow at Marmot right now as there typically is by about January first. All in all, maybe not a bad time to go there, in fact!

The bad news is that, apart from two huge dumps, there really hasn't been much action at Marmot. It was way below any of the previous seasons measured until two weeks ago. Marmot looks like it's in a good position now, but if it hadn't gotten lucky at the end of November it would pretty much just be rocks. In fact, we can tell it *has* been lucky - Marmot Basin typically only gets two to three snowfalls exceeding 20 cm per day per season (2.43 on average), and has already had two this year. Lucky for it now, but it's hard to predict the rest of the season.

Marmot Basin is also relatively easy to predict - on average by the end of the season, its total snowfall has a coefficient of variation of 37.1%. It also has a reasonably early season, with 100 cm of snow fallen on average by December 31.
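The coefficient of variation here is just the standard deviation of end-of-season snowfall totals divided by their mean. A sketch with invented totals (the real numbers come from the OnTheSnow history):

```python
# Hypothetical end-of-season snowfall totals (cm) for one hill over
# seven seasons. Invented for illustration.
totals = [320, 410, 275, 390, 355, 300, 430]

n = len(totals)
mean = sum(totals) / n
std = (sum((t - mean) ** 2 for t in totals) / n) ** 0.5  # population std dev
cov = std / mean
print(f"{100 * cov:.1f}%")
```

A low COV (like Sunshine's 23.8%) means season totals cluster tightly around the average; a high one (like Nakiska's 48.2%) means you're rolling the dice.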

But how about other Alberta ski hills? Take Sunshine Village, for instance:


Sunshine has a similar situation to Marmot Basin. It's been lagging behind previous years until the end of November (though still within normal ranges), and is now pretty much back on track. Hard to say how that will hold up though. They don't typically reach 100 cm of snowfall until a bit earlier than Marmot (average December 18), and tend to be more predictable (coefficient of variation of 23.8%). They also get far more snow in total than Marmot Basin does...

Lake Louise enjoys a base of 100 cm on average by December 16, but is raucously tricky to predict (coefficient of variation at end of season of 44.3%). Lake Louise has had the same problem as Marmot Basin - it had far less snow than previous years up until a sudden burst rather recently, but it's been flat since. Hopefully that isn't terrible news for the season...

Nakiska's almost doing the best for this time of year out of any of the last 7 years! Good for it. They tend to have more variation at this time of year than other Alberta hills too, so it's actually a bit tougher to say if they'll have a good season or not. They tend not to get a 100 cm base until around January 23rd, and have very unpredictable seasons, with a coefficient of variation of total snowfall of 48.2%.

Norquay's a bit sad. They're well within previous years' ranges, but it's still not looking nice. They'll get their first 100 cm on average by February 10 (yikes), and have a variation in total snowfall around 37.6%. Some years they don't even get 100 cm of snow, though.

Castle Mountain's another sort of sad mountain with a later season (100 cm on average by January 11th) and high variability between seasons (43.2%). Both Castle and Norquay seem to have missed the awesome snow dump that the rest of Alberta had, but are tracking a bit closer to where they'd be expected at this point in the season.

So overall for mountains in Alberta, it's looking like now is a great time to go to Marmot, Sunshine, Lake Louise, or Nakiska. They're certainly at least all doing much better than average for this time of year, and will likely continue to be above average for the rest of December.

Summary:

Earliest decent season: Lake Louise (December 16)
Highest average snowfall: Sunshine Village (486 cm by May)
Most predictable: Sunshine Village (23.8% variation by season)

The sad thing is... BC mountains do way better on almost all counts. Take for example Fernie:

(100 cm by Dec 22, average snowfall 705 cm, COV 25.8%)

Or Whistler:


(100 cm by November 24, average snowfall 796 cm, COV 27.7%).

Both mountains consistently and reliably get far more snow than anything in Alberta. While that may make them sound great on paper, they still haven't had the trend-bucking dump that Alberta mountains have had, and are currently lagging quite far behind their Alberta peers. So while I can't guarantee that they'd have particularly good December skiing this year, you certainly ought to be able to rely on them for quality skiing in the mid- to late-season!