Extreme Enginerding

Measuring Inequality

2023-07-23T17:54:00.002-06:00

It seems like with increasing frequency we hear about rising inequality, both with wealth and income distribution across our country and the world as a whole. We see articles regularly like this and this, accurately describing it as a major issue for our times.

With so much interest in the topic, it’s probably unsurprising that it’s a well-studied field. Before you can properly wrap your head around something you have to measure it, and in order to get policy makers to pay attention you pretty much have to boil that measurement down to a single number. So it isn’t shocking at all that economic inequality can be measured by a single value, known as the Gini Coefficient.

I Dream of Gini

To start looking at measuring inequality we can survey a population, rank people based on their wealth, and compare the percentage of people poorer than a given person to the percentage of wealth held by people poorer than that person. Effectively, these two values will be identical in a purely even distribution but further apart as the inequality starts to grow. If everyone has the same wealth, then the poorest 20% of the population will have 20% of the money, and the poorest 70% will have 70% of the money. If we plot this, we’d get a straight line of equality:

A more unequal distribution might look like this, though, where the poorest 20% only has 5% of the wealth, and the poorest 70% only has 50%:

The Gini coefficient compares these two curves, the equality curve and the actual curve of the population, by comparing the area between the actual curve and the line of equality (A) to the total area under the line of equality (A + B, where B is the area under the actual curve). The bigger the area between the curves (Area A), the bigger the Gini coefficient, so a Gini of 0 means a perfectly equal society, and a Gini of 1 means effectively that all the wealth is concentrated in the hands of one person.

Gini in a Bottle

The Gini coefficient isn’t a perfect way of measuring inequality, but does a pretty good job. In the absence of social programs like a universal basic income, it’s worth pointing out that there will probably always be a non-zero Gini income coefficient, and that that’s not inherently evil. For instance, people late in their careers tend to make more money than newborn infants, and we’re generally ok with that.

The Gini coefficient also could give the same number to different distributions, if the shape of the curve is different but still results in the same relative areas. This means that overall it’s better as a relative indicator of inequality than a pure comment on the status of a society.

Unleash the Gini

As a very basic example for figuring out a Gini coefficient of our very own, we can take a look at a 10 player “Sit n Go” poker tournament. Following a common model used in online tournaments, 10 players sign up and the winner gets 50% of the pot, second place gets 30%, and third place gets 20%. Everyone else gets nothing, though hopefully has lots of fun too.

If we wanted to plot the curve we talked about before (incidentally, called a "Lorenz Curve"), we could use the information that the bottom 70% (the 7 losers) get 0% of the wealth, the bottom 80% (7 losers + third place) own 20% of the wealth, and the bottom 90% own 50%. Put that all together and we get this graph:

Area A, between the curves, can now be compared to the total area of A+B, and we get a Gini coefficient of 0.76.
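For anyone who wants to check that number, here is a minimal Python sketch of the calculation. It assumes the Lorenz curve is drawn as straight segments between the ten players (a trapezoid rule), and the function names are my own rather than from any library.

```python
def lorenz_points(wealth):
    """Return cumulative (population share, wealth share) points, poorest first."""
    ordered = sorted(wealth)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, w in enumerate(ordered, start=1):
        running += w
        points.append((i / n, running / total))
    return points


def gini(wealth):
    """Gini = A / (A + B), where A + B = 0.5 is the area under the line of equality."""
    pts = lorenz_points(wealth)
    area_b = 0.0  # area under the actual Lorenz curve
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area_b += (x1 - x0) * (y0 + y1) / 2  # one trapezoid per player
    return 1 - 2 * area_b


# Seven players get nothing; third, second and first get 20%, 30% and 50%.
payouts = [0, 0, 0, 0, 0, 0, 0, 20, 30, 50]
print(round(gini(payouts), 2))  # 0.76
```

The area under the poker curve works out to 0.12, and since A + B is always 0.5, the coefficient is 1 - 2 × 0.12 = 0.76.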

Before we get to the actual point of all this, it’s worth taking a second to reflect here. Splitting a population into ten groups and having 50% of the wealth go to the group that's best at poker is no basis for a system of distributing wealth. That's 50% of all cash, stocks, bonds, houses, privately held land, and super yachts. Extending the analogy, even if we were to pretend that the "best" 10% is who ends up with half the wealth, where here poker ability might correspond to concepts like hard work, diligence, education, etc., it still feels raucously unfair to end up with a distribution as shown above. And that's ignoring the fact that a person's wealth is strongly correlated with the wealth of their parents, which undercuts a lot of the 'hard work' argument.

So here's the issue. The Gini coefficient for the distribution of wealth in a 10 player online poker tournament is 0.76. The Gini coefficient for the distribution of wealth in Canada is 0.73.

Now on the one hand, admittedly there's a little bit of room between 0.73 and 0.76. 0.76 is about the same relative inequality as in Vietnam, a bit worse than a country like Egypt (0.756) and a bit better than a country like Bolivia (0.764).

On the other hand, Canada is about the same as countries like Uganda and Liberia, which may come as a surprise to some overly self-righteous Canadians. As well, the most recent statistics are from 2019, and studies show that inequality has only risen during the Covid-19 pandemic. We very well could be worse off than if our society had been set up as though by poker tournament.

Another thing to mention is that, as I said before, the Gini coefficient doesn't really comment on the shape of the Lorenz curve, just the area. And obviously Canada doesn't have 70% of people with absolutely no wealth, so maybe while the numbers are similar they don't mean the same thing, maybe it isn't quite as dire as it sounds?

Unfortunately it's not that simple. A 2012 report from the Broadbent Institute showed this graph for Canada:

Shockingly, the top 10% (at that time) ended up with almost half the wealth, not far off from the poker example. The bottom 10% didn't just have “no wealth”, they owed more than they owned. I'd argue that if you want to think about just how rich the rich are in Canada, a 10 person poker tournament is, distressingly, actually a very good analogy.

As an aside, at least in terms of poker analogies, things can always get worse. The $10,000 Main Event at the World Series of Poker handily posts its payout table online, and if you do a similar analysis you get something much much worse:

This Gini coefficient comes in at a whopping 0.94, thankfully much higher than any real country. This is what a Lorenz curve looks like when 0.5% of the population has 50% of the wealth, and is genuinely terrifying to contemplate as a future if we don't sort things out in the real world.

Canada has a long way to go in terms of wealth inequality, but obviously it gets worse too. The United States (0.852) and Russia (0.879) have absurdly high wealth inequalities. But worst of all? The world as a whole, sitting at a Gini coefficient for wealth distribution of 0.885. We have the means to measure this and the tools to address it, and it's well past time we do something about it.

Voting Patterns for Edmonton City Council's 2017-2021 Term

2021-06-30T08:20:00.000-06:00

City Council, unlike other levels of government, doesn't rely on party systems for categorizing its members. That being said, there still can be, and in fact are, patterns in how members of council vote, and with the Open Data that's available on council voting records, these patterns can be examined.

There are a lot of different ways to visualize voting patterns, and I've played around with these before (see here and here - unfortunately, since most of the visuals for this blog relied on the now-dead Google Fusion Tables, there's really not much to see). I've settled on three favourite methods for the 2017-2021 Edmonton city council term - let's take a look!

First of all, as in previous years, I've disregarded all motions that were unanimous as they provide no particular differentiating information. That leaves the 2017-2021 term with 921 non-unanimous votes to examine (at time of writing).

The first pattern-finding method I like to use is to simply look at the success rates of each member of council. How often did a vote go the way they wanted it to? This can be a sign of consensus-building, or an indicator of work put in behind the scenes (perhaps at other committees), or potentially a matter of being a part of a majority bloc that tends to vote similarly:

While a direct comparison is perhaps unwise, these numbers in general follow the same pattern as my similar 2016 analysis. Of members of council who were present both years, Councillor Esslinger and Mayor Iveson are again the top two and Councillor Nickel is again the lowest. Councillors Walters, Knack, and Henderson are all within 5% of their 2016 results as well, with Councillor Caterina showing a slightly larger difference from before.

This of course is not intended to imply anything about the effectiveness of individual members of council, and is performed without a review of the motions themselves (whether they are procedural, multiple readings of the same bylaw, etc.).
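The success rate itself is simple to compute. Here's a rough Python sketch, assuming each motion has been boiled down to a dictionary of member name to yes/no, with absent members left out. That data shape, the made-up members "A", "B" and "C", and the rule that a tie fails are all assumptions for illustration, not the layout of the actual open data set.

```python
from collections import defaultdict


def success_rates(motions):
    """motions: list of dicts mapping member name -> True (yes) or False (no)."""
    wins = defaultdict(int)
    cast = defaultdict(int)
    for votes in motions:
        yes = sum(votes.values())
        no = len(votes) - yes
        if yes == 0 or no == 0:
            continue  # unanimous motions carry no differentiating information
        carried = yes > no  # assumption: a tied vote fails
        for member, vote in votes.items():
            cast[member] += 1
            if vote == carried:
                wins[member] += 1  # the outcome matched how this member voted
    return {member: wins[member] / cast[member] for member in cast}


# Hypothetical three-member council, purely for illustration.
motions = [
    {"A": True, "B": True, "C": False},   # carried 2-1
    {"A": False, "B": True, "C": False},  # defeated 1-2
    {"A": True, "B": True, "C": True},    # unanimous, skipped
]
print(success_rates(motions))
```

In this toy example member A is on the winning side both times, while B and C each win once.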

Noteworthy from the last analysis was the result that Mayor Iveson had only 'lost' 17 votes out of 358 non-unanimous motions in the previous term. For comparison, at the time of writing, this number is now 94 votes.

A second pattern-finding visualization is how often members of council agree with each other. For the 2017-2021 term so far, that is:

The result from this analysis shows that a group of six members of council agree with each other more than 80% of the time across all pairings, and that a seventh member (Councillor Henderson) is just outside with a 79% minimum agreement rate (with Councillor Hamilton). With a council size of 13, seven members is a winning majority on most motions. Certainly, there is a correlation between the top six council vote winners and this group of six members of council - whether this group is ideologically similar or just more likely to compromise and build consensus is beyond the scope of this analysis though!
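For the curious, a pairwise agreement table like this can be built in a few lines. This is only a sketch: it assumes unanimous motions have already been filtered out, that each motion is a dictionary of member to yes/no, and it uses made-up members rather than real councillors.

```python
from itertools import combinations


def agreement_rates(motions):
    """Share of motions on which each pair of members voted the same way.

    Only motions where both members of a pair actually voted are counted.
    """
    together = {}
    agreed = {}
    for votes in motions:
        for a, b in combinations(sorted(votes), 2):
            together[(a, b)] = together.get((a, b), 0) + 1
            if votes[a] == votes[b]:
                agreed[(a, b)] = agreed.get((a, b), 0) + 1
    return {pair: agreed.get(pair, 0) / n for pair, n in together.items()}


# Hypothetical non-unanimous motions, purely for illustration.
motions = [
    {"A": True, "B": True, "C": False},
    {"A": False, "B": True, "C": False},
]
print(agreement_rates(motions))
```

A group like the one described above would then be any set of members whose every pairing sits above the chosen threshold (80% here).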

A third and final pattern-finding visualization that I quite like is adapted from the NOMINATE system used to scale members of the United States Congress. It is intended to represent ideological similarities and differences between members of council in a spatial manner - members closer to each other agree more, and further apart agree less frequently:

I'd like to stress at this moment that, as it's often tough to assign traditional political ideologies to city council bylaw amendments, this graph does not necessarily represent traditional 'left vs right wing' traits, nor traditional 'authoritarian vs libertarian' traits. The results of the graph are intended to model councillors as though their decisions are made based solely on two non-correlated factors, and the model above is oriented with the most significant factor aligned along the x-axis.

It's totally cool if you want to stop now, but I actually really love this model system and I want to talk about it a bit more since it gained some interest when I did this for London. Effectively, the NOMINATE system models both councillors and motions along the two axes, then assigns a probability of each councillor voting one way or another based on the relative proximity to each "side" of a debate. The algorithm then iterates thousands of times, tweaking the positions of each councillor and motion in such a way to optimize the probabilities of each decision.

The net result of this is that, using only two dimensions, this use of the NOMINATE algorithm currently assigns the correct vote to each councillor 93.4% of the time. While to some of you this may not seem perfect, a model that reduces the complexity of council decisions to two factors with over 90% accuracy is something I'm quite astounded by and happy with.

For instance, last week's vote to end the mask mandate effective July 1st broke down like this based on the model. Here, the orange coloring indicates 'voted no', and blue indicates 'voted yes', with clear circles for the locations of the decision-points:

Here, the percentages are the model's prediction of the odds of each councillor voting the way they did. The "yes" and "no" points are shown, and the dashed line indicates the mid-way point between the two positions. In this case, the model managed to accurately capture each member's vote (where accuracy here is defined by a yes vote with more than 50% probability, or a no vote with less than 50% probability). The probability doesn't necessarily reflect the difficulty a given member of council had in making their decision, and is more of a measure of the accuracy of the model.
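To make the "relative proximity" idea concrete, here is a heavily simplified Python sketch. It is not the real NOMINATE utility function: it swaps in a plain logistic curve on the difference of squared distances, and the `beta` sharpness parameter and the coordinates are assumptions chosen only for illustration. One property it does share with the picture described above is that anyone sitting exactly on the mid-way line gets a 50% chance.

```python
import math


def prob_yes(councillor, yes_point, no_point, beta=1.0):
    """Probability of a yes vote, based on which outcome point is closer."""
    d_yes = math.dist(councillor, yes_point)
    d_no = math.dist(councillor, no_point)
    # Closer to "yes" than "no" gives a positive score, so probability > 0.5.
    score = beta * (d_no ** 2 - d_yes ** 2)
    return 1 / (1 + math.exp(-score))


def log_likelihood(positions, yes_point, no_point, votes, beta=1.0):
    """The quantity an iterative fit would try to increase for one motion."""
    total = 0.0
    for name, voted_yes in votes.items():
        p = prob_yes(positions[name], yes_point, no_point, beta)
        total += math.log(p if voted_yes else 1 - p)
    return total


# Hypothetical positions on the two model axes.
yes_point, no_point = (1.0, 0.0), (-1.0, 0.0)
print(prob_yes((0.0, 0.7), yes_point, no_point))  # on the mid-way line
print(prob_yes((0.5, 0.0), yes_point, no_point))  # leaning towards "yes"
```

Fitting the model then amounts to nudging every councillor and every pair of yes/no points so that the summed log-likelihood over all motions goes up.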

By looking at all votes together, the model slowly homes in on the best placement for each member of council. Not all votes are as clear-cut as this one - for instance, the vote on the solar power plant at EL Smith looked like this:

You can see here that the model was very close with Councillor McKeen, and effectively swapped Caterina and Dziadyk. Again, as the model is probabilistic this doesn't mean it got these 'wrong', more that having these councillors and decision points in these locations is optimized over the entire term.

It's not a perfect model, but again I'm quite pleased with how accurately it is able to capture the voting term in only two dimensions!

So that's it - three different ways to look at the data, showing different aspects of what can be learned from it!

Which Edmonton City Councillor are you?

2021-06-28T08:06:00.000-06:00

I've done this before, and had so much fun with it that I'm happy to once again present:

A Buzzfeed-style quiz to get you more in touch with your elected representatives!

(it's totally ok if that doesn't excite you as much as it excites me)

Without further ado, here is a quiz for you to play around with. All decision points in the quiz are pulled from real votes in the 2017-2021 city council term, with information and sources provided.

Hopefully that was fun!

Like I said, I've done this before for Edmonton and London, and London was far more excited about it. The work that goes into these is an interesting mix of politics, whimsy, and data work.

The first step is to analyze the City of Edmonton open data set for Votes and Proceedings. For no discernable reason, the data set this term is inconsistent and halfway changes how votes are recorded, as well as changing how councillors are named. It's not particularly tricky to deal with, but it did have to be massaged a bit to be in a consistently usable format.

For this quiz, there's not much point in looking at unanimous procedural votes, so I focused on the 921 (at time of writing) non-unanimous votes. In an ideal world, a set of yes-no choices should require four or fewer questions in order to neatly sort into 13 possible answers (assuming approximately even splitting at each decision point). However, it's much more interesting and easy to answer the quiz when the questions are relevant and engaging.
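The "four or fewer" figure is just a base-2 logarithm: each perfectly splitting yes/no question halves the remaining field. A quick check in Python:

```python
import math

councillors = 13
# Each ideal yes/no question halves the candidates, so we need
# the smallest q with 2**q >= 13.
min_questions = math.ceil(math.log2(councillors))
print(min_questions)  # 4
```

Three questions can only separate 8 outcomes, while four can separate up to 16, which is enough for 13 members of council.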

Most of the examples I chose for this quiz have news stories attached, which in my mind was a sign that I'd found adequately interesting votes to base this on, and as a result a user on the quiz can get to a councillor with anywhere from three to five questions, which I was satisfied with.

Hopefully you are too, because at one point in the design of this quiz one of the leading optimal votes was "That City Council waive the rules on providing notice of motion as set out in section 32 of Bylaw 18155 - Council Procedures Bylaw to allow Councillor S. Hamilton to make a motion without notice regarding the aerial mosquito program." It would've made things work so well but, well, it's hard to really care about it.

Each of the final results in the quiz genuinely leads to a member of City Council who voted in the same unique way as the answers you provided. One assumption was made, which was that while Mike Nickel did not vote on his own censure, it was assumed that he would have voted no if he was forced to.

Hope you had fun!

Which London city councilor are you?

2019-04-27T14:56:00.000-06:00

Open data can be used for a lot of things, and public meeting minutes of elected representatives are crucial in holding representatives accountable, ensuring they represent their constituents, and promoting honesty and efficiency in our government.

Or they can be used to make Buzzfeed style personality quizzes. That's what I did.

We've now hit a point in the City Council meeting minutes from this council so far where all councillors have disagreed with each other on interesting votes at least once, which allows us to strongly differentiate between them. By presenting some of these votes, we can narrow down a few key motions that separate all the councillors, and present it in a Classification Chart. Since that's not as fun as a quiz, though, here it is in quiz format.

Share widely, and tell me who you got! (It may take a second to load)

Alberta 2019 Election Post-mortem

2019-04-22T14:32:00.001-06:00

Well that was fun!

How did I do?

For more than a year now I've been tracking Alberta election polls with the hope of developing a reasonably accurate prediction model. Overall, I'm happy to report that the party I predicted in the lead won in 80 out of 87 races, and my riding qualifiers broke out as follows:

"Solid" lead: 65/65 (100%)
"Likely" lead: 12/15 (80%)
"Lean" lead: 2/5 (40%)
"Toss-up" edge: 1/2 (50%)

I think this is a decent proof of concept, small "lean" sample size notwithstanding, and I want to talk a bit about what went right and what went wrong, and how I can improve if I want to keep doing this sort of thing.

First of all, the polls leading up to election day didn't turn out to be too accurate. Take a look at the province and regional splits:

Edmonton was remarkably accurate, Calgary was close, but the rest of the province and the top line results were off significantly. This is possibly a cause for concern, as it could suggest that my model was taking inaccurate data as inputs but then claiming credit for an accurate output, which it wasn't designed to do.

The NDP ended up underperforming relative to their polling numbers, and likely the only reason this didn't mess up too many election prediction models is that they underperformed mostly in areas like rural Alberta, where they were predicted to lose anyway. If the polls had been that wrong about the NDP in Edmonton, say, the predictions could have been far worse.

Similarly, my model and others like it likely wouldn't have fared too well if the NDP had overperformed their polling rather than underperformed. The same amount of polling error as actually occurred, applied in the other direction, could have had the NDP win the popular vote across the province.

My takeaway from this is that I need to adjust my topline polling tracker. Right now it runs under the implicit assumption that errors in individual polls will cancel each other out. This seemed reasonable given that polls are produced by different companies with different methods. That led to my full Alberta tracker having a narrow confidence interval for the NDP in particular, though, as several polls in a row provided the same result. If I instead make the assumption that at least part of the polling error is correlated between polls, perhaps due to something beyond their control, then the final result from election night would have still been a surprise, but far less of one. Certainly something I'll take into account next time.
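To see why that assumption matters so much, here is a small Python sketch of the standard variance formula for an average of equally uncertain polls that share a common pairwise correlation `rho`. The poll error, poll count and correlation values are made up for illustration and are not taken from my tracker.

```python
import math


def sd_of_poll_average(poll_sd, n_polls, rho):
    """Standard deviation of the mean of n polls with pairwise correlation rho.

    Var(mean) = (sd^2 / n) * (1 + (n - 1) * rho)
    """
    variance = poll_sd ** 2 / n_polls * (1 + (n_polls - 1) * rho)
    return math.sqrt(variance)


# Hypothetical: five polls, each with a 3-point standard error.
for rho in (0.0, 0.5, 1.0):
    print(rho, round(sd_of_poll_average(3.0, 5, rho), 2))
```

With independent errors (rho = 0) the uncertainty shrinks with the square root of the number of polls, but with any shared error it levels off, and with fully correlated polls (rho = 1) averaging buys nothing at all.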

Other Metrics

Overall on a riding-by-riding level, I had an error of 6.4% vote share. That's not superb, but also not far from what my testing beforehand suggested, and was factored into my uncertainty. Comparing my final projection to actual results on election night doesn't look too bad:

If we ignore the Alberta Party and the Liberals, this leads to an overall R-squared value of 0.79, which I consider respectable. It's handy to ignore the low parties because they don't have much of a spread, and will skew the coefficient of determination calculation.

Very fortunately for me, if I input the final actual regional results as though they were a poll result, my model does improve. This is a good hint that my model is behaving decently, especially so since this hasn't been the case with all other forecasters.

With the correct Calgary, Edmonton, and Rural results input as large polls, my model improved to 83/87 seats correctly predicted and an R-squared for party support per seat of 0.91. Very encouraging - too bad the polls weren't more correct!

Finally, I also provided an expected odds of winning each seat for each party. It's one thing to count a prediction as a success if you give it 100% odds of winning and it comes true, but how does one properly score oneself in the case of Calgary-Mountain View, where I gave the Liberals (10.8%), UCP (16.2%) and NDP (73%) different odds of winning, and only one (NDP) did?

In this case I've scored each riding using a Brier score. A score of 0 means a perfect prediction (100% to the winner and 0% predicted for all losers), a score of 1.0 means a perfectly wrong prediction (100% to one of the losers), and because of the math, a score of 0.19 for a complete four-way coin toss (I only predicted the four parties represented in the debate).

Overall, I scored a 0.027, which is considerably better than just guessing. It's hard to get an intuitive sense of what that score really means, but it's mathematically the same as assigning an 83.5% chance of something happening and having it come true. Not a bad prediction, but there's room to be sharpened.
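Here is a hedged Python sketch of a per-riding Brier calculation. I haven't spelled out above exactly how the score is normalized, so averaging the squared errors over the four parties is an assumption, chosen because it reproduces the 0.19 coin-toss figure. Other conventions (summing instead of averaging, for example) rescale the end points.

```python
def brier(probabilities, winner_index):
    """Mean squared error between forecast odds and the 0/1 outcomes."""
    total = 0.0
    for i, p in enumerate(probabilities):
        outcome = 1.0 if i == winner_index else 0.0
        total += (p - outcome) ** 2
    return total / len(probabilities)


# A complete four-way coin toss.
coin_toss = brier([0.25, 0.25, 0.25, 0.25], winner_index=0)
print(round(coin_toss, 2))  # 0.19

# The single-event comparison: calling something at 83.5% and being right.
single_event = (1 - 0.835) ** 2
print(round(single_event, 3))  # 0.027
```

A perfect call, with 100% on the winner and 0% everywhere else, scores exactly 0 under any of these conventions.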

How did I stack up?

So like I said, there were a lot of us predicting the election this time around. I've tried to find as many as I can, and I apologize profoundly if I've missed anyone. I've only included forecasts that had either a vote breakdown per seat or anticipated odds of winning each seat for comparison purposes.

I've reported on three main measures (seat accuracy, R-squared per seat, and prediction Brier score), and I'll present as many of those for each forecaster as I was able to determine. Different forecasters win at different categories, so it's not necessarily a clear picture as to which one of us is the "best", so I'll mostly leave room here for interpretation:

I'm not claiming to be the second best, but it's important to note that being best in one measure doesn't necessarily mean best overall. There are also harder-to-evaluate measures in play here - for instance VisualizedPolitics and TooClosetoCall allow you to input poll values to see reactions for yourself, and both improved when given more accurate data (VisualizedPolitics also got to 83 seats accurately predicted, though still with a low R-squared value).

338Canada probably rightly can claim to have been the strongest this time around, but given the polling errors we were faced with I think it'll take several more elections to determine if anyone is really getting a significant edge consistently. This isn't the first time we've compared ourselves to each other, and I think it's an important exercise in evaluating our own models and whether there's a need for more.

London Instant Runoff Breakdown

2018-10-25T08:25:00.002-06:00

London (Ontario) just had its first election using instant-runoff balloting. As I've mentioned before, I'm very interested in different forms of electoral reform, so as a new resident of London I was intrigued as to how the vote would work out.

London's system is a bit unusual inasmuch as voters can only rank their first three choices, but otherwise follows a pretty classic Instant Runoff system. Many of the elections resulted in first round winners, and therefore don't have a lot of room for fun analysis, but some of them went deeper and I thought it might be fun to show how they progressed in a Sankey diagram!
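If you haven't seen an instant runoff count before, here is a minimal Python sketch of the idea. It assumes the winner needs a majority of the ballots still in play each round, breaks elimination ties arbitrarily, and uses made-up candidates "A", "B" and "C". The city's actual counting rules may differ in those details.

```python
from collections import Counter


def instant_runoff(ballots):
    """ballots: list of rankings (most preferred first, at most three names).

    Returns (winner, rounds), where each round is a Counter of live votes.
    """
    eliminated = set()
    rounds = []
    while True:
        tally = Counter()
        for ranking in ballots:
            for name in ranking:
                if name not in eliminated:
                    tally[name] += 1
                    break
            # A ballot with no remaining choices is exhausted and not counted.
        rounds.append(tally)
        live = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > live or len(tally) == 1:
            return leader, rounds
        # Drop the candidate with the fewest votes (ties broken arbitrarily).
        eliminated.add(min(tally, key=tally.get))


# Hypothetical ballots, purely for illustration.
ballots = (
    [["A", "B"]] * 5
    + [["B", "A"]] * 4
    + [["C", "B"]] * 2
    + [["C"]]  # this ballot is exhausted once C is eliminated
)
winner, rounds = instant_runoff(ballots)
print(winner, [dict(r) for r in rounds])
```

In this toy race A leads the first round 5-4-3, but C's transfers put B over the top in round two, and the one ballot that only ranked C drops out of the count. The per-round tallies are what feed the flows in a Sankey diagram.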

First of all, here's Ward 5 (my ward!):

As with all of the following, the leader in the first round ultimately ended up winning. Due to the lack of ability of voters to rank more than three candidates, the number of exhausted votes tends to grow quite quickly after the third round. Interesting patterns include the large number of Clarke supporters moving to Cassidy, and the relatively large number of Knott supporters preferring Warden over Cassidy at the end.

Ward 8

This race ended closer than it began, and likely didn't see any change in leader throughout the race due to the lack of strong trends in down-ballot rankings.

Ward 9

This race ended quite quickly, with Hopkins getting more than 50% of the vote by the third round after preferential support from Charlebois' supporters.

Ward 12

Similar to Ward 9 - disproportionate support from Mohamed's voters to Peloza secured a win in the fourth round.

Ward 13

One of the tighter races of the election. Kayabaga drew large support from Warren and Hughes supporters, whereas Fyfe-Millar drew more support from Wilbee and Lundquist voters.

Ward 14

Pretty straightforward - along with being the top first choice, Hillier was the preferred alternate for both Tipping and Swalwell's voters leading to a more secure finish than start.

Mayor

(Click to zoom and enhance!)

This one was far more lopsided than all the others. In the early rounds of voting, there was a small amount of jostling for positions 7-9 in the rankings, but apart from that no real changes occurred until Cheng's elimination. No abnormally strong trends in down-ticket voting occurred, though, so Holder held on through to the end.

The city clerk has promised more detailed information to come out soon, so stay tuned for further analysis!

London City Council

2018-09-17T08:30:00.000-06:00

Wow it's been a while since my last post. My apologies!

A principal reason for this is that I've moved - I'm no longer an Edmontonian, and am now a Londoner! London Ontario, that is. This almost definitely means I won't stop posting about Edmonton, but does mean that I'll be increasing my Ontario content.

London is currently in the midst of a civic election, so like any good new citizen to a city my first thought was to learn as much about the current council as I can so that I can make as informed a decision as possible. London's open data is pretty good, but their votes and proceedings aren't organized quite as well as Edmonton's are.

Nonetheless, with the votes and proceedings that are available, I thought to take a look at council relationships in London in a similar way to how I did in Edmonton two years ago.

Unanimous votes aren't interesting, so I've focused this analysis on the 638 non-unanimous roll call votes as recorded in meeting minutes. First of all, let's take a look at how often each councillor agrees with each other:

Matt Brown is the mayor, and currently enjoys at least 70% agreement with 11 out of 15 councillors, which isn't too shabby. In general, there appears to be a mild bloc of six people (Brown through Park) who all agree quite strongly with each other, another similar bloc (Park through Hubert) who do the same, and then a handful of councillors who seem to go their own way.

Another sign of consensus-building on city council is the frequency that each member of council has the outcomes of votes in line with how they voted. Again, looking only at non-unanimous votes:

The mayor has been on the losing side of 51 votes out of the 610 for which he was present and not recused, which suggests a reasonable level of consensus building (though not quite as high as Iveson in Edmonton).

If we plot a graph of councillors, and connect them only if they agree at least 67% of the time, we get the following:

The cut-off here was chosen in order to include councillor Turner while still highlighting differences in agreement rates. Unsurprisingly, councillors Turner, Helmer, and Squire are relative outsiders, with a strong cluster of the six councillors mentioned before in the center. Also, this type of graph is incredibly satisfying to play with - enjoy at your own risk!

While showing relative outsiders, this plot doesn't really demonstrate any significant voting blocs. Another way to present the same data is to only connect members of council to whoever they agree with the most often. Doing that results in the following:

Here we get a more interesting structure. Nearly as many people agree more often with councillor Zaifman than Mayor Brown, though there are no separated islands of voting blocs. Only two members of council agreed with each other the most mutually, Matt Brown and Maureen Cassidy, an observation that is provided without further commentary.
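Both of these graphs come from the same table of pairwise agreement rates. Here is a small Python sketch of how the edges could be picked out. The rates and the members "A" to "D" are made up, and the function names are my own.

```python
def threshold_edges(agreement, cutoff=0.67):
    """Connect two members only if they agree at least `cutoff` of the time."""
    return sorted(pair for pair, rate in agreement.items() if rate >= cutoff)


def closest_ally(agreement):
    """For each member, find whoever they agree with most often."""
    best = {}
    for (a, b), rate in agreement.items():
        for member, other in ((a, b), (b, a)):
            if member not in best or rate > best[member][1]:
                best[member] = (other, rate)
    return {member: ally for member, (ally, _) in best.items()}


# Hypothetical agreement rates, purely for illustration.
agreement = {
    ("A", "B"): 0.90, ("A", "C"): 0.70, ("A", "D"): 0.50,
    ("B", "C"): 0.60, ("B", "D"): 0.40, ("C", "D"): 0.65,
}
print(threshold_edges(agreement))
allies = closest_ally(agreement)
print(allies)
# Mutual pairs: each is the other's closest ally.
print(sorted({tuple(sorted((m, a))) for m, a in allies.items() if allies[a] == m}))
```

In the toy data only A and B are each other's closest ally, which is the same kind of mutual pairing noted above.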

The last way I'll look at voting patterns is to scale them using a variant of NOMINATE. This method was developed for analyzing US Congress voting patterns, and can assign voting members to a political spectrum without needing to know what the bills being voted on were. For more information, this link is a fascinating read.

Obviously a city council is going to be less partisan than a parliamentary system, but the relative placement of councillors on the graph correlates with how often they agree or disagree with each other, as well as an approximate alignment on issues. I'll detail how this was developed in a subsequent post, but the short version is that each vote is also given a numerical position, and councillors who are closer to the "yes" vote than the "no" vote are assigned probabilities to vote either way. This is then trained against the actual vote data, and thousands of iterations of machine learning later we get this distribution.

Hopefully this has been an interesting glimpse into London city council. Have a fun election!

Ontario Election Wrap-up

2018-06-08T09:12:00.001-06:00

The 2018 Ontario General Election is over, and if your team won then congratulations to you!

Over the last month or so I've been tracking the election polls and testing out a few different ideas in order to improve a general model that I'll end up using for the upcoming Alberta election. Of course, I wasn't the only person doing this, and I was able to find at least six other sites tracking and projecting alongside.

But who did the best? Can we learn anything specific about which models produce more reliable results?

First of all, we can look at seat projections. As far as I could tell by mid-day June 7th, this was the seat projection distribution between the seven of us:

	CBC	Too Close to Call	QC125	Lispop	Teddy on Politics	Calculated Politics	Extreme Enginerding	Average	Actual
PC	78	74	70	69	60	71	70	70.3	76
NDP	45	46	47	50	55	44	45	47.4	40
LIB	1	3	6	4	8	8	9	5.6	7
GRN	0	1	1	1	1	1	0	0.7	1
OTH	0	0	0	0	0	0	0	0	0

Ranking these by the root sum of squares difference from the actual results, we get:

Calculated Politics (diff: 6.48). Their method involved seat-by-seat projections, suggesting a regional breakdown that seemed to work pretty well for them!
Too Close to Call (diff: 7.48). They also provided seat-by-seat projections, and had regional factors involved to project those. Also, most handily, their simulator was interactive, but putting the correct values into it actually made their predictions slightly worse (still second place at 7.87 though).
(Tie: CBC and Me) (diff: 8.12). We ended up with the same predictions for the NDP, but CBC was way under for the Liberals and I was quite a bit under for the PCs. My model didn't involve individual seat projections and instead just approximated historical trends for seat ranges based on party vote share, so that's a win for simplicity I suppose.
QC125 (diff: 9.27). Another site with seat-by-seat projections. The actual seats fell well within their expected ranges, but were all off by a little bit. I'm unsure how they came up with the seat vote projections.
Average (diff: 9.48). In this case, the wisdom of the crowds didn't pan out.
Lispop (diff: 12.57). Hypothetically they used a regional swing model similar to mine, so I'm not quite sure where the difference comes from here. It looks like they anticipated a much higher NDP voter base than actually happened.
Teddy on Politics (diff: 21.95). It seems like Teddy paid more attention to leader favorability numbers than most of the rest of us, and that seems to have tilted the seat distribution against him. His was the only model to predict a minority government.
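The "root sum of squares" differences above can be reproduced straight from the seat table. A short Python sketch, with the numbers copied from the table and a helper name of my own choosing:

```python
import math

actual = {"PC": 76, "NDP": 40, "LIB": 7, "GRN": 1, "OTH": 0}
projections = {
    "CBC": {"PC": 78, "NDP": 45, "LIB": 1, "GRN": 0, "OTH": 0},
    "Too Close to Call": {"PC": 74, "NDP": 46, "LIB": 3, "GRN": 1, "OTH": 0},
    "QC125": {"PC": 70, "NDP": 47, "LIB": 6, "GRN": 1, "OTH": 0},
    "Lispop": {"PC": 69, "NDP": 50, "LIB": 4, "GRN": 1, "OTH": 0},
    "Teddy on Politics": {"PC": 60, "NDP": 55, "LIB": 8, "GRN": 1, "OTH": 0},
    "Calculated Politics": {"PC": 71, "NDP": 44, "LIB": 8, "GRN": 1, "OTH": 0},
    "Extreme Enginerding": {"PC": 70, "NDP": 45, "LIB": 9, "GRN": 0, "OTH": 0},
}


def rss_diff(projection, actual):
    """Square root of the summed squared seat errors across all parties."""
    return math.sqrt(sum((projection[p] - actual[p]) ** 2 for p in actual))


ranking = sorted(projections, key=lambda name: rss_diff(projections[name], actual))
for name in ranking:
    print(f"{name}: {rss_diff(projections[name], actual):.2f}")
```

Squaring the errors means one big miss (like being 16 seats off on the PCs) costs far more than several small ones, which is why the ranking spreads out so quickly at the bottom.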

For most of the models, the seat projections came directly from the popular vote estimates. If we take a look at those, we get:

	CBC	Too Close to Call	QC125	Lispop	Teddy on Politics	Calculated Politics	Extreme Enginerding	Average	Actual
PC	38.7	37.9	37.8	38	37.9	38.4	39.8	38.4	40.5
NDP	35.5	36	36.1	37	36.8	36.1	35.9	35.9	33.6
LIB	19.6	19.8	19.7	19	20.9	19.5	19.6	19.7	19.6
GRN	4.9	4.6	5	?	4.5	4.6	5.2	4.8	4.6
OTH	1.3	1.7	1.4	?	0	1.5	1.3	1.4	1.8

Ranking these again by the same criteria we get:

Me! (diff: 1.15)
CBC (diff: 2.69)
Average (diff: 3.21) This is a better example of the group as a whole performing better than most individual members. This also probably makes sense as these numbers would have come mostly from the same pool of publicly available polls with a small amount of interpretation for trends and recency, as opposed to a large amount of interpretation as in the case with seat projections.
Calculated Politics (diff: 3.29)
Too Close to Call (diff: 3.56)
QC125 (diff: 3.73)
Lispop (diff: ~4.3) Note that Lispop didn't list their prediction for the Green Party vote total, despite projecting them to win a seat.
Teddy on Politics (diff: 4.37)

Overall I'm really pleased with how I did, and I've learned a few tricks to use in upcoming elections. Next up will probably be Québec, hopefully with the same group of people, and we can see if this was a fluke for me or not!

Finally, here's my seat model with the actual results input as though they were one final gigantic poll at the end. Using these correct values would have resulted in the model being the most accurate seat projection of them all (diff: 4.24), which is an encouraging sign that the model itself was sound!

See you next election!

Alberta Electoral Districts

2018-04-24T10:47:00.003-06:00

A few months ago, the Alberta Electoral Boundaries Commission released its report with recommendations on how to redistrict the province for the 2019 election. As I discussed before, this is an important process that occurs every eight to ten years, and is necessary for keeping the provincial electoral boundaries up to date with current population distributions.

As a quick aside, I'd like to thank everyone who, after reading my post on redistributing using the shortest splitline algorithm, actually wrote in to the commission to tell them to do that. Thanks guys!

Redistricting is always a hot topic, as it can lead to accusations of tampering or gerrymandering by those in power. In Alberta the process is ostensibly done by an arm's-length body, and as such when the results were unveiled in October the complaints were pretty tame from the parties not in power. The major effect of the redistricting was to merge rural ridings in such a way that three more urban ridings were created.

In 2015, the poll by poll results for Alberta looked like this:

Here, each poll is shaded a darker colour if the party won by 50% of the vote or more. It's pretty fun to zoom around in it!

These polls were fitted to the 2015 riding boundaries, and if we break them out then add the votes back together according to the new 2019 boundaries, we can get a sense of what the outcome for future elections might look like. The process isn't perfect, as not all polls fit precisely into each new riding, but ultimately this is how the 2015 election is likely to have looked under the 2019 redistricting:

This map is coloured the same way as the poll map above.

The impact this would have had on each party is:

The total seats won by the NDP wouldn't have changed at 54
The total seats won by the Wildrose would have decreased from 21 to 20
The total seats won by the PCs would have increased from 10* to 12
The Alberta Party would have stayed at 1 seat
The Liberals wouldn't have won any
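The re-aggregation step behind those numbers can be sketched roughly like this. The poll IDs, vote counts, and riding names are all made up for illustration, and I'm assuming each poll falls entirely inside one new riding (which, as mentioned, isn't always true in practice).

```python
from collections import defaultdict

# Hypothetical poll-level results (not real 2015 data).
poll_votes = {
    "poll-001": {"NDP": 210, "WRP": 180, "PC": 150},
    "poll-002": {"NDP": 120, "WRP": 260, "PC": 140},
    "poll-003": {"NDP": 300, "WRP": 90, "PC": 110},
}

# Hypothetical lookup from each poll to the new riding it falls in.
poll_to_new_riding = {
    "poll-001": "Riding A",
    "poll-002": "Riding A",
    "poll-003": "Riding B",
}


def reaggregate(votes_by_poll, poll_to_riding):
    """Add poll-level votes back together under a new set of boundaries."""
    totals = defaultdict(lambda: defaultdict(int))
    for poll, votes in votes_by_poll.items():
        riding = poll_to_riding[poll]
        for party, count in votes.items():
            totals[riding][party] += count
    return totals


def winners(totals):
    """Party with the most votes in each riding."""
    return {riding: max(votes, key=votes.get) for riding, votes in totals.items()}


new_totals = reaggregate(poll_votes, poll_to_new_riding)
print(winners(new_totals))
```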

The next election is a little over a year away, and these will be the ridings to be determined in that election. Stay tuned as I work to better develop my seat projection model and poll tracker over the next year!

Edmonton Election 2017

2017-10-23T14:41:00.001-06:00

Another election has come and gone, and apart from a handful of new faces the biggest news is all the new stats! Let's take a look:

First of all, turnout was abysmal. A total of 194,826 people voted, resulting in a voter turnout of 31.5%. The best (blue) and worst (red) areas of the city in terms of voter turnout are shown here:

The colouring of the map is a bit funky since the mean and median are rather far apart, but it gives a decent impression of what happened. In general, it looks like neighborhoods around the river valley voted more often than neighborhoods away from it, which is interesting. The massive difference between the high (66.9%) and low (9.3%) turnout is absolutely astounding to me, and might suggest fairly significant challenges with connecting with voters in certain areas (especially if they can't see the river, apparently...).

Voter turnout can also be measured in a few other ways, including attrition along the ballot. For instance, of everyone who voted, 1.5% neglected to vote for a mayoral candidate, and 1.9% neglected to vote for any council candidate. 26.3% of voters picked a Catholic schools ballot vs. 66.6% Public ballots, and even then 6.5% of Catholics and 9.8% of Publics didn't end up voting for a school trustee anyway. Oddly enough, the total number of Catholic + Public voters doesn't equal the total number of voters, so I'm not entirely sure what the remaining 7.1% of voters did for school board...

Lighter colours represent 'under votes', or people who didn't make a pick for that particular round of voting.

Don Iveson was re-elected mayor with a solid victory. His support levels in Edmonton aren't dissimilar from last election, and are shown here (darker colours meaning higher support).

Iveson's support in general seems very solid in the center of the city, and a bit weaker in the north and southeast than the rest of the city. All that being said, his support ranged from 59.5-85.9% so he has a strong mandate from every part of the city.

Finally, similar to last election, I've taken a look at which councillors' support correlates most or least with the mayor's. Last year, it turned out that a general pattern emerged where the councillors whose support most often correlated with high mayoral support also generally agreed with the mayor on votes. This year, the correlations between councillors and the mayor are:

I'd say this supports the theory from last election - last term, McKeen, Esslinger, Knack, Walters, and Henderson all voted alongside the mayor on more than 80% of non-unanimous votes, while Banga, Caterina, and Nickel (76%, 75%, and 46%, respectively) agreed with the mayor less frequently. While the mayor has had a strong track record of gaining majority support for non-unanimous bills, it does seem as though the candidates who do better in polls where the mayor does worse do tend, on average, to disagree with him more often.
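As a rough sketch of what "correlates with the mayor" means here: line up each candidate's vote share poll by poll against the mayor's, and compute a plain Pearson correlation. The shares below are invented for illustration, not real Edmonton results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical vote shares in five polls of one ward.
mayor_share = [0.62, 0.70, 0.75, 0.81, 0.68]
councillor_a = [0.30, 0.38, 0.41, 0.50, 0.35]  # does well where the mayor does well
councillor_b = [0.45, 0.40, 0.33, 0.28, 0.41]  # does well where the mayor does poorly

print(pearson(mayor_share, councillor_a))
print(pearson(mayor_share, councillor_b))
```

A value near +1 means the candidate's support rises and falls with the mayor's across polls, and a value near -1 means the opposite.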

That suggests that perhaps this council will be a little bit closer in voting record than the last one - the four new councillors all showed up in the middle of the pack for mayoral correlations, so likely either they are wildcards for agreement with the mayor, or as new candidates their reputation hasn't yet been tested. Only time will tell!

Edmonton City Council Gender Parity

2017-06-26T16:05:00.001-06:00

Back in October I took a quick look at the success rates of female candidates getting into city council. In 2013, 22% of candidates were female, but only one out of the twelve council seats ended up being held by a woman. The aim of that post was to investigate some of the source of the gender disparity on council - namely whether the distribution of female candidates in different races was causing the issue, or whether there was an inherent bias against female candidates.

Ultimately, I determined that the relative lack of successful female council winners was more likely due to distribution of candidates across races than individual bias - without accounting for incumbency, there was no evidence of anything other than relative equal chances of winning between female and male candidates (i.e. the number of female winners since 2004 is more or less what you'd expect assuming all candidates are equally likely to win).

That was a pretty positive sign, as it suggests that the biggest factor holding back a demographically-balanced council is the availability of under-represented candidates to run (which is totally outside of the scope of this blog to discuss), and perhaps more importantly, the avoidance of clumping of under-represented demographics into the same few races.

One of the biggest issues with the 2013 election was that five wards had no women running at all, and half of all women were clustered into two wards. This drastically reduced the expected number of women elected to council, regardless of the relative proportion of candidates who put their names forward.

So with all that said, I've been keeping track of candidates for the 2017 civic election which are being tracked at Daveberta. For each candidate, I've tried to ascertain their gender, in order, by how they refer to themselves (political candidates love speaking in the third person), by how they're referred to in third party posts, and if all else fails by name and presentation assumptions. If you notice any errors, please let me know.

(Last updated September 19, 2017)

Based on the current 71 candidates, 23 are female and 48 are male (female ratio of 32.4%, up from 22% in 2013). However, based on the distribution between wards, an expected 3.89 seats will be won by female candidates, which could be considered a relatively inefficient allocation of seats based on the ratio of candidates. 2 wards have no women running at all.

Overall, it's most likely that the number of female councillors after the election will be between 2 and 6 (90% confidence).

Edmonton Council (32% female candidates)

Edmonton Catholic School Board (65% female candidates)

Edmonton Public School Board (39% female candidates)

Now that the official nomination deadline has passed, these numbers ought to be pretty official! All in all, women running for city council are still a bit poorly distributed, leading to an expected under-representation of about 0.15 seats. On the other hand, men tend to be poorly distributed in the school board races, leading to expected over-representations of 0.28 and 0.86 seats for Catholic and Public boards respectively. All in all, the candidate distributions are fairly balanced though, and this is certainly a fairer election gender-wise than 2013.

Next Game Wins?

2017-05-09T07:55:00.000-06:00

(Subtitle: Which Game Should You Win? Part 3)

Three years ago, my friend Andrew pitched in to the blog and asked which game in the playoffs was most worth winning. The results were a bit inconclusive, but from it he developed a database of all playoff outcomes since 1943, so a year later I looked at the dataset again and developed Markov-style chains of playoff odds based on different positions in the playoffs.

Now that it's playoff season again, people are naturally interested more than normal in hockey and I recently overheard someone comment that, though a series was currently at 2-1 for wins, the next team to win was undoubtedly going to win the series.

Good thing I have this handy database of all playoff outcomes ready, because that immediately intrigued me as to how likely it actually is that, at any given point, the next team to win a game will win the series overall. This is perhaps another way of asking the same question as before - how much does this upcoming playoff game matter to the grand scheme of things?

Before looking into the historical data, though, it's worth doing the math to see what the odds would be if the human element were removed (with all games having a 50/50 chance of going either way, and all games being independent). Obviously, if a best-of-seven playoff series is tied at 3-3, then the next game winner is guaranteed to win the series, so that's an easy starting point.

From there, it's not too hard to work backwards to figure out the rest of the odds. If a series is at 3-2, then there's a 50% chance that the leading team wins (which would give them the series win, and a 100% chance therefore of winning the series), and a 50% chance that we get to a 3-3 position, where the chance of the trailing team being the overall winner is again 50%. Overall, that makes the chance that the next game winner will be the series winner (50%*100%)+(50%*50%)=75%.

If we continue this way, then we can generate this table of values. For all following graphs, the 'home team' is the team that has home town advantage for the first two games:

So what's not surprising here is that the odds that the next team to win will be the series winner are always above 50%. That makes sense, because no matter what the position is beforehand the winner is improving their overall odds of winning the series. What's more interesting is how little games tend to matter when the series is lopsided.
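Working backwards from 3-3 like this is easy to automate. Here's a minimal sketch under the same coin-flip assumptions (every game 50/50 and independent); the function names are mine.

```python
from functools import lru_cache

WINS_NEEDED = 4  # best-of-seven


@lru_cache(maxsize=None)
def series_win_prob(a, b, p=0.5):
    """Chance the team with `a` wins takes the series, if it wins each game with probability p."""
    if a == WINS_NEEDED:
        return 1.0
    if b == WINS_NEEDED:
        return 0.0
    return p * series_win_prob(a + 1, b, p) + (1 - p) * series_win_prob(a, b + 1, p)


def next_winner_takes_series(a, b, p=0.5):
    """Chance that whoever wins the next game also wins the series, from an a-b position."""
    first_team_wins_both = p * series_win_prob(a + 1, b, p)
    second_team_wins_both = (1 - p) * (1 - series_win_prob(a, b + 1, p))
    return first_team_wins_both + second_team_wins_both


print(next_winner_takes_series(3, 3))  # 1.0
print(next_winner_takes_series(3, 2))  # 0.75
print(next_winner_takes_series(3, 0))  # 0.5625
```

The 3-0 value shows how little a game matters in a lopsided series: the next winner goes on to take the series only a bit more often than a coin flip.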

Of course, games aren't all independent or aren't all 50/50 toss-ups. Historically, home teams win 54.5% of games, so let's see what happens if we recreate this table with that factored in. It's a bit more complicated, but essentially the same analysis as before, to get this table:

Here we start to see the effects of the playoff structure and the pattern with which it allocates home games to different teams. For instance, when the original home team is up 3-0, the upcoming game almost doesn't matter at all, but the situation isn't quite the same if the original away team is up 3-0. Similarly, both 3-2 game situations have different values. This can be perhaps more easily rationalized - if the original home team is up 3-2, then the upcoming game is going to be in their opponent's home town, which makes it more likely that that other team will win, but if they do then it's tied coming back home, so that's less of a big deal. On the other hand, if the original away team is leading 3-2, they're more likely to win this upcoming game 6, and can lock the series up right there.
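A sketch of the home-advantage version is below. I'm assuming a 2-2-1-1-1 schedule (the original home team hosts games 1, 2, 5, and 7), which fits the game 6 description above, and that the 54.5% home win rate applies equally to every game. Both are simplifications.

```python
from functools import lru_cache

HOME_WIN = 0.545  # historical home-team win rate quoted above
# Assumed 2-2-1-1-1 schedule: True where the original home team hosts (games 1 through 7).
ORIGINAL_HOME_HOSTS = (True, True, False, False, True, False, True)


def home_team_game_prob(h, a):
    """Chance the original home team wins the next game at an h-a position."""
    return HOME_WIN if ORIGINAL_HOME_HOSTS[h + a] else 1 - HOME_WIN


@lru_cache(maxsize=None)
def home_team_series_prob(h, a):
    """Chance the original home team wins the series from h wins to a wins."""
    if h == 4:
        return 1.0
    if a == 4:
        return 0.0
    p = home_team_game_prob(h, a)
    return p * home_team_series_prob(h + 1, a) + (1 - p) * home_team_series_prob(h, a + 1)


def next_winner_takes_series(h, a):
    """Chance that whoever wins the next game also wins the series."""
    p = home_team_game_prob(h, a)
    return p * home_team_series_prob(h + 1, a) + (1 - p) * (1 - home_team_series_prob(h, a + 1))


for position in [(3, 0), (0, 3), (3, 2), (2, 3)]:
    print(position, round(next_winner_takes_series(*position), 3))
```

Under these assumptions the two 3-0 positions and the two 3-2 positions come out different from each other, which is the asymmetry described above.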

Of course, this is all fun and games from a theoretical point of view, but what's actually been happening in real playoff series? Here we go:

This is definitely more interesting! Here we have a clear outlier from the theoretical projections from before, where the 'least important' game is game 5 when the original away team is up 3-1. At this point, the original home team is playing back at home but facing a significant deficit, and they end up with a fairly high 'last hurrah' win rate before ultimately losing the series 2-4.

On the other hand, there's a surprisingly high predictive score for whoever wins the game after the original home team gets up 1-0, at 74% (8% higher than what you'd expect in a coin toss scenario). I imagine this indicates that the original home team is likely to win their first game, and that if the original away team can't bounce back then the series is likely sorted out by that point (at least, in harder-to-quantify matters than you'd expect).

So the answer to the question 'which playoff game is the most important' remains a solid "it depends", but now you have three different ways of looking at the question. Use them wisely, and enjoy the 2017 playoffs!

How often does the best team win the championship?

2017-02-06T11:36:00.000-07:00

Imagine we have a four team single-elimination tournament. Team A is good enough that you'd expect them to win about 80% of all games, Team B ought to win about 60%, Team C should win 40%, and Team D ought to win about 20% of games (against random opponents). If we seeded a single elimination tournament with these teams, it could look something like this:

Given the information above, how likely is it that the best team in this tournament, Team A, ends up being the winner? In other words, how effective is this tournament structure and seeding system at determining the best team out of the four?

The first tool we'll need for this is the Log5 formula - given the true winning percentages of two teams, this formula tells you the odds of a given team winning. So for instance, Team A playing Team D is pretty lopsided, and Team A has a 94.18% chance of winning that game based on this formula. Similarly, Team B and Team C are much closer in relative skill, so Team B only has a 69.23% chance of winning that game.

Based on all of this, we can come up with the relative chances of any of the four teams winning the tournament overall:

So in our hypothetical situation, this single-elimination four-player seeded tournament with known team skillsets resulted in the best team, Team A, winning 72.26% of the time, and the worst team winning 1.06% of the time. Not shabby.
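Here's a minimal sketch of that bracket calculation, using the Log5 formula as it's commonly written. It lands close to the figures quoted above; small differences may come from rounding or from a slightly different variant of the formula. The helper names are mine.

```python
def log5(p_a, p_b):
    """Log5 estimate of the chance that team A beats team B, given true win rates."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


skill = {"A": 0.8, "B": 0.6, "C": 0.4, "D": 0.2}


def title_odds(team, first_opponent, other_semi):
    """Odds of winning the bracket: beat first_opponent, then whoever wins other_semi."""
    x, y = other_semi
    p_x_advances = log5(skill[x], skill[y])
    win_final = (p_x_advances * log5(skill[team], skill[x])
                 + (1 - p_x_advances) * log5(skill[team], skill[y]))
    return log5(skill[team], skill[first_opponent]) * win_final


odds = {
    "A": title_odds("A", "D", ("B", "C")),
    "B": title_odds("B", "C", ("A", "D")),
    "C": title_odds("C", "B", ("A", "D")),
    "D": title_odds("D", "A", ("B", "C")),
}
for team, p in odds.items():
    print(team, f"{p:.2%}")
```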

However, what if you didn't know the skill levels of the teams going into the tournament? How confident could you be that the eventual winner of the tournament was, in fact, the best team when they signed up? One way to determine this is by running a Monte Carlo simulation - let's sign up four teams of random skill values, run them through a tournament exactly the same way as we just did with our sample teams, and see who the winner is. Then let's do that 10,000 times, and see how often the best team wins.

The results are interesting: with randomly drawn skill values for all teams (mean= 0.5, stdev=0.13 [see below**]), we'd expect the winner of the tournament to be the strongest team only 44.3% of the time. About 10.5% of the time, the weakest team in the tournament would end up winning the whole thing!
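A Monte Carlo run like this can be sketched in a few lines. This version assumes normally distributed skills clamped to stay strictly between 0 and 1, a random bracket (teams are paired in the order they're drawn), and Log5 for each game. Exact percentages will wobble with the random seed and with those assumptions, so don't expect it to match the numbers above to the decimal.

```python
import random


def log5(p_a, p_b):
    """Log5 estimate of the chance that team A beats team B."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))


def play(a, b, rng):
    """Return the skill value of the winner of one game between skills a and b."""
    return a if rng.random() < log5(a, b) else b


def simulate(trials=10_000, seed=1):
    """Fraction of four-team single-elimination tournaments won by the best and worst team."""
    rng = random.Random(seed)
    best_wins = worst_wins = 0
    for _ in range(trials):
        # Assumed skill distribution, clamped so Log5 stays well-defined.
        teams = [min(0.99, max(0.01, rng.gauss(0.5, 0.13))) for _ in range(4)]
        finalist_1 = play(teams[0], teams[1], rng)
        finalist_2 = play(teams[2], teams[3], rng)
        champion = play(finalist_1, finalist_2, rng)
        best_wins += champion == max(teams)
        worst_wins += champion == min(teams)
    return best_wins / trials, worst_wins / trials


best_rate, worst_rate = simulate()
print(f"best team wins: {best_rate:.1%}, worst team wins: {worst_rate:.1%}")
```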

So is a single elimination tournament a particularly good way of determining the best overall team from a pool of four teams? Probably not. What happens if we up the ante, and have a double elimination tournament? Double elimination tournaments are exactly what they sound like - any given team needs to lose twice before being eliminated. They tend to look something like this:

This sort of format ought to improve the chances of the best team winning, as a single unlucky (and unlikely) loss won't eliminate them too early. If we run the same sort of analysis, but with a double elimination tournament, we end up with the winner being the strongest team 51.1% of the time, and the winner being the weakest team only 7.7% of the time. A reasonable improvement all in all.

Unfortunately, this modest increase in chances of determining the truly best team is offset by the increased length and uncertainty in the tournament. A single elimination tournament needs three games total, whereas a double elimination tournament needs either six or seven games, depending on how the sixth one goes. This is also annoying to schedule and sell tickets for, as organizers have no way of knowing if the sixth game will be the exciting final or not.

Expanding a bit, we can do the same analysis with single, double, and triple (!) elimination format tournaments for tournaments of as many teams as we want. Before we continue, though, I'll mention that for a 32 team single-elimination randomly-seeded tournament, the odds that the winner will have been the best team overall are 22.0%. Keep that in mind.

So why talk about these tiny little tournaments? It's to get you ready for the real deal: professional sports.

Professional sports leagues generally tend to fall into a regular season and playoffs, where the regular season is used to seed teams into some sort of order and filter out the best, who end up playing in a tournament bracket of one style or another. What happens if we do the same sort of analysis for each major (and some minor) sports league?

National Basketball Association

Overview: 30 teams play 82 games each in the regular season. Teams are sorted into conferences and divisions. The playoffs are a 16-team single elimination tournament bracket, where each round of the playoffs consists of a best-of-seven games series for elimination. Teams are seeded within conferences, and the bracket is fixed at the end of the regular season. This gives us:

Overall odds of the NBA Championship being won by the season's best team: 45.9%.

Pros: Best-of-seven series in playoffs reduces variability in results. Not seeding teams based on division standings, and only looking at conference standings, reduces the chances of weaker teams getting into the playoffs by virtue of leading weaker divisions.

Cons: Relatively long season for regular season, large range in potential length of playoffs.
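To see why longer series reduce variability, here's a small sketch of the chance that a team wins a best-of-n series given its chance of winning any single game. It assumes every game is independent with the same win probability, which ignores home advantage and momentum.

```python
from math import comb


def series_win_prob(p, wins_needed=4):
    """Chance of reaching wins_needed wins before the opponent does."""
    return sum(
        comb(wins_needed - 1 + k, k) * p ** wins_needed * (1 - p) ** k
        for k in range(wins_needed)  # k = games lost along the way
    )


for label, wins in [("single game", 1), ("best-of-five", 3), ("best-of-seven", 4)]:
    print(label, round(series_win_prob(0.6, wins), 4))
```

With a 60% single-game edge, the stronger team wins a best-of-seven about 71% of the time, so each added game tilts the result a little further towards the better side.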

National Hockey League

Overview: 30 teams play 82 games each in the regular season. Teams are sorted into conferences and divisions, and are more likely to play teams inside their divisions and conferences than outside of them. The playoffs are a 16-team single elimination tournament bracket, where each round of the playoffs consists of a best-of-seven games series for elimination. Teams are seeded within divisions such that the top three teams of each division are guaranteed a spot in the playoffs, and the highest-performing two teams of the conference that remain get in as wildcards. All of this results in the following distribution:

Overall odds the Stanley Cup winner was the season's best team: 45.4%.

Pros: Best-of-seven elimination in playoffs reduces variance and increases the chances of better teams triumphing.

Cons: Relatively long season for regular season, large range in potential length of playoffs.

Fun fact: Only real difference between NHL and NBA results is the seeding into the playoffs and how wildcards are handled, and that results in hardly any change at all.

Major League Baseball

Overview: 30 teams play 162 games each in the regular season. Teams are sorted into leagues and divisions, and are more likely to play teams inside their divisions and leagues than outside of them. The playoffs consist of the winners of each division and the two wildcard runners-up from each league, and form a wonky sort of 10-team single elimination tournament where the first round is two single-game wild-card playoffs, followed by four best-of-five division series, then two best-of-seven league winner series. Finally, the winners of each league play each other in a best-of-seven series to determine the winner. From this, we get:

Overall odds the World Series winner was the season's best team: 45.4%.

Pros: Teams are more likely to be correctly seeded heading into playoffs due to extensive regular season. Fewer teams in playoffs makes it less likely for weaker teams to get lucky.

Cons: Short playoff season with sudden-death games and best-of-five series increases variance.

Fun fact: If all rounds (including wildcard) in the MLB playoffs were best-of-seven series, the odds that the winner was actually the best team increase to 46.4%.

National Football League

Overview: 32 teams play 16 games each in the regular season. Teams are sorted into conferences and divisions, and are more likely to play teams inside their divisions and conferences than outside of them. The playoffs are a true 12-team single-elimination tournament, where each division leader and two runner-up wildcards are seeded from each conference. This results in:

Overall odds the Super Bowl winner was the season's best team: 28.2%.

Pros: Short, fixed playoff schedule is predictable.

Cons: Single elimination sudden death games greatly increase the chances of weaker teams winning by chance and eliminating stronger teams. Relatively short season also doesn't guarantee accurate seeding of teams heading into playoffs.

Fun fact: Before, I mentioned that the chance of the winner of a randomly-seeded 32-team single-elimination tournament being the best team was 22.0%. That means the NFL season format isn't really all that much better than just having one six-week March Madness style showdown each season.

Some more minor tournaments that are still near and dear to my heart:

Canadian Football League

Overview: Nine teams play 18 games each in the regular season. Teams are sorted into two divisions. The playoffs are a single-elimination 5-team tournament, where the highest-ranked team in each division gets a bye to the division finals. Teams can cross-over into other divisions if the fourth-place team in one division has more points at the end of the regular season than the third-place team in the other division. This gives us:

Overall odds the Grey Cup will go to the season's best team: 38.0%.

Pros: Short, fixed playoff season is predictable. Still somehow have a longer regular season than the NFL. Higher ranked teams getting a bye to the division finals helps out the stronger teams.

Cons: Single elimination playoff format, which includes more than half the league, leads to a bit of a crapshoot. It's disappointing that the NHL with over three times as many teams is better suited to finding its best team each year.

Fun fact: Again, this isn't substantially better than just running a 9-team randomly-seeded single elimination tournament right at the beginning of the year. It's more fun, though.

Curling

Overview: Major curling tournaments involve 12 teams, who play each other once each in a round robin. The top four teams are seeded into a Page playoff system, where the top two teams are in a quasi-double elimination system, and the remaining teams are in single elimination. This gives us:

Overall odds the winner was the tournament's best team: 37.3%.

Pros: Fixed playoff system is short and predictable. Page playoff system gives a bonus to the teams who perform best after a fair and balanced round-robin.

Cons: Single elimination format of playoffs increases variability.

Fun fact: Curling is fun and you should try it.

So there you go. Unsurprisingly, sports leagues with longer regular seasons and best-of-seven playoff series are better suited for determining the actual best teams each season, whereas leagues with shorter seasons and single elimination tournaments are less well-suited. Now you have numbers to show for it, at least!

Redistricting in Alberta

2017-02-02T16:02:00.002-07:00

Every eight to ten years, the Alberta Electoral Boundary Commission meets up to reconfigure the riding boundaries for upcoming elections. This is a fairly important part of our democracy, as cities are often growing, people tend to move around a lot, and keeping our boundaries updated to reflect current population distributions is a good way to ensure that everyone is equally represented in our legislature.

Fortunately enough, Alberta isn't like many US states where it is elected politicians themselves who decide where these boundaries will be drawn, which can tend to lead to gerrymandering as I've discussed before. So although in Alberta districts are determined by a neutral committee and appear to avoid obvious signs of manipulation for political purposes, they aren't really all that good at making sure everyone's vote counts the same no matter where they live. For instance, the Electoral Boundaries Commission Act specifies that the maximum population deviation from the average per riding is 25%. As well, provided the area is sparsely populated enough, up to 4 districts can have populations that are as much as 50% below the average population of a riding in the province.

This led to a situation where, based on the 2011 census, the largest district had a population of 51,800 people, more than twice the size of the smallest district at 23,050. Someone living in Dunvegan-Central Peace-Notley has nearly twice the voting power of the average Albertan when it comes to provincial elections, at a population a whopping 45% below provincial average. (As a side note, this still isn't as bad as on the federal stage, where Labrador is 73% below average, and five times less than the highest populated riding in Brantford-Brant, but that's a different story.)

So, as an infomercial might say at this point, "There must be a better way!"

The Electoral Boundaries Commission is accepting submissions now while they begin their redistricting process, and this seems as good a time as any to determine a better solution. What is a fair way to split the province up into 87 sections each with the same population?

One of the coolest solutions is to use the shortest splitline algorithm. As explained by CGP Grey, the shortest splitline algorithm is a repetitive process that searches for the shortest line that splits an area perfectly in two by population. Each half is then split again with the shortest line that produces equal halves, until ultimately we stop when we've gotten the desired number of sections split up, which are necessarily of exactly even population.

So let's try this for Alberta. The first thing we need is a population distribution of Alberta, which Statistics Canada helpfully has lying around on their website. It looks like this:

This is Alberta broken up into 5,711 census dissemination areas based on the 2011 census.

Next up, we would normally find the shortest line that crosses Alberta in such a way that exactly half of the population is above the line, and exactly half is below the line. Since Alberta has 87 districts, though, we actually want to find the shortest line that has 44/87 of the population above it, and 43/87 of the population below it. In my (slightly optimized) model, that looked like this:

Then we split each half again. The top half has an even number of districts (44), so we can split it in two easily, whereas the bottom half (43 districts) has to be split into 22/43 and 21/43 segments. That gives us this:

And so on and so forth until we've split all of Alberta up into equal segments. The final result of this ends up being this lovely stained glass window:

Of course, things can't always be perfect no matter how hard you try, so this is a solution for Alberta that has a maximum population deviation from the average riding of 0.38%. The largest riding has 42,052 people in it, and the smallest has 41,752. This is a solution to split up Alberta that has a maximum population deviation that is 118 times smaller than we have now, and a coefficient of variation that is 68 times smaller.
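For the curious, the recursive splitting can be sketched in Python. This is not the Excel model I actually used. It's a simplified illustration that treats each census area as a single point, tries only a handful of cut angles, and uses the spread of the points along the cut direction as a crude stand-in for the true length of the line across the region.

```python
import math
import random


def split_counts(n):
    """Districts on each side of a cut, e.g. 87 -> (44, 43)."""
    return (n + 1) // 2, n // 2


def splitline(points, n_districts, n_angles=18):
    """Recursively split (x, y, population) points into n_districts groups.

    Assumes there are at least as many points as districts.
    """
    if n_districts == 1:
        return [points]
    big, small = split_counts(n_districts)
    target = sum(p[2] for p in points) * big / n_districts
    best = None
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        nx, ny = math.cos(theta), math.sin(theta)  # normal to the candidate cut
        ordered = sorted(points, key=lambda p: p[0] * nx + p[1] * ny)
        running, cut = 0.0, 0
        for p in ordered:  # sweep until the target share of population is reached
            if running >= target:
                break
            running += p[2]
            cut += 1
        # Keep enough points on each side for the districts still to be drawn.
        cut = max(big, min(cut, len(ordered) - small))
        # Crude proxy for the cut's length: spread of the points along the cut direction.
        along = [p[1] * nx - p[0] * ny for p in points]
        length = max(along) - min(along)
        if best is None or length < best[0]:
            best = (length, ordered[:cut], ordered[cut:])
    _, side_a, side_b = best
    return splitline(side_a, big, n_angles) + splitline(side_b, small, n_angles)


# Hypothetical map: 4,000 random points with small random populations.
rng = random.Random(0)
areas = [(rng.random(), rng.random(), rng.randint(1, 10)) for _ in range(4000)]
districts = splitline(areas, 16)
populations = [sum(p[2] for p in d) for d in districts]
print(min(populations), max(populations))
```

A real implementation would work with the actual area shapes and measure each candidate line properly, but the uneven 44/43 style of splitting works the same way.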

Also, just because the map was drawn with straight lines doesn't mean it has to stay like that. If we go back to our census dissemination area shapes from Stats Canada, we can convert an Edmonton distribution from this:

To this:

Which is actually starting to look pretty reasonable. Neighborhoods are kept together, and the areas are looking relatively compact.

The shortest splitline method is an objective and fair way to distribute votes such that everyone's votes are counted equally. I was able to redistrict all of Alberta using Excel - no fancy programming skills are needed. There's no reason that we can't have redistricting being as boring as updating census data and having a computer spit out a single solution each time we need it.

That being said, there are still some objections people could have with it - for instance, it doesn't necessarily give a hoot about municipal boundaries. Take Red Deer for example: after applying the algorithm to Alberta, Red Deer got sort of unfortunately split into four districts, each of which includes substantial amounts of surrounding countryside:

Oh no.

This would have people in Innisfail voting alongside southwest Red Deer, and people in Saskatchewan River Crossing voting alongside west Red Deer. That's probably a bit messed up. In cases like this, I think it's probably fair to use the splitline algorithm to get to a starting point, and then massage the districts as needed to make sure they make a bit more sense. Because the algorithm got within a 0.38% maximum population variance to start with, it is likely quite straightforward afterwards to swap around some areas as needed while still keeping the population variance small. For instance, Public Interest Alberta recommends a maximum population deviation of 5%, which I'd suggest is reasonably easy to achieve if we're starting from a point where our deviation is essentially 0%.

So if I've convinced you that using algorithms to redistrict our population can lead to fairer, objective, even distributions of our political districts, and that those are things worth having in our democracy, head on over to the Commission's website and leave them a submission before February 8th!

Edmonton City Council Gender Equality

2016-10-27T15:46:00.001-06:00

In the 2013 civic election, 79 candidates ran for mayor and city council, 17 of whom were women. The election resulted in one woman getting elected out of a council of 13. Though women represent 51% of the city's population, they represented only 22% of candidates, and ended up with 8% of council seats. With results like this, it's little surprise that groups like Equal Voice are calling for improvements to our system, including encouraging a larger diversity of candidates and promoting a more balanced and representative city government.

Taking a deeper look at these results shows some interesting trends though. For instance, in 5 wards in 2013, there were no women running at all, and half of all female candidates were clustered in races in two highly contested wards. This suggests that, while 22% of all candidates were female, the distribution of female candidates may have already been predisposed to a lower number of female winners in the end. Let's take a look.

There were 12 wards and one mayoral race in 2013 for city council, and the proportion of female candidates per race ranged from 0-43%. Assuming any given candidate has an equal chance of winning any given race (an assumption we'll check later), this is the expected distribution of female winners:

As previously mentioned, there was absolutely no chance of there being anywhere from 9-13 women on council, as 5 races were contested solely by men. Based on the uneven distribution of candidates in the remaining races, the expected number of female councillors in 2013 was 2.01, or 15% of council. So while the number of women on council was still less than expected, it was closer than what we might have expected based on the total number of female candidates. Instead of being 14% lower than what we might expect from candidate distribution, we were 8% lower.

So does this mean that female candidates are 8% less likely to get elected than male candidates? That's really hard to say, and it turns out we don't have enough data yet. One way we can check is by looking at the p-value of our outcome - what's the chance that we could have gotten something as bad as the result we did, assuming our null hypothesis (that women are as likely to get elected as men) is true? In this case, the p-value is 0.37. Essentially, our data set is small enough that any result between 0-4 female councillors wouldn't have been all that much of a surprise (and in fact, 6-7 would have been an indication of an opposite effect). So let's not worry about significance yet, and instead look at more elections!
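The expected-winner distribution and the p-value described above can be sketched in a few lines of Python. This is only an illustration under the stated null hypothesis (every candidate in a race is equally likely to win, and races are independent). The per-race shares below are hypothetical stand-ins, not the actual 2013 candidate lists, and `winner_distribution` is a name I made up.

```python
def winner_distribution(female_share_by_race):
    """Probability of exactly k female winners, for k = 0..number of races.

    Each race is an independent draw whose chance of a female winner
    equals the share of female candidates in that race.
    """
    dist = [1.0]  # with zero races there are certainly zero female winners
    for p in female_share_by_race:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1 - p)   # a man wins this race
            nxt[k + 1] += prob * p     # a woman wins this race
        dist = nxt
    return dist

# Hypothetical shares for 13 races, 5 of which have no women running.
shares = [0, 0, 0, 0, 0, 0.10, 0.15, 0.20, 0.25, 0.25, 0.30, 0.35, 0.43]
dist = winner_distribution(shares)
expected = sum(k * p for k, p in enumerate(dist))
observed = 1
p_value = sum(dist[: observed + 1])  # chance of `observed` or fewer female winners
print(f"expected female winners: {expected:.2f}")
print(f"P(at most {observed} female winner): {p_value:.2f}")
```

The one-sided tail used here (the observed count or fewer) is my reading of "something as bad as the result we did"; a two-sided test would be a different, equally defensible choice.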

Edmonton's civic elections elect people to mayor, council, and public and catholic school boards. If we do the same analysis for all three councils for the last four elections, we get a chart that looks like this:

This suggests a lot of things, including:

City Council results over the last 4 elections haven't shown more than a 10% deviation one way or another. More importantly, the p-values for each election have been totally reasonable.
There's a lot of variation in the Public School Board elections. This is partially explainable based on how small the Board is, so any variation will be magnified from a percentage basis. On the other hand, that level of variation isn't present in the (smaller) Catholic Board...
Catholic School Board elections haven't shown an anti-woman bias in this data set.

So, interestingly enough, of the 12 discrete votes that I looked at, 4 had a slight anti-woman bias, and 8 were perhaps slightly pro-women. Essentially, what this suggests is that female candidates are just as likely to get elected as male candidates. If we add up all the results from the last four elections into one graph (402 candidates running for 114* positions), we get the following distribution:

Overall, 46 women have been elected to 114* electable seats, where the candidate distribution and chance would expect us to have elected 42.5. The p-value assuming an equal electability between women and men is 0.22, so no, meninists, there isn't a pro-woman bias either. These results are pretty much what we'd expect given the candidate distribution we've had. This general conclusion holds true across city council (p=0.29):

And Public Schools (p=0.19):

Though intriguingly enough breaks down at Catholic Schools* (p=0.03):

*: Here it's worth mentioning that before 2010, the Catholic election system was really weird and had a wildcard winner from whoever had the most votes but wasn't elected in their ward. This was particularly silly seeing as not all wards were the same size, so I've ignored the wildcard seat and victor for the purposes of this analysis.

The fortunate summary of all of this is that there's no evidence that any system is rigged against female candidates. That being said, the proportion of women elected to civic office in the last years is just under a third of total offices filled, which isn't even remotely balanced. The best way to get a more representative council is to have more under-represented demographics put forward their candidacy, so if you know anyone who might be interested or qualified (of whichever underrepresented group you choose), I strongly encourage you to encourage them to run.

Edmonton Bike Safety

2016-08-17T10:00:00.001-06:00

Bicycles in Edmonton have been in the news quite a bit recently, particularly given the success of new bicycle development in Calgary. Bicycle lanes in Edmonton have been proposed, installed, removed, illegally painted, removed again, and blocked in council quite a bit in the last few years. Frustrations between cyclists, city planners, and drivers have gotten to a boiling point recently, and I think it's safe to say that whichever side of the debate you're on you're likely sick of it all. But please keep reading!

With all that said, things have recently gotten a bit more interesting from a data point of view. A month ago I was made aware of a data set of cycling injuries and incidents from 2009-2014 from the nice folks at Spacing Edmonton, which were analyzed by them as well as (more recently) the group over at Slow Streets.

Specifically, the people at Slow Streets made the claim that injury hot spots indicate where more cyclists are travelling, showing cyclist 'desire lines' which would be prime targets for bicycle infrastructure. However, a quick look at the map suggests that the streets with supposedly lots of bicycle traffic are also the roads with lots of vehicle traffic. Hypothetically, even if all streets had the same bicycle traffic, we might expect a similar distribution since one might think that more cars might lead to more interactions with cars.

So let's take a look and check this hypothesis. Fortunately, Edmonton has a nice map of average annual weekday traffic for major roadways. I combined the map data of all 1,070 cyclist injuries from 2009-2014 with the map of all streets that had traffic volume stats, and ended up with this result:

Error bars represent the 95% confidence interval for injury rate.

It looks as though there is a decisive link between vehicular traffic on a road and the number of cyclist injuries. As the city doesn't seem to have any specific information on bicycle ride distributions, it's hard to say with any certainty if the Slow Streets analysis is correct. Either way, it's clear that wherever cyclists mix with lots of cars, we get lots of injuries. This analysis ended up looking at 571 km of major roads with traffic data, which were responsible for 760 of the injuries recorded from 2009-2014.
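As a rough illustration of how an injury rate and its error bar can be computed, here is a Python sketch using the totals quoted above. It assumes injury counts are Poisson-distributed and uses a normal approximation for the interval. That is my assumption for illustration, not necessarily the method behind the chart, which also splits roads by traffic volume rather than lumping them together.

```python
from math import sqrt

def injury_rate_with_ci(injuries, road_km, z=1.96):
    """Injuries per km with an approximate 95% confidence interval.

    Assumes a Poisson count, so the standard error of the count is
    sqrt(count). The approximation gets rough for small counts.
    """
    rate = injuries / road_km
    half_width = z * sqrt(injuries) / road_km
    return rate, max(0.0, rate - half_width), rate + half_width

# Totals from the post: 760 injuries on 571 km of major roads, 2009-2014.
rate, low, high = injury_rate_with_ci(injuries=760, road_km=571)
print(f"{rate:.2f} injuries per km (95% CI {low:.2f} to {high:.2f})")
```

The same function applied to each traffic-volume bin separately would give one bar and one error bar per bin, and the bins with few injuries would get visibly wider intervals.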

But hey, that's not all! Edmonton also has a map of everyone's favorite (or least favorite) things - bike lanes!

From the map, Edmonton's road bike-friendliness can be broken down into four different types. There are separated shared use pathways, painted on-road bike lanes, signed on-road bike lanes, and plain old normal streets. So what does my previous analysis look like if we split road types up by their bicycle infrastructure? Why, this:

Again, error bars are the 95% confidence interval. Basically, ignore the green bars...

What are some takeaways from this? Well, first of all, major roads very infrequently have signed on-road bike lanes, so there's far too much variability for a proper analysis of them (green on the graph). Far more common are roads with separated shared bike paths (red), or no infrastructure at all (grey). From this, we get the firm (and hopefully not unreasonable at all) conclusion that biking on separated, wide, shared pathways for bikes is safer than biking on a normal road with traffic, by a factor of about 2.

From the City of Edmonton bike map

However, an interesting conclusion from this is that it's extremely hard to make the argument that painted bike lanes are safer than normal roads. In fact, in some cases, it looks quite a bit safer to bike on non-bike-laned roads. Weird.

What might cause this? Well, first of all I'd say that this analysis is a few factors short of anything scientific. For instance, the bike lane map for Edmonton likely includes lanes and paths that haven't existed for the entirety of 2009-2014, or have since been removed, so some of the injuries from my analysis are likely classified inaccurately. As well, other researchers, when investigating bike lane safety, controlled for the presence of parked cars on the side of the road, which I did not. So while I wouldn't necessarily go so far as to say that my analysis shows Edmonton bike lanes are more dangerous than streets without bike lanes, I stand by the assertion that bike lanes aren't safer than streets without them. I embrace the subtlety of that distinction.

Regardless, the data is quite clear about the effects of vehicle traffic on bike incidents, and the effects of physically separating bike paths from roads. Namely, separating vehicle and bicycle traffic may reduce bicycle injuries by a factor of 2 on busy roads, and up to a factor of 6 on quieter roads.

Again this is not surprising at all - I can't stress just how intuitive and likely boring the main finding here is. But this data set of cycling injuries from 2009-2014 does seem to show that painted bike lanes have not had the effect that was perhaps intended.

In my opinion, having decent bicycle infrastructure is absolutely important to having a vibrant and healthy city. Hopefully future bike lane decisions are made keeping injury prevention and statistics in mind, in such a way that we can expand our biking infrastructure as effectively as possible.

Edmonton City Council Votes (Part 2)

2016-07-03T20:10:00.002-06:00

A year ago I did a short piece looking at Edmonton city council voting patterns. It was pretty fun and showed some cool blocks in city council, but since then we've had a monster by-election, so it seemed like now is a good time to take a second look at this analysis.

Since council as a whole got elected in 2013, there have been 5763 votes performed, according to the city's Open Data catalogue. Of course, many of these are procedural matters, and the vast majority of them are unanimous. If we restrict the votes to non-unanimous votes to see how the councillors interact, we're actually only left with 358 votes to look at.

Of those 358 votes, we can come up with this result, showing how often each member of council agreed with each other member of council. I've colour-coded it to make the numbers seem a little less daunting:

The major update here, of course, is the addition of Councillor Banga to the mix. He seems to generally follow the Iveson/Esslinger/Walters group that we identified last year, though generally less so than his predecessor Amarjeet Sohi did. He also seems to disagree with Councillor Caterina disproportionately relative to anyone else. Again, much like a year ago, Councillor Nickel is a bit of an outsider, who agrees with his colleagues far less than anyone else does.
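For the curious, a pairwise agreement table like this one can be built with a short Python sketch. The data layout (a dictionary of vote lists) and the member names are hypothetical. The real Open Data export is shaped differently and would need reshaping first.

```python
from itertools import combinations

def agreement_rates(votes):
    """Fraction of shared votes on which each pair of members voted the same way.

    `votes` maps a member's name to a list of "Y"/"N"/None entries, one per
    motion, in the same motion order for everyone. None marks an absence.
    """
    rates = {}
    for a, b in combinations(sorted(votes), 2):
        shared = [(x, y) for x, y in zip(votes[a], votes[b])
                  if x is not None and y is not None]
        if shared:
            rates[(a, b)] = sum(x == y for x, y in shared) / len(shared)
    return rates

# Made-up records for three hypothetical members over four contested motions.
sample = {
    "Member A": ["Y", "Y", "N", "Y"],
    "Member B": ["Y", "N", "N", "Y"],
    "Member C": ["N", "N", "Y", None],
}
for pair, rate in agreement_rates(sample).items():
    print(pair, f"{rate:.0%}")
```

Skipping motions where either member was absent keeps a councillor who joined partway through the term, like a by-election winner, from being penalized for votes they never cast.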

Another way of looking at this is to make network graphs. This first one shows all connections with councillors that agree with each other at least 67% of the time (this number was chosen so that Councillor Nickel isn't left out). Feel free to play with it, it's rather fun!

Alternatively, we can generate a network graph based on who agrees with who the most frequently. Orange arrows (when you hover over them) indicate the most frequent agreements for each councillor, while blue arrows indicate that another councillor most frequently agrees with the first, but that it isn't reciprocated.

This shows a bit more clearly how potential groupings look at city council. Five councillors agree with Mayor Iveson more than anyone else, and two other councillors most frequently agree with two of those five. On the other hand, the remaining 5 councillors tend to spread out from Councillor Caterina.

Of course, these two groups aren't all that different - Councillor Caterina and Mayor Iveson still vote the same on 75% of contested motions, so realistically they agree 98% of the time on all motions, but the above network graph is a nice way to dramatize it!

Finally, we can also take a look at how often each member of council ends up getting the result they voted for on each motion. Again, only looking at non-unanimous votes, we have:

Impressively, Mayor Iveson ends up on the winning side of a council vote 95% of the time. In fact, of all 5763 votes performed since 2013, Mayor Iveson has only been disappointed 17 times. There are certainly many conclusions that can be drawn from that, but at the very least nobody can say that Don Iveson has difficulties instituting the agenda he wants on council.
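Both headline percentages in this section follow from the vote counts quoted above. Here is the arithmetic as a small Python sketch, assuming both members were present for every vote and that a losing vote can only happen on a non-unanimous motion:

```python
total_votes = 5763       # all council votes since the 2013 election
contested_votes = 358    # non-unanimous votes

# Caterina and Iveson agree on 75% of contested votes and, assuming both
# were present, on every unanimous vote.
disagreements = (1 - 0.75) * contested_votes
overall_agreement = 1 - disagreements / total_votes
print(f"overall agreement: {overall_agreement:.1%}")

# Iveson was on the losing side 17 times, all necessarily contested votes.
iveson_losses = 17
winning_rate = 1 - iveson_losses / contested_votes
print(f"winning side on contested votes: {winning_rate:.1%}")
```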

So there you go. I plan to do another analysis like this before the next election, so stay tuned for that one!

Which Edmonton City Councillor are You?

2016-04-26T11:27:00.000-06:00

Since the Ward 12 by-election just a few months ago, Edmonton city council has gotten into quite a few rather contentious votes. Most recently the Mezzo Building decision left quite a few observers rather upset, but earlier council decided to scrap the proposed Hawrelak Park Water Play Feature (worst name ever, by the way) after being faced with price increases, and has had to face some struggles with the proposed green development in the Blatchford area.

With that all being said, since Councillor Banga has taken on the role, Edmonton's open data suggests that there have been 25 votes of council that have been non-unanimous, which it turns out is more than enough that no two councillors have voted the same way on everything over the last two months (even though Councillor Oshry and Mayor Iveson gave it their best shot at 24/25). That means that, with only a few questions, we can generate a choose-your-own-adventure game in the style of a Buzzfeed quiz to see which councillor you agree the most with over the last term!
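Here is a hedged Python sketch of how a quiz like this could narrow things down. It is not the code behind the actual quiz. The records are invented, and `best_question` and `run_quiz` are names I made up. The idea is to keep asking whichever vote splits the remaining councillors most evenly, which only works because no two voting records are identical.

```python
def best_question(records, remaining, asked):
    """Pick the unasked vote that splits the remaining councillors most evenly."""
    n_votes = len(next(iter(records.values())))

    def imbalance(i):
        yes = sum(records[name][i] for name in remaining)
        return abs(2 * yes - len(remaining))

    return min((i for i in range(n_votes) if i not in asked), key=imbalance)

def run_quiz(records, answer_for):
    """Ask questions until one councillor is left; `answer_for(i)` returns True/False."""
    if len({tuple(v) for v in records.values()}) != len(records):
        raise ValueError("two councillors share an identical voting record")
    remaining, asked = set(records), set()
    while len(remaining) > 1:
        i = best_question(records, remaining, asked)
        asked.add(i)
        matches = {name for name in remaining if records[name][i] == answer_for(i)}
        if matches:  # keep everyone if nobody left voted the user's way
            remaining = matches
    return remaining.pop()

# Invented records: True means the councillor voted in favour of that motion.
records = {
    "Councillor W": (True, True, False),
    "Councillor X": (True, False, False),
    "Councillor Y": (False, True, True),
    "Councillor Z": (False, False, True),
}
my_answers = (True, False, False)
print(run_quiz(records, lambda i: my_answers[i]))
```

With 13 distinct records, an evenly splitting question roughly halves the field each time, so four or five questions are usually enough.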

Which Edmonton City councillor are you? The answer will surprise you!

Edmonton Zone Map

2016-04-22T09:49:00.002-06:00

Earlier this week, there was a bit of a kerfuffle raised at City Council when they contentiously passed a motion to allow a new 16-storey building near Whyte Avenue. In order to allow the new building, they had to change some of the zoning around the area.

I was curious about exactly what the distribution of zones in Edmonton looks like, so I decided to see if I could find a map. Oddly enough, despite the data being available on the city's OpenData portal, there wasn't a readily-available one to be found via Google.

And maybe there's a good reason - it turns out there are over 85 different zone descriptors that the city uses, and many of the individually set zones are actually rather tiny (small parks count as their own zone, for instance). If you coloured a map based on all the different types of zones, it would be a scary kaleidoscope that wouldn't be terribly useful.

So instead, I've reverted to the tried and true Sim City method and labelled things broadly as either Residential, Commercial, or Industrial zones. Take a look:

If you've lived in Edmonton for more than a couple minutes, I'm sure that this map isn't surprising to you at all. I find it still cool to actually see things laid out like this though - it really shows you the industrial moat that surrounds Mill Woods, for instance, and specifically locates all of the strip malls we seem so fond of. (If your favorite strip mall isn't coded blue, it's most likely because many areas tend to end up as 'Site Specific Development Control Provision', which is essentially bylaw code for 'none of the above'. I didn't end up colour coding them all because there are 650 of them, mostly all for different reasons...)
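Collapsing dozens of zone descriptors into a handful of colours is essentially a lookup. The Python sketch below is a guess at one way to do it, not the mapping actually used for the map above. The first-letter rule and the sample zone codes are assumptions for illustration, and anything unrecognized (including site-specific zones) falls into "Other".

```python
from collections import Counter

# Assumed rule of thumb: the first letter of a zone code hints at its broad use.
BROAD_CATEGORY = {"R": "Residential", "C": "Commercial", "I": "Industrial"}

def broad_zone(zone_code):
    """Collapse a detailed zone code into a SimCity-style category."""
    return BROAD_CATEGORY.get(zone_code.strip().upper()[:1], "Other")

# Illustrative codes only; the real data set has over 85 descriptors.
parcels = ["RF1", "RA7", "CB1", "IM", "AG", "DC2", "RF1"]
print(Counter(broad_zone(code) for code in parcels))
```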

One final thought: I'm not so sure I like the sounds of the Anthony Henday being an agricultural zone. Hopefully they keep the agriculture and the four lanes of speeding gas guzzlers a little separated...

Gender Equality in APEGA

2016-03-08T11:51:00.004-07:00

Pop quiz: Given that recent data suggests that women earn 72% of what men do for similar work in Canada, what is the wage gap for women in engineering in Alberta?

A) 13%
B) 0.15%
C) The math depends on your agenda

You're right! The math behind the wage gap depends on what you're looking to achieve in your analysis. Congratulations!

Let me explain. For the last couple of years, APEGA has published a detailed salary survey of its members. (This year, APEGA instead published an 8-page summary of the survey, and asked $1,900 to share the full information with you, while also withholding the complete data from previous years. Yuck.) Fortunately, through the power of the web archive, we can access the previous salary survey data, which is helpfully broken down into many demographics. Let's take a look.

From the 2014 salary survey, the average male engineer's salary was $125,721, and the average female engineer's salary was $109,402, for a wage gap of 13%. Alright, we're done here. That was easy.
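The arithmetic behind that 13%, as a quick Python check using the two survey averages quoted above:

```python
male_avg = 125_721     # average male engineer salary, 2014 survey
female_avg = 109_402   # average female engineer salary, 2014 survey

# Gap expressed as the shortfall relative to the male average.
raw_gap = 1 - female_avg / male_avg
print(f"unadjusted wage gap: {raw_gap:.1%}")
```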

Well, maybe not so fast. One of the biggest determinants of salary is seniority, and if seniority isn't distributed similarly between genders then that may skew the data. If we compare male to female earnings based on seniority, we get:

For salary survey purposes, an A- class would be the equivalent of a co-op student, and an F+ class would be senior management.

When we look at the data like this, we see that until maybe the very top levels of senior management, male and female engineers make approximately the same salaries (within 2% one way or another). If we weight this based on the total number of engineers in each category, we actually end up with females earning 0.15% more than men on average.

So on the one hand we have women earning the same as men, and on the other hand we have women making 13% less, all depending on how you look at the statistics. While things are looking good from the point of view of co-workers getting paid similarly for similar responsibilities, is there a chance that something else may be pulling back on women's chances at the better paying jobs? We can investigate further by looking at seniority by gender:

Alright now that's something. Women tend to average around a B to a C level, whereas men tend to average around a C to a D level. Here's a major difference, and when compared with the salary averages at each seniority level, we can see where the previously-established 13% salary difference comes from.
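To see how equal pay at every level can still add up to an overall gap, here is a toy Python sketch. All salaries and head counts below are hypothetical and chosen only to show the effect. They are not the APEGA figures.

```python
# Hypothetical salaries and head counts, chosen only to illustrate the effect.
salary = {"B": 80_000, "C": 100_000, "D": 130_000}  # identical for both genders
men = {"B": 200, "C": 400, "D": 400}                # skewed toward C and D
women = {"B": 40, "C": 40, "D": 20}                 # skewed toward B and C

def average_salary(counts):
    """Head-count-weighted average salary across seniority levels."""
    total = sum(counts.values())
    return sum(salary[level] * n for level, n in counts.items()) / total

gap = 1 - average_salary(women) / average_salary(men)
print(f"gap within every level: 0%, overall gap: {gap:.1%}")
```

The gap in this toy example comes entirely from where people sit on the seniority ladder, which is the same mechanism the two charts above point to.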

Based on the earlier analysis, I'm pretty optimistic that for the same responsibility level, male and female engineers make approximately the same wages. But it's definitely worth looking into what's causing the differences in distribution of work responsibilities.

Starting out, I think there are three major plausible theories. A pessimistic and sexist theory could be that men are promoted faster in the workplace, and as a result tend to sit higher in seniority (glass ceiling style). The disappointing yet potentially less sexist theory is that women, for one reason or another, leave the workplace earlier than men, and as a result there are fewer of them to take on senior management roles. And the last theory is that changes in the graduation rates of female engineers are leaving women just now catching up to men in equality.

Let's examine each in turn. Each level of seniority in the APEGA salary survey also contained information on length of career post-graduation, for women and for men. If the distribution of these values doesn't line up, perhaps that tells us something. The three largest groups by seniority are B, C, and D:

Well that's bang on, how about:

Still pretty close. Then there's:

Alright, they actually all look reasonably similar. If anything, there may be a higher percentage of younger women in D-level positions than men, similar to the higher percentage of very young men in C-level positions than women. Nothing that could quite explain a 13% wage disparity though.

The second theory I suggested was that women may leave the workplace at younger ages than men, for various reasons. Here's the distribution of women at different stages in engineers' careers:

Yikes. Please note though that the salary data for people who've worked 35-40 years is pretty slim, so it's not terribly unlikely that there actually are some women engineers in that demographic, and the 2014 salary survey actually over-polled the number of females which may also skew the data. Either way, we see a clear trend where older and more senior engineers are substantially less likely to be female than younger engineers.

The final piece of the puzzle comes from the third theory I listed above. Graduation rates for female engineers have changed wildly over the last 40 years, as shown in this graph from the Ontario Network of Women in Engineering:

For a wide variety of reasons, the proportion of engineers graduating in 1975 who were female was only 3.6%. This climbed to approximately 20% only in the late 1990s, and it has fluctuated a bit since then. The last graph I present to you compares the percentage of engineering graduates who were female, by years since graduation, with the percentage of female engineers in APEGA over the same time range:

While there may be a bit of a discrepancy between Canada graduation numbers and Alberta employment numbers, I think this comparison is still valid.

So what are we left with to explain the wage gap for engineers in Alberta? It appears as though a significant part of it may be due to the fact that, until relatively recently, the rate of women entering engineering education was dreadfully low. A lot of the high-paying senior management positions that are held by men simply don't have many women counterparts to be offered to, leading to an imbalance in seniority. That being said, women in engineering, certainly past the 20-years-since-graduation mark, are still lagging behind their graduation rates, suggesting that women who did graduate over 20 years ago were still more likely to leave the field than their male counterparts.

Where does this leave us? Well, while things are definitely getting better, and engineering is surprisingly better than the average of other workplaces, there's always work to be done. I suspect that as the workforce ages, we'll see a narrowing of the disparity in seniority, and hopefully in the meantime we can figure out which factors lead to women leaving the field disproportionately. Only when we reach a situation where opportunities at all levels of engineering employment are equal will we have a truly equal environment for engineers in Alberta.

Edit: It's worth noting that the APEGA salary survey does not distinguish between full time and part time, or contract or non-contract work. As a result, any potential gender disparities between these forms of employment haven't been assessed in this post, or in the APEGA salary survey as a source.

Edmonton 2015 Federal Election Results

2016-03-01T09:47:00.001-07:00

As you might remember, that big old scary 2015 Federal Election happened way back in October. However, the Government only released the official results of the election just yesterday - go take a look at them, they're quite cool!

In the meantime, here's the final official map of how each poll in Edmonton voted, in a similar format to my post regarding the previous Alberta election. Each poll is coloured by which party had the most votes there (red is Liberal, blue Conservative, and orange NDP), and then darker colours indicate the party had greater than 50% support.

Enjoy!

Edmonton Ward 12 By-Election Results

2016-02-24T13:26:00.002-07:00

Edmonton's Ward 12 by-election took place this week, and it had a historic 32 candidates running to sit on council. There were a couple of fun outcomes from this election that are worth taking a quick look at:

Winning Vote Total

I didn't bother doing a full write-up on this, but I was curious at the beginning of the race about just how many votes it would require to win this election. With 32 candidates, technically the winner could have won with 3.125% +1 votes, which is terribly low, but I had a hunch that there would be a few front runners, and lots of trailing candidates. I decided to take a look at historical civic elections in Edmonton and Calgary, and posted this graph on Twitter:

It's a ridiculous extrapolation, but this election is ridiculous already. Winner of #ward12 should get ~21% of vote. pic.twitter.com/jHsN1QHR91
— Michael Ross (@Mikerobe007) February 22, 2016

Obviously it was a long shot extrapolation, as very few past elections have had even half the number of candidates as this one.

How well did I do? The winner, Moe Banga, got 17.76% of the vote, which adjusts our graph to this:

I'd say that looks pretty decent. Sure, it was a long shot estimate, but it was certainly a lot less dire than the worst-case guess would have been.
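The numbers in this section can be lined up in a short Python sketch: the theoretical floor for a 32-way race, the extrapolated guess, and the actual result. The function name is mine. The three figures come from the text above.

```python
def minimum_winning_share(n_candidates):
    """Lowest share that could win first-past-the-post: an even split
    among all candidates, plus a single extra vote to break the tie."""
    return 1 / n_candidates

floor = minimum_winning_share(32)
predicted = 0.21     # extrapolated guess from the tweet
actual = 0.1776      # Moe Banga's share of the vote

print(f"theoretical floor: {floor:.3%}")
print(f"prediction missed by {abs(predicted - actual) * 100:.2f} points")
print(f"winner beat the floor by a factor of {actual / floor:.1f}")
```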

Vote Distribution

Sadly, because of its by-election status, there were only seven polling stations in this election, which means that mapping voting data is a bit silly. Here are the results of the winner in each poll anyway, though.

Again, not much to write home about. This map is a lot more boring than it could have been largely because all the vote totals tend to be quite close together, and each poll represents several neighborhoods. 7 data points just isn't enough to have fun with.

That being said, we can see that Moe Banga had fairly wide-ranging support, which will be encouraging for him going into council. Irfan Chaudhry, who came in fourth, had a narrow lead in Charlesworth, Walker, and Ellerslie neighborhoods, and Laura Thibert, who came in second, had a decent lead in Larkspur and Wild Rose.

Election Strategies

Finally, when we look at the total vote breakdown a little more closely, we can get a bit of an idea as to how some of the campaigns prioritized.

Take advance voting, for example. Advance votes accounted for 27.3% of all votes cast, but were prioritized by some candidates more than others. Moe Banga, the winner, was clearly quite organized and got 37% of his vote in advance (these advance votes of his were enough to beat the total votes of 26 other candidates even). Other high-ranking candidates, like Balraj Manhas, Arundeep Sandhu, Yash Sharma, and Rakesh Patel all got over 35% of their vote in a similar way. Candidate Sam Jhajj was a clear outlier though - getting 70.3% of his votes from advance voting.

And finally, special ballots only accounted for 1.1% of all ballots, and are reserved for people who can't vote in either advance or normal ballots. Strangely enough, three candidates (Moe Banga, Rakesh Patel, and Balraj Manhas) combined had more than half of these votes. Kudos for grabbing the out-of-towners, I suppose!

That's it! And we won't have another election in Edmonton for 18 months! See you after the next one.

Taxi Stats for Edmonton

2016-01-26T16:46:00.002-07:00

The fight in Edmonton for Uber vs Taxis has finally come to a head today, with the updated Vehicle for Hire bylaw up for debate and a final vote by the end of the day (barring any disruptions to council).

Potentially complicating matters is that this takes place during the very beginning of the Ward 12 by-election, meaning one seat on council is vacant. Matters are further complicated because some of the candidates for the by-election are either directly involved with the United Cabbies Association of Edmonton or calling for a postponement of the debate, under the understanding that Ward 12 contains a disproportionate number of taxi drivers relative to other wards of the city.

Of course, taxi drivers are only one half of the equation, and taxi users are also important to consider. While the stat on the previous link that 35% of cabbies live in Ward 12 is only sourced to the Nav Kaur campaign, there is plenty of other information from the cab users from a 2014 city survey that reveals some data from the consumer side of things. This first map, for instance, indicates the percentage of people who regularly take taxis in each ward*:

Green: higher taxi usage; Red: lower taxi usage
*Postal codes T5C, T5S, and T6P had low survey turnout and probably should be disregarded in this map.

As may be expected, taxis are more commonly used in the interior of the city, and less commonly used to the west and east. Councillors who might be more concerned than average about their constituents' access to taxis could include McKeen (ward 6), Henderson (ward 8), Walters (ward 10) and Nickel (ward 11).

The survey also looked at the perceived importance and satisfaction for taxis in Edmonton. Both questions were rated from 1-5, with five being the most positive (extremely important and very satisfied, respectively). The averages for each ward are:

Importance:

Green: High importance; Red: Low importance

Satisfaction:

Green: High satisfaction; Red: low satisfaction

On average, Edmonton citizens tend to view taxi services as somewhere between moderately and very important (3.86/5), and are somewhere between somewhat dissatisfied and neutral about their experiences (2.79/5).

This post is mostly not to provide opinions, but to share some of the data that the city has on taxi users in Edmonton. There are parts of the city where people regularly take taxis and think they are important, but also aren't satisfied with the service they receive, and regardless of the outcome of today's vote hopefully opening up the discussion around taxi alternatives results in a better user experience overall.

Electoral Reform

2015-10-09T07:59:00.000-06:00

In less than two weeks we're going to head to the polls and elect the 42nd Parliament of Canada. The elections process we use is pretty old and potentially outdated, and it's time that we started to take a proper look at how it works.

A quick overview of what we currently have: on October 19th, you and the ~100,000 people who live closest to you will have the chance to pick who represents you at the House of Commons. Whoever gets the most votes between you and your neighbors wins, and becomes a Member of Parliament.

Now, most candidates in each riding are associated with a political party. When they all get to the House of Commons, they tend to stick together with other people of their party, and most of the time vote the same way. Once everyone has gotten to the House of Commons, the Governor General (representing our Head of State, the Queen) offers the leader of the largest elected party the opportunity to form a government, which is subsequently voted on by all members of the House.

Unless you happen to live in a riding with the leader of a major party, you never actually directly vote for who is going to be prime minister or how powerful of a government that leader will have.

If the largest party has more than half of the seats (a majority), they'll win this vote pretty easily and form government. If they don't, they can still form government by arranging deals with other parties, but probably won't stick around for too long.

Our system of electing people to represent us has its share of failures. For one thing, as a representative democracy, you'd hope that everyone is equally represented in the House of Commons, but this isn't necessarily the case. In fact, the riding with the fewest voters (Labrador - population 26,728) has only about a fifth of the population of the riding with the most (Brantford-Brant, population 132,448). This means that a voter in Labrador has roughly 5 times the voting power of one in Brantford. The full distribution of voters per riding looks like this:

This is actually really badly distributed, and the problem is serious enough that there are often laws in place to prevent this sort of thing from happening. In Alberta, for instance, the Electoral Boundaries Commission Act limits population deviation between provincial electoral divisions to 25% of the average. Provinces like Saskatchewan and New Brunswick limit deviation from the average to only 5%, but in federal elections the largest deviation is an overwhelming 73% below average.
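
As a quick check on those numbers, here is a minimal sketch of the arithmetic. The two populations come from the figures above; the average riding size of 100,000 is only the rough "~100,000" mentioned earlier, so treat it as an assumption rather than an official figure.

```python
# Riding populations quoted in the post.
labrador = 26_728
brantford_brant = 132_448

# Assumption: the post only gives "~100,000" as a typical riding size.
average_riding = 100_000


def relative_voting_power(small: int, large: int) -> float:
    """How many times more weight one vote carries in the smaller riding."""
    return large / small


def deviation_from_average(population: int, average: int) -> float:
    """Signed deviation from the average riding size, as a fraction."""
    return (population - average) / average


power_ratio = relative_voting_power(labrador, brantford_brant)
labrador_deviation = deviation_from_average(labrador, average_riding)

print(f"Labrador vote weight vs Brantford-Brant: {power_ratio:.2f}x")
print(f"Labrador deviation from average: {labrador_deviation:.0%}")
```

With the rounded average this lands on roughly 5 times the voting power and about 73% below average, in line with the figures quoted above.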

(Before you despair, though, keep in mind that the US Senate assigns the same number of senators per state regardless of its population. The best-represented US senate voter has 66 times the voting power of the worst-represented, and the largest population per senate seat is 510% of the average.)

So our system isn't really all that great at ensuring that everyone's vote is worth the same, but is it any good at reflecting people's voting choices?

Not entirely. At a small scale, the winner of any given riding is indeed the candidate with the most total votes, but more often than not they don't win with a majority of votes. Over the last three years, 58% of seats have been won with less than 50% support, and the worst winner became an elected Member of Parliament with only 29.1% support.

This in and of itself isn't inherently evil, but our system is designed to elect representatives for each riding, and it is hard to argue that someone with less than 50% support is always the most representative of the area. The worst case situation I mentioned earlier had a Bloc Quebecois member elected with 29% support, but if the elected MP were to vote on the issue of Quebec separation, should he listen to the 29% of people who supported him, or the 71% of people who voted for explicitly non-separatist candidates?

Most issues that MPs vote on aren't that black-and-white, of course, but it's not unreasonable for MPs to be put in positions where their party line disagrees with the majority opinion of their constituents, and this is not an effective form of representative democracy.

This leads to one of the biggest issues with First Past the Post electoral systems - they are prone to dishonest or strategic voting. In our current election, there is a push from multiple groups to coordinate voting in swing ridings, with the mindset that it's better to vote against a specific party than to vote for a party you actually care about. Any seat that is won with less than 50% support can be prone to strategic voting, especially if voters either don't feel their preferred candidate has a chance to win or if voters are particularly angry at the front-runner.

When we zoom out a bit, though, this lack of majority support per riding can lead to larger effects on a regional scale. If a region of five ridings has an evenly-distributed population with, say, 40% support to one party and 30% support split between each of two other parties, the party with 40% could easily end up with candidates elected in a majority of ridings and take control easily (if that party has control of setting electoral boundaries, they can potentially do this even without being the most popular by gerrymandering).
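
To make the five-riding example concrete, here is a small sketch. The party names and the 1,000 voters per riding are made up for illustration, and support is assumed to be perfectly even across ridings, which is the extreme version of the scenario.

```python
from collections import Counter


def fptp_winner(votes: dict[str, int]) -> str:
    """First Past the Post: whoever has the most votes wins the riding."""
    return max(votes, key=votes.get)


# Hypothetical region: five ridings, each split 40% / 30% / 30%.
ridings = [{"A": 400, "B": 300, "C": 300} for _ in range(5)]

seats = Counter(fptp_winner(riding) for riding in ridings)
total_votes = sum(sum(riding.values()) for riding in ridings)

vote_share = {
    party: sum(riding[party] for riding in ridings) / total_votes
    for party in ("A", "B", "C")
}
seat_share = {party: seats[party] / len(ridings) for party in vote_share}

for party in vote_share:
    print(f"Party {party}: {vote_share[party]:.0%} of votes, "
          f"{seat_share[party]:.0%} of seats")
```

In this perfectly even case Party A takes every seat with 40% of the vote. Real regions are lumpier, so the effect is usually less extreme, but the direction is the same.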

This is the issue of (non-geographic) proportional representation, and on a national scale our elections haven't always turned out terribly proportionately. In fact, the last four elections look like this:

For the last four elections, the parties that contested all available seats (hence the lack of Bloc Quebecois in the graph, they're silly anyway) have actually very neatly traced a cubic relationship between party support and seat allocation, instead of the line that we might expect.

Before I talk about proportional representation too much more, I'd like to acknowledge a key assumption about proportional representation: it is often assumed that a voter for a candidate from a party in one region supports that party in all regions. This is definitely not always the case - I personally have voted for candidates while disliking their party leadership, and have known others to vote strictly based on candidate and not on party at all. When people argue that a party with 40% voter support should get 40% of the power in the House of Commons, that is definitely not the motivation that has driven all of those votes to have been cast in the first place.

Nevertheless, there are certainly benefits to having a system where party power post-election reflects party support. Much like an individual MP being elected with less than half of the support of their riding, the above graph shows that it's likely (and has certainly happened in the past) for parties to end up with a majority government with less than a majority of voters' support.

Hopefully by now I've shown that there are issues with the current first past the post system that we have in Canada. Not everyone's vote counts for the same amount, and once the votes have been counted we're not represented to the best extent possible.

There are three leading proposals out there that are being put forward for how to deal with some of the shortfalls of our current system that are worth discussing: Alternative Voting (AV), Mixed Member Proportional (MMP), and Single Transferable Vote (STV). There are, of course, hundreds of ways that we could change our system, but these three have a history of being considered in Canada, and are worth taking a closer look into.

Alternative Voting

AV (also called Instant-Runoff Vote) isn't officially being proposed by a political party right now, but is apparently a leading possibility in the Liberals' current plans to overhaul the electoral system.

When voting in an AV election, every voter has the choice to rank as many candidates as they want. All voters' first choices are added up, and if any candidate has over 50% of the votes they win. If nobody has enough, then the candidate with the lowest first-place votes is eliminated, and all votes for them are redistributed to the second-ranked choices. This process continues until eventually one candidate has over 50% of the vote and wins.

Source
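
The counting procedure described above can be sketched in a few lines. This is a minimal illustration and not any official counting rule: ballots are assumed to be lists of names in preference order, ballots that run out of choices simply drop out, and ties for last place are broken alphabetically purely to keep the result deterministic.

```python
from collections import Counter


def instant_runoff(ballots: list[list[str]]) -> str:
    """Return the winner of an Alternative Vote (instant-runoff) count.

    Each ballot lists candidate names, most preferred first. Voters may
    rank as few or as many candidates as they like.
    """
    remaining = {name for ballot in ballots for name in ballot}
    if not remaining:
        raise ValueError("no candidates were ranked on any ballot")

    while True:
        # Count each ballot for its highest-ranked candidate still running.
        tallies = Counter({name: 0 for name in remaining})
        for ballot in ballots:
            for name in ballot:
                if name in remaining:
                    tallies[name] += 1
                    break
        active = sum(tallies.values())  # exhausted ballots no longer count

        leader, leader_votes = tallies.most_common(1)[0]
        if leader_votes * 2 > active or len(remaining) == 1:
            return leader

        # Eliminate the candidate with the fewest votes this round.
        # Assumption: alphabetical tie-break, for determinism only.
        loser = min(sorted(remaining), key=lambda name: tallies[name])
        remaining.remove(loser)


# Hypothetical riding: A leads on first choices, but C's voters prefer B.
ballots = [["A"]] * 40 + [["B", "C"]] * 35 + [["C", "B"]] * 25
print(instant_runoff(ballots))
```

In this made-up riding A would win under First Past the Post with 40%, but once C is eliminated and those 25 ballots move to B, B wins 60 to 40.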

If we were to replace our current system with Alternative Vote tomorrow, the results in ~42% of our ridings would definitely not change at all, as historically about that many seats have over 50% support to one candidate anyway. Only ridings with very close first-round results would be likely to end up differently between FPTP and AV election systems - quite often the front-runner off the first-choice votes ends up winning.

So why is Alternative Vote any better than First Past the Post? Mainly because it reduces the need for strategic voting. If your favored candidate is unlikely to win, you don't have to consider a vote for them wasted, because either the winner will win with over 50% support (and you couldn't have changed it anyway), or your vote will get redistributed to someone else of your choice. This is good inasmuch as it allows voters to vote with their conscience.

On the other hand, all the rest of the problems that exist in First Past the Post still exist in Alternative Vote. An analysis of the 2015 UK election, for instance, suggests that the results would have been even more disproportionate under AV than FPTP. And while it's true that voters can't be left with a candidate winning with 30% support, if the 50% benchmark is hit through a series of second, third, or fourth-place votes then we still have a situation where the majority of voters never picked their representative, or only picked them as a compromise solution. Keeping a system of single-winner ridings with a minimum 50% support doesn't ensure proportionality at all.

Mixed Member Proportional

MMP has been proposed by the NDP as an election promise this year, to be implemented by the next election. It has been recommended by the Law Commission of Canada and independent provincial commissions, but was also officially rejected by the voters of Ontario in 2007.

Youtube again does a very good job of explaining it here, but the quick explanation is that when a voter goes to vote in an MMP system, they get two ballots. One is the exact same as our current ballot, and the other is strictly for party preference. After all votes are cast, whoever ends up with the most votes in each riding from the first ballot gets elected, just as normal.

After that, the proportion of party support received off the second ballot is compared to the number of seats won from the first ballot, and additional seats are filled from pre-existing party lists until the overall proportion of seats in the House of Commons matches the proportion from that second ballot. Overall, half of the seats in the House of Commons would be filled by each ballot.

Source
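
Here is a rough sketch of how the top-up step could work. The rounding rule is an assumption: the description above doesn't say how fractional seats are handled, so this uses a simple largest-remainder allocation, whereas real MMP systems use a variety of formulas. It also ignores the "overhang" case where a party wins more ridings than its share entitles it to. Party names and vote counts are made up.

```python
def largest_remainder(votes: dict[str, int], seats: int) -> dict[str, int]:
    """Split `seats` in proportion to `votes` (largest-remainder rounding)."""
    total = sum(votes.values())
    # Integer arithmetic: whole seats earned, plus the leftover remainder.
    shares = {party: divmod(v * seats, total) for party, v in votes.items()}
    allocation = {party: whole for party, (whole, _) in shares.items()}
    leftover = seats - sum(allocation.values())
    by_remainder = sorted(votes, key=lambda p: shares[p][1], reverse=True)
    for party in by_remainder[:leftover]:
        allocation[party] += 1
    return allocation


def mmp_top_up(local_seats: dict[str, int],
               party_votes: dict[str, int],
               house_size: int) -> dict[str, int]:
    """List seats each party receives on top of the ridings it already won."""
    target = largest_remainder(party_votes, house_size)
    return {
        party: max(0, target[party] - local_seats.get(party, 0))
        for party in party_votes
    }


# Hypothetical election: 10 ridings plus 10 list seats (half and half).
local_seats = {"A": 7, "B": 3}                          # first ballot
party_votes = {"A": 40_000, "B": 35_000, "C": 25_000}   # second ballot

top_up = mmp_top_up(local_seats, party_votes, house_size=20)
print(top_up)
```

Here Party A swept most ridings, so it receives only one list seat, while Party C won no ridings and gets its whole 25% share from the list.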

MMP is touted as providing a good balance between keeping geographical representation, while guaranteeing partisan proportionality and avoiding the need for strategic voting. However, it misses the mark on most of these.

The issues mentioned before regarding First Past the Post winners not representing a large proportion of their constituents absolutely have not disappeared in MMP - this would leave the same number of voters without a local representative in parliament as before. Strategic voting for or against candidates at a local level would still occur too - in fact, MMP is at best a half fix from First Past the Post, since half of the ballot takes place in the exact same way as always. (It may be worse than FPTP, since the directly-elected MPs would be responsible for ridings double in size from previous ones, in order to make room for the new list-only MPs.)

So let's focus on the second half of the ballot. By filling seats based on a candidate's ranking on a party list, instead of electing them directly, a whole second class of MP would be created. These MPs wouldn't be accountable to any citizens or have to represent them, since they would owe the existence of their jobs solely to the popularity of their party.

In my opinion, this is less of an issue in countries that currently use MMP, like New Zealand, Germany, and Scotland, than it is in Canada, mostly because combined those three countries cover an area only a little bigger than Alberta. If a party ends up being elected largely without direct constituency seats, then their MPs in the country capital may not be the best suited to debate issues taking place thousands of kilometers away as opposed to only hundreds of kilometers away. The lessened importance of geographical representation in MMP becomes more of an issue when your country sprawls across five and a half timezones.

The second ballot of MMP also isn't immune from strategic voting. For instance, in the 2005 Albanian election the two major parties convinced their supporters to vote for them on the constituency ballots, and to vote for smaller coalition partner parties on the party ballot. This resulted in an unbalanced situation where previously tiny parties shared all the party ballot seats, and the large parties shared the constituency seats, and a pre-determined coalition took control. Similarly, in the 2007 Lesotho election, the major parties split in two and used decoy parties for the list seat votes. This example of gaming the system resulted in one party taking 69% of the seats with 52% of the vote.

Obviously these are in the past and the development of a new electoral system could hopefully take steps to avoid loopholes like this, but other concerns still exist. MMP ballots tend to fall into two camps - open lists or closed lists. A closed list would trust the parties to fill in their top-up seats at their discretion, creating the possibility of career politicians who keep getting elected based on value and loyalty to the party, whereas an open list (as favoured by the NDP) would have voters choose the list of MPs-to-be. Needless to say, this option would take a "simple two ballot" system and add an enormous amount of ranking to it.

The Mixed Member Proportional voting system tries to be the best of both worlds, but doesn't necessarily excel at either. Its local representation has all the issues we already have, and its push for proportionality creates a second class of MP with questionable accountability. Both MMP and AV appear to be half solutions to the current issues with FPTP, but they seem to address separate halves of the issue.

Single Transferable Vote

STV was recommended by the Citizens' Assembly of British Columbia in 2004, and got 57.7% support in a 2005 referendum. But the BC government said they wouldn't be bound by anything lower than 60% support, campaigned hard against it, and when it came back to referendum in 2009 it only got 39.1% support.

Yet again, Youtube has a good explanation, but what happens in STV is that ridings are merged together, with several winners getting elected from each riding. Because of this, each party can nominate multiple candidates. Voters then rank as many candidates as they want in each riding.

Based on the number of seats to be filled in each new riding, a minimum threshold of votes is needed to get elected. Then an adapted version of Alternative Vote takes place - any candidates with more than the threshold are elected, and their surplus votes are redistributed at a fractional value to the next-ranked choices. This repeats until no remaining candidate has reached the threshold, at which point the lowest-ranked candidate is eliminated, and their voters' subsequent choices are redistributed at full value.

Source

The use of fractional vote redistribution for winners ensures that voters who pick popular candidates aren't penalized. If a majority of the population all pick candidates from Party A, then effectively the most popular candidate from the party will share their surplus votes to voters' subsequent choices, so that voters don't have to worry too badly about the order in which they rank multiple more-or-less equal choices. Similarly, candidates who are eliminated have their votes redistributed at full value so that their voters aren't penalized either.
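
For the curious, here is a simplified sketch of an STV count along the lines described above. Real STV rules differ in the details, so several things here are assumptions: the threshold is the usual "votes divided by seats plus one, plus one vote" rule, surpluses are passed on by scaling down the weight of every ballot sitting with the winner, and ties are broken alphabetically. Candidate names and ballots are made up.

```python
from fractions import Fraction


def droop_quota(valid_ballots: int, seats: int) -> int:
    """Smallest vote count that `seats` winners can reach but `seats + 1` can't."""
    return valid_ballots // (seats + 1) + 1


def stv_count(ballots: list[list[str]], seats: int) -> list[str]:
    """Return elected candidates, in the order they were elected."""
    quota = droop_quota(len(ballots), seats)
    weights = [Fraction(1)] * len(ballots)   # every ballot starts at full value
    hopeful = {name for ballot in ballots for name in ballot}
    elected: list[str] = []

    def top_choice(ballot: list[str]):
        """Highest-ranked candidate on this ballot who is still in the race."""
        return next((name for name in ballot if name in hopeful), None)

    while len(elected) < seats and hopeful:
        tallies = {name: Fraction(0) for name in hopeful}
        for ballot, weight in zip(ballots, weights):
            choice = top_choice(ballot)
            if choice is not None:
                tallies[choice] += weight

        if len(hopeful) <= seats - len(elected):
            # No more candidates than open seats: everyone left gets in.
            elected.extend(sorted(hopeful, key=lambda name: -tallies[name]))
            break

        leader = max(sorted(hopeful), key=lambda name: tallies[name])
        if tallies[leader] >= quota:
            # Elect the leader and pass on only the surplus, fractionally.
            keep = (tallies[leader] - quota) / tallies[leader]
            for i, ballot in enumerate(ballots):
                if top_choice(ballot) == leader:
                    weights[i] *= keep
            elected.append(leader)
            hopeful.remove(leader)
        else:
            # Nobody reached the quota: drop the last-place candidate.
            # Their ballots move on at whatever value they currently hold.
            loser = min(sorted(hopeful), key=lambda name: tallies[name])
            hopeful.remove(loser)

    return elected


# Hypothetical two-seat riding with 100 voters.
ballots = (
    [["A1", "A2"]] * 50
    + [["A2", "A1"]] * 10
    + [["B1"]] * 25
    + [["C1", "B1"]] * 15
)
print(stv_count(ballots, seats=2))  # ['A1', 'B1']
```

In this example the threshold is 34 votes. A1 is elected with 50 and passes a surplus worth 16 votes to A2, nobody else reaches the threshold, C1 is eliminated, and C1's ballots carry B1 over the line.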

So how does this fix anything? Well first of all, the issue in FPTP where people could feel unrepresented is very nearly eliminated. Imagine a riding that elects five MPs - in this case, each MP would need 16.67% of the vote (plus one vote) to get elected. This is the minimum threshold that five MPs could each hold, but that six people couldn't (because of the +1 vote requirement). This means that, in the absolute worst-case scenario, 83% of everyone's ballots would go directly or indirectly towards electing someone, and almost always the people elected would be the first or second choices of the voters.
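
The same threshold arithmetic can be run for other riding sizes. This is just the 1 / (seats + 1) rule from the paragraph above (ignoring the single extra vote), and the seat counts chosen are arbitrary examples.

```python
def threshold_share(seats: int) -> float:
    """Share of the vote needed to win one seat, ignoring the extra one vote."""
    return 1 / (seats + 1)


for seats in (1, 3, 5, 7):
    threshold = threshold_share(seats)
    covered = seats * threshold  # worst-case share of ballots electing someone
    print(f"{seats} seat(s): threshold {threshold:.2%}, "
          f"at least {covered:.1%} of ballots elect someone")
```

A single-seat riding is the familiar 50% case, and the guaranteed share of ballots that count towards a winner climbs as seats are added.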

Strategic voting would no longer be an issue. Unlike FPTP, if a voter's preferred candidate doesn't get elected, that isn't going to jeopardize the chances of a compromise candidate. Voters could vote with a clear conscience, and smaller parties would not have to worry about losing votes to strategic coordination.

Also, the more seats there are in each riding, the closer to proportional the overall result would become. If 40% of people vote Party A in a 5-person riding, then only the two most popular Party A candidates will get elected, and all the votes cast for Party A candidates won't be redistributed anymore because they've succeeded in getting seats filled. Smaller parties will have a better chance of getting elected by the popularity of individual candidates, as the threshold to get elected to a multi-seat riding will be lower.

While the national results will be very close to proportional, proportionality isn't the end-goal of STV like it is in MMP. That means that voters who vote based on party affiliation can lump their ranking by party, but voters who vote based on candidates would be free to do so as they wish.

STV keeps geographical representation and proportional representation. So why isn't it being proposed by any major parties?

Well for starters, those reports I mentioned before supporting MMP were a bit weakly worded. The PEI report liked both systems, but felt that MMP would be easier to swallow for Canadians since it's only changing half of the system (pg 98). The Law Commission report considered STV for about half a page (pg 103) before rejecting it based on it not having "geographical representation ... effective/accountable government ... [or] regional balance" without explanation. To their credit, neither the Liberals nor the Green Party have dismissed STV, and instead officially are proposing to form committees to investigate the best way forward.

Most complaints about the STV system are that it is complicated to explain and results in long ballots. These are definitely true, though as mentioned before an open list MMP system would likely have a similarly long ballot, and the idea of ranking people is the same as in AV. STV is also certainly not the most complicated voting system out there, but often the complications are required to avoid strategic voting openings.

Canada's electoral system is old and showing some signs of wear and tear. Both the disproportionate and inadequate representation that we have in our current system can and should be fixed, and the sooner that this happens, the better. But we don't have to accept only the Alternative Vote being entertained by the Liberals or the Mixed Member Proportional system offered by the NDP - they are at best half a fix, and at worst no better than our current system. If we're going to fix this, we should do this right.

Fortune Favours the Old

2015-08-07T10:57:00.001-06:00

I had my birthday recently and, much like last year, got a joke present from a friend. This year though, it came with an explicit challenge to do something statistical with it. So for this blog post, my subject matter will be based on this box of fortune cookies:

My willing victim

So what stats can be pulled out of a box of fortune cookies? First of all, I suppose the box says there are approximately 25 cookies, but in reality it came with 38 fortunes. Ridiculous quality control, let me tell you!

Of course, most fortunes just floated around without cookies

Fortunately, each fortune has a set of numbers on the back. Numbers are good, so let's do stats with those and leave the yummy cookie bits for later.

Fortunes not necessarily to scale.

Each fortune has a series of 6 ascending, non-repeating integers on the back. Presumably these are lucky numbers for your next lottery, but given just this set of numbers we can't necessarily tell which lottery they might be meant for. But can we make an educated guess?

Quick history lesson: in World War II, the Allies were at least somewhat concerned with estimating how many tanks Germany was building in any given month. One way they had was conventional espionage, which suggested that the Germans were building approximately 1,400 tanks every month between June 1940 and September 1942 (a lot of tanks). Of course, spies sometimes lie (it's their job, after all), so the second way the Allies had to estimate tank production was using statistics on captured tanks.

Every tank had a whole bunch of parts, and every part had a serial number stamped into it during production. These serial numbers were unique for every tank, and in the case of the gearboxes in particular, fell in unbroken sequences. Based on the distribution of serial numbers, a relatively simple formula could give an estimate of the total number of tanks produced. For instance, if the Allies saw that the tanks they destroyed in a given month were tanks produced #25, #94, #141, and #198 of that month (and were confident they were destroying them randomly), they'd be much less worried than if they destroyed tanks #52, #306, #519, and #1058.

It actually turned out way more accurate than anyone hoped - statistical estimates for tank production between June 1940 and September 1942 were 246 tanks per month, and in reality the Germans produced 245. Yay stats!

So like the famous German tank problem, looking at a fortune cookie's string of numbers can give us an estimate of the total number of 'lucky numbers' that the fortune cookies might offer. In the above example, there are 6 numbers decently evenly spaced between 2 and 47. A frequentist statistical approach, therefore, suggests that the total number of possible numbers that could be on the backs of these fortune cookies is 53.83, with a 95% confidence interval of 47-77. Not terribly precise when looking at a single fortune. Another fortune might have a series of numbers with a likely maximum of 48, for instance, and if we look at the average of all 38 fortunes in the box, the average 'expected number' ends up being 49.4. And in fact, of all 38 fortunes, all numbers on the backs were between 1 and 49.
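
The single-fortune estimate can be reproduced with the standard frequentist formulas for the German tank problem. This is a sketch under the usual assumptions (numbers drawn uniformly without replacement from 1 to N), and the upper bound uses the common continuous approximation, so it may not be exactly how the figures above were calculated. Only the largest number (47) and the count (6) from the example fortune are needed.

```python
def tank_estimate(sample_max: int, sample_size: int) -> float:
    """Point estimate of the largest possible number: m + m/k - 1."""
    return sample_max + sample_max / sample_size - 1


def upper_bound(sample_max: int, sample_size: int,
                confidence: float = 0.95) -> float:
    """One-sided upper confidence bound (continuous approximation)."""
    return sample_max / (1 - confidence) ** (1 / sample_size)


largest_number = 47   # biggest lucky number on the example fortune
numbers_shown = 6     # lucky numbers per fortune

estimate = tank_estimate(largest_number, numbers_shown)
bound = upper_bound(largest_number, numbers_shown)

print(f"Estimated pool size: {estimate:.2f}")
print(f"95% interval: {largest_number} to about {bound:.0f}")
```

The lower end of the interval is simply the largest number seen, since the pool can't be smaller than that.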

So we have six numbers, chosen between 1-49. Sounds like we're playing Lotto 6/49!

Here's what the distribution of all lucky numbers ended up being:

It kinda looks like number 37 comes up way more often than the rest, and numbers 9 and 13 are super under-represented. Is this a conspiracy, or random chance?

With 49 numbers to choose from, 6 different numbers on each fortune, and 38 fortunes to choose from, we'd expect an average of 4.65 of each number to show up. With an expected 4.65 of each number, we can create a Poisson distribution to see how often we'd expect any given number to turn up, and see if ours is indeed random. That'd give us something like this:

This suggests that the distribution of lucky numbers isn't actually all that lucky, and may be pretty much what you'd expect (R² value of 0.82, which ain't shabby). It matches particularly closely at the tails, so having a few numbers occur 10 times each isn't all that surprising really.
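
The expected side of that comparison is easy to rebuild. Here is a minimal sketch using only the counts from the post (38 fortunes, 6 numbers each, 49 possible numbers). Strictly speaking each number can appear at most once per fortune, so the Poisson curve is an approximation. The observed tallies aren't reproduced here, so this only prints the expected values.

```python
import math

fortunes = 38
numbers_per_fortune = 6
pool = 49

# Average number of times each lucky number should show up (about 4.65).
mean = fortunes * numbers_per_fortune / pool


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of seeing exactly k occurrences when lam are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)


print(f"Expected appearances per number: {mean:.2f}")
for k in range(12):
    # How many of the 49 numbers we'd expect to appear exactly k times.
    expected_numbers = pool * poisson_pmf(k, mean)
    print(f"{k:>2} appearances: {expected_numbers:.2f} numbers")
```

Plotting these expected values against the actual tally of appearances gives the kind of comparison described above.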

One last analysis for the fortune cookies. Fortunes tended to come in one of three categories: advice ("Counting time is not as important as making time count"), analysis ("You are deeply attached to your family and home"), and most popularly predictions ("You will soon find something lost long ago"). Is there any relation between the type of fortune on the front, and the sum of the numbers on the back?

Nope, nothing statistically significant anyway. The Analysis fortunes seem to generally have higher numbers on the back, but there are too few of them and they are too varied to be conclusive.

So there you go! Fortune cookies tend to have Lotto 6/49 numbers on the back that are fairly well randomly distributed. Not sure if that left any of you particularly surprised, but it's fun to know nonetheless!