Extreme Enginerding

Sunday, July 23, 2023

Measuring Inequality

It seems like with increasing frequency we hear about rising inequality, both with wealth and income distribution across our country and the world as a whole. We see articles regularly like this and this, accurately describing it as a major issue for our times.

With so much interest in the topic, it’s probably unsurprising that it’s a well-studied field. Before you can properly wrap your head around something you have to measure it, and in order to get policy makers to pay attention you pretty much have to boil that measurement down to a single number. So it isn’t shocking at all that economic inequality can be measured by a single value, known as the Gini Coefficient.

I Dream of Gini

To start looking at measuring inequality we can survey a population, rank people based on their wealth, and compare the percentage of people poorer than a given person to the percentage of wealth held by people poorer than that person. Effectively, these two values will be identical in a purely even distribution but further apart as the inequality starts to grow. If everyone has the same wealth, then the poorest 20% of the population will have 20% of the money, and the poorest 70% will have 70% of the money. If we plot this, we’d get a straight line of equality:

A more unequal distribution might look like this, though, where the poorest 20% only has 5% of the wealth, and the poorest 70% only has 50%:

The Gini coefficient compares these two curves, the equality curve and the actual curve of the population, by comparing the area under the actual curve (B) to the area between the curve and the line of equality (A). The bigger the area between the curves (Area A), the bigger the Gini coefficient, so a Gini of 0 means a perfectly equal society, and a Gini of 1 means effectively that all the wealth is concentrated in the hands of one person.

Gini Coefficient Calculator

Gini in a Bottle

The Gini coefficient isn’t a perfect way of measuring inequality, but does a pretty good job. In the absence of social programs like a universal basic income, it’s worth pointing out that there will probably always be a non-zero Gini income coefficient, and that that’s not inherently evil. For instance, people late in their careers tend to make more money than newborn infants, and we’re generally ok with that.

The Gini coefficient also could give the same number to different distributions, if the shape of the curve is different but still results in the same relative areas. This means that overall it’s better as a relative indicator of inequality than a pure comment on the status of a society.

Unleash the Gini

As a very basic example for figuring out a Gini coefficient of our very own, we can take a look at a 10 player “Sit n Go” poker tournament. Following a common model used in online tournaments, 10 players sign up and the winner gets 50% of the pot, second place gets 30%, and third place gets 20%. Everyone else gets nothing, though hopefully has lots of fun too.

If we wanted to plot the curve we talked about before (incidentally, called a "Lorenz Curve"), we could use the information that the bottom 70% (the 7 losers) get 0% of the wealth, the bottom 80% (7 losers + third place) own 20% of the wealth, and the bottom 90% own 50%. Put that all together and we get this graph:

Area A, between the curves, can now be compared to the total area of A+B, and we get a Gini coefficient of 0.76.

Before we get to the actual point of all this, it’s worth taking a second to reflect here. Splitting a population into ten groups and having 50% of the wealth go to the group that's best at poker is no basis for a system of distributing wealth. That's 50% of all cash, stocks, bonds, houses, privately held land, and super yachts. Extending the analogy, even if we were to pretend that the "best" 10% is who ends up with half the wealth, where here poker ability might correspond to concepts like hard work, diligence, education, etc, that still feels raucously unfair to end up with a distribution as shown above. And that's ignoring the fact that a significant proportion of wealth is significantly correlated to the wealth of one's parents, negating a lot of the 'hard work' argument.

So here's the issue. The Gini coefficient for the distribution of wealth in a 10 player online poker tournament is 0.76. The Gini coefficient for the distribution of wealth in Canada is 0.73.

Now on the one hand, admittedly there's a little bit of room between 0.73 and 0.76. 0.76 is about the same relative inequality as in Vietnam, a bit worse than a country like Egypt (0.756) and a bit better than a country like Bolivia (0.764).

On the other hand, Canada is about the same as countries like Uganda and Liberia, which may come as a surprise to some overly self-righteous Canadians. As well, the most recent statistics are from 2019, and studies show that inequality has only risen during the Covid-19 pandemic.We very well could be worse off than if our society had been set up as though by poker tournament.

Another thing to mention is that, as I said before, the Gini coefficient doesn't really comment on the shape of the Lorenz curve, just the area. And obviously Canada doesn't have 70% of people with absolutely no wealth, so maybe while the numbers are similar they don't mean the same thing, maybe it isn't quite as dire as it sounds?

Unfortunately it's not that simple. A 2012 report from the Broadbent Institute showed this graph for Canada:

wealth concentration canada

Shockingly, the top 10% (at that time) ended up with almost half the wealth, not far off from the poker example. The bottom 10% didn't just have “no wealth”, they owed more than they owned. I'd argue that if you want to think about just how rich the rich are in Canada, a 10 person poker tournament is, distressingly, actually a very good analogy.

As an aside, at least in terms of poker analogies, things can always get worse. The $10,000 Main Event at the World Series of Poker handily posts its payout table online, and if you do a similar analysis you get something much much worse:

This Gini coefficient comes in at a whopping 0.94, thankfully much higher than any real country. This is what a Lorenz curve looks like when 0.5% of the population has 50% of the wealth, and is genuinely terrifying to contemplate as a future if we don't sort things out in the real world.

Canada has a long way to go in terms of wealth inequality, but obviously it gets worse too. The United States (0.852) and Russia (0.879) have absurdly high wealth inequalities. But worst of all? The world as a whole, sitting at a Gini coefficient for wealth distribution of 0.885. We have the means to measure this and the tools to do address it, and it's well past time we do something about it.

Wednesday, June 30, 2021

Voting Patterns for Edmonton City Council's 2017-2021 Term

City Council, unlike other levels of government, doesn't rely on party systems for categorizing its members. That being said, there still can be, and in fact are, patterns in how members of council vote, and with the Open Data that's available on council voting records, these patterns can be examined.

There are a lot of different ways to visualize voting patterns, and I've played around with these before (see here and here - unfortunately, since most of the visuals for this blog relied on the now-dead Google Fusion Tables, there's really not much to see). I've settled on three favourite methods for the 2017-2021 Edmonton city council term - let's take a look!

First of all, as in previous years, I've disregarded all motions that were unanimous as they provide no particular differentiating information. That leaves the 2017-2021 term with 921 non-unanimous votes to examine (at time of writing).

The first pattern-finding method I like to use it to simply look at the success rates of each member of council. How often did a vote go the way they wanted it to? This can be a sign of consensus-building, or an indicator of work put in behind the scenes (perhaps at other committees), or potentially a matter of being a part of a majority bloc that tends to vote similarly:

While a direct comparison is perhaps unwise, these number in general follow the same pattern as my similar 2016 analysis. Of members of council who were present both years, Councillor Esslinger and Mayor Iveson are again the top two and Councillor Nickel is again the lowest. Councillors Walters, Knack, and Henderson are all within 5% of their 2016 results as well, with Councillor Caterina showing a slightly larger difference from before.

This of course is not intended to imply anything about the effectiveness of individual members of council, and is performed without a review of the motions themselves (whether they are procedural, multiple readings of the same bylaw, etc.).

Noteworthy from the last analysis was the result that Mayor Iveson had only 'lost' 17 votes out of 358 non-unanimous motions in the previous term. For comparison, at the time of writing, this number is now 94 votes.

A second pattern-finding visualization is how often members of council agree with each other. For the 2017-2021 term so far, that is:

The result from this analysis shows that a group of six members of council agree with each other more than 80% of the time across all pairings, and that a seventh member (Councillor Henderson) is just outside with a 79% minimum agreement rate (with Councillor Hamilton). With a council size of 13, seven members is a winning majority on most motions. Certainly, there is a correlation between the top six council vote winners and this group of six members of council - whether this group is ideologically similar or just more likely to compromise and build consensus is beyond the scope of this analysis though!

A third and final pattern-finding visualization that I quite like is adapted from the NOMINATE system used to scale members of the United States Congress. It is intended to represent ideological similarities and differences between members of council in a spatial manner - members closer to each other agree more, and further apart agree less frequently:

I'd like to stress at this moment that, as it's often tough to assign traditional political ideologies to city council bylaw amendments, this graph does not necessarily represent traditional 'left vs right wing' traits, nor traditional 'authoritarian vs libertarian' traits. The results of the graph are intended to model councillors as though their decisions are made based solely on two non-correlated factors, and the model above is oriented with the most significant factor aligned along the x-axis.

It's totally cool if you want to stop now, but I actually really love this model system and I want to talk about it a bit more since it gained some interest when I did this for London. Effectively, the NOMINATE system models both councillors and motions along the two axes, then assigns a probability of each councillor voting one way or another based on the relative proximity to each "side" of a debate. The algorithm then iterates thousands of times, tweaking the positions of each councillor and motion in such a way to optimize the probabilities of each decision.

The net result of this is that, using only two dimensions, this use of the NOMINATE algorithm as it stands currently accurately assigns the correct vote to each councillor 93.4% of the time. While to some of you this may not seem perfect, a model that reduces the complexity of council decisions to two factors with over 90% accuracy is something I'm quite astounded by and happy with.

For instance, last week's vote to end the mask mandate effective July 1st broke down like this based on the model. Here, the orange coloring indicates 'voted no', and blue indicates 'voted yes', with clear circles for the locations of the decision-points:

Here, the percentages are the model's prediction at the odds of each councillor voting the way they did. The "yes" and "no" points are shown, and the dashed line indicates the mid-way point between the two positions. In this case, the model managed to accurately capture each member's vote (where accuracy here is defined by a yes vote with more than 50% probability, or a no vote with less than 50% probability). The probability doesn't necessarily reflect the difficulty a given member of council had in making their decision, and is more of the measure of accuracy of the model.

By looking at all votes together, the model slowly hones in on the best placement for each member of council. Not all votes are as clean cut as this one - for instance, the vote on the solar power plant at EL Smith looked like this:

You can see here that the model was very close with councillor McKeen, and effectively swapped Caterina and Dziadyk. Again, as the model is probabilistic this doesn't mean it got these 'wrong', more that having these councillors and decision points in these locations is optimized over the entire term.

It's not a perfect model, but again I'm quite pleased with how accurately it is able to capture the voting term in only two dimensions!

So that's it - three different ways to look at the data, showing different aspects of what can be learned from it!

Monday, June 28, 2021

Which Edmonton City Councillor are you?

I've done this before, and had so much fun with it that I'm happy to once again present:

A Buzzfeed-style quiz to get you more in touch with your elected representatives!

(it's totally ok if that doesn't excite you as much as it excites me)

Without further ado, here is a quiz for you to play around with. All decision points in the quiz are pulled from real votes in the 2017-2021 city council term, with information and sources provided.

Hopefully that was fun!

Like I said, I've done this before for Edmonton and London, and London was far more excited about it. The work that goes into these is an interesting mix of politics, whimsy, and data work.

The first step is to analyze the City of Edmonton open data set for Votes and Proceedings. For no discernable reason, the data set this term is inconsistent and halfway changes how votes are recorded, as well as changing how councillors are named. It's not particularly tricky to deal with, but it did have to be massaged a bit to be in a consistently usable format.

For this quiz, there's not much point in looking at unanimous procedural votes, so I focused on the 921 (at time of writing) non-unanimous votes. In an ideal world, a set of yes-no choices should require four or fewer questions in order to neatly sort into 13 possible answers (assuming approximately even splitting at each decision point). However, it's much more interesting and easy to answer the quiz when the questions are relevant and engaging.

Most of the examples I chose for this quiz have news stories attached, which in my mind was a sign of that I'd found adequately interesting votes to base this on, and as a result a user on the quiz can get to a councillor with anywhere from three to five questions, which I was satisfied with.

Hopefully you are too, because at one point in the design of this quiz one of the leading optimal votes was "That City Council waive the rules on providing notice of motion as set out in section 32 of Bylaw 18155 - Council Procedures Bylaw to allow Councillor S. Hamilton to make a motion without notice regarding the aerial mosquito program." It would've made things work so well but, well, it's hard to really care about it.

Each of the final results in the quiz genuinely leads to member of City Council who voted in the same unique way as the answers you provided. One assumption was made, which was that while Mike Nickel did not vote on his own censure, it was assumed that he would have voted no if he was forced to.

Hope you had fun!

Saturday, April 27, 2019

Which London city councilor are you?

Open data can be used for a lot of things, and public meeting minutes of elected representatives are crucial in holding representatives accountable, ensuring they represent their constituents, and promoting honesty and efficiency in our government.

Or they can be used to make Buzzfeed style personality quizzes. That's what I did.

We've now hit a point in the City Council meeting minutes from this council so far where all councillors have disagreed with eachother on interesting votes at least once, which allows us to strongly differentiate between them. By presenting some of these votes, we can narrow down a few key motions that separate all the councillors, and present it in a Classification Chart. Since that's not as fun as a quiz, though, here it is in quiz format.

Share widely, and tell me who you got! (It may take a second to load)

Monday, April 22, 2019

Alberta 2019 Election Post-mortem

Well that was fun!

How did I do?

For more than a year now I've been tracking Alberta election polls with the hope of developing a reasonably accurate prediction model. Overall, I'm happy to report that the party I predicted in the lead won in 80 out of 87 races, and my riding qualifiers broke out as follow:

"Solid" lead: 65/65 (100%)
"Likely" lead: 12/15 (80%)
"Lean" lead: 2/5 (40%)
"Toss-up" edge: 1/2 (50%)

I think this is a decent proof of concept, small "lean" sample size notwithstanding, and I want to talk a bit about what went right and what went wrong, and how I can improve if I want to keep doing this sort of thing.

First of all, the polls leading up to election day didn't turn out to be too accurate. Take a look at the province and regional splits:

Edmonton was remarkably accurate, Calgary was close, but the rest of the province and the top line results were off significantly. This is possibly a cause for concern, as it could suggest that my model was taking inaccurate data as inputs but then claiming credit for an accurate output, which it wasn't designed to do.

The NDP ended up under-performing relative to their polling numbers, and likely the only reason this didn't mess up too many election prediction models is because they under performed mostly in areas like rural Alberta, where they were predicted to lose anyway. If the polls had been that wrong about the NDP in Edmonton, say, the predictions could have been far worse.

Similarly, my model and others like me likely wouldn't have fared too well if the NDP had overperformed their polling rather than underperformed. The same amount of polling error as actually occurred, applied the other direction, could have had the NDP win the popular vote across the province.

My takeaway from this is that I need to adjust my topline polling tracker. Right now it runs under the implicit assumption that errors in individual polls will cancel each other out. This seemed reasonable given that polls are produced by different companies with different methods. That led to my full Alberta tracker having a low confidence interval for the NDP in particular, though, as several polls in a row provided the same result. If I instead make the assumption that at least part of the polling error is correlated between polls, perhaps due to something beyond their control, then the final result from election night would have still been a surprise, but far less of one. Certainly something I'll take into account next time.

Other Metrics

Overall on a riding-by-riding level, I had an error of 6.4% vote share. That's not superb, but also not far from what my testing beforehand suggested, and was factored into my uncertainty. Comparing my final projection to actual results on election night doesn't look too bad:

If we ignore the Alberta Party and the Liberals, this leads to an overall R-squared value of 0.79, which I consider respectable. It's handy to ignore the low parties because they don't have much of a spread, and will skew the coefficient of determination calculation.

Very fortunately for me, if I input the final actual regional results as though they were a poll result, my model does improve. This is a good hint that my model is behaving decently, especially so since this hasn't been the case with all other forecasters.

With the correct Calgary, Edmonton, and Rural results input as large polls, my model improved to 83/87 seats correctly predicted and an R-squared for party support per seat of 0.91. Very encouraging - too bad the polls weren't more correct!

Finally, I also provided an expected odds of winning each seat for each party. It's one thing to count a prediction as a success if you give it 100% odds of winning and it comes true, but how does one properly score oneself in the case of Calgary-Mountain View, where I gave the Liberals (10.8%), UCP (16.2%) and NDP (73%) different odds of winning, and only one (NDP) did?

In this case I've scored each riding using a Brier score. A score of 0 means a perfect prediction (100% to the winner and 0% predicted for all losers), a score of 1.0 means a perfectly wrong prediction (100% to one of the losers), and because of the math, a score of 0.19 for a complete four-way coin toss (I only predicted the four parties represented in the debate).

Overall, I scored a 0.027, which is considerably better than just guessing. It's hard to get an intuitive sense of what that score really means, but it's mathematically the same as assigning an 83.5% chance of something happening and having it come true. Not a bad prediction, but there's room to be sharpened.

How did I stack up?

So like I said, there were a lot of us predicting the election this time around. I've tried to find as many as I can, and I apologize profoundly if I've missed anyone. I've only included forecasts that had either a vote breakdown per seat or anticipated odds of winning each seat for comparison purposes.

I've reported on three main measures (seat accuracy, R-squared per seat, and prediction Brier score), and I'll present as many of those for each forecaster as I was able to determine. Different forecasters win at different categories, so it's not necessarily a clear picture as to which one of us is the "best", so I'll mostly leave room here for interpretation:

I'm not claiming to be the second best, but it's important to note that being best in one measure doesn't necessarily mean best overall. There are also harder-to-evaluate measures in play here - for instance VisualizedPolitics and TooClosetoCall allow you to input poll values to see reactions for yourself, and both improved when given more accurate data (VisualizedPolitics also got to 83 seats accurately predicted, though still with a low R-squared value).

338Canada probably rightly can claim to have been the strongest this time around, but I given the polling errors we were faced with I think it'll take several more elections to determine if anyone is really getting a significant edge consistently. This isn't the first time we've compared ourselves to each other, and I think it's an important exercise in evaluating our own models and whether there's a need for more.

Thursday, October 25, 2018

London Instant Runoff Breakdown

London (Ontario) just had its first election using instant-runoff balloting. As I've mentioned before, I'm very interested in different forms of electoral reform, so as a new resident of London I was intrigued as to how the vote would work out.

London's system is a bit unusual inasmuch as voters can only rank their first three choices, but otherwise follows a pretty classic Instant Runoff system. Many of the elections resulted in first round winners, and therefore don't have a lot of room for fun analysis, but some of them went deeper and I thought it might be fun to show how the progressed in a Sankey diagram!

First of all, here's Ward 5 (my ward!):

As with all of the following, the leader in the first round ultimately ended up winning. Due to the lack of ability of voters to rank more than three candidates, the number of exhausted votes tends to grow quite quickly after the third round. Interesting patterns include the large number of Clarke supporters moving to Cassidy, and the relatively large number of Knott supporters preferring Warden over Cassidy at the end.

Ward 8

This race ended closer than it began, and likely didn't see any change in leader throughout the race due to the lack of strong trends in down-ballot rankings.

Ward 9

This race ended quite quickly, with Hopkins getting more than 50% of the vote by the third round after preferential support from Charlebois' supporters.

Ward 12

Similar to Ward 9 - disproportionate support from Mohamed's voters to Peloza secured a win in the fourth round.

Ward 13

One of the tighter races of the election. Kayabaga drew large support from Warren and Hughes supporters, whereas Fyfe-Millar drew more support from Wilbee and Lundquist voters.

Ward 14

Pretty straightforward - along with being the top first choice, Hillier was the preferred alternate for both Tipping and Swalwell's voters leading to a more secure finish than start.

Mayor

(Click to zoom and enhance!)

This one was far more lopsided than all the others. In the early rounds of voting, there was a small amount of jostling for positions 7-9 in the rankings, but apart from that no real changes occurred until Cheng's elimination. No abnormally strong trends in down-ticket voting occurred, though, so Holder held one throughout the end.

The city clerk has promised more detailed information to come out soon, so stay tuned for further analysis!

Monday, September 17, 2018

London City Council

Wow it's been a while since my last post. My apologies!

A principal reason for this is that I've moved - I'm no longer an Edmontonian, and am now a Londoner! London Ontario, that is. This almost definitely means I won't stop posts about Edmonton, but does mean that I'll be increasing my Ontario content.

London is currently in the midst of a civic election, so like any good new citizen to a city my first thought was to learn as much about the current council as I can so that I can make as informed a decision as possible. London's open data is pretty good, but their votes and proceedings aren't as organized quite as well as Edmonton's are.

Nonetheless, with the votes and proceedings that are available, I thought to take a look at council relationships in London in a similar way to how I did in Edmonton two years ago.

Unanimous votes aren't interesting, so I've focused this analysis on the 638 non-unanimous roll call votes as recorded in meeting minutes. First of all, let's take a look at how often each councillor agrees with each other:

Matt Brown is the mayor, and currently enjoys at least 70% agreement with 11 out of 15 councillors, which isn't too shabby. In general, there appears to be a mild bloc of six people (Brown through Park) who all agree quite strongly with each other, another similar block (Park through Hubert) who do the same, and then a handful of councillors who seem to go their own way.

Another sign of consensus-building on city council is the frequency that each member of council has the outcomes of votes in line with how they voted. Again, looking only at non-unanimous votes:

The mayor has been on the losing side of 51 votes out of 610 in which he's been present or not recused, which suggests a reasonable level of consensus building (though not quite as high as Iveson in Edmonton).

If we plot a graph of councillors, and connect them only if they agree at least 67% of the time, we get the following:

The cut-off here was chosen in order to include councillor Turner while still highlighting differences in agreement rates. Unsurprisingly, councillors Turner, Helmer, and Squire are relative outsiders, with a strong cluster of the six councillors mentioned before in the center. Also, this type of graph is incredibly satisfying to play with - enjoy at your own risk!

While showing relative outsiders, this plot doesn't really demonstrate any significant voting blocs. Another way to present the same data is to only connect members of council to whoever they agree with the most often. Doing that results in the following:

Here we get a more interesting structure. Nearly as many people agree more often with councillor Zaifman than Mayor Brown, though there are no separated islands of voting blocs. Only two members of council agreed with each other the most mutually, Matt Brown and Maureen Cassidy, an observation that is provided without further commentary.

The last way I'll look at voting patterns is to scale them using a variant of NOMINATE. This method was developed for analyzing US Congress voting patters, and can assign voting members to a political spectrum without needing to know what the bills being voted on were. For more information, this link is a fascinating read.

Obviously a city council is going to be less partisan than a parliamentary system, but the relative placement of councillors on the graph correlates with how often the agree or disagree with each other, as well as an approximate alignment on issues. I'll detail how this was developed in a subsequent post, but the short version is that each vote is also given a numerical position, and councillors who are closer to the "yes" vote than the "no" vote are assigned probabilities to vote either way. This is then trained against the actual vote data, and thousands of iterations of machine learning later we get this distribution.

Hopefully this has been an interesting glimpse into London city council. Have a fun election!