Saturday, April 27, 2019

Which London city councillor are you?

Open data can be used for a lot of things. Public meeting minutes are crucial for holding elected representatives accountable, ensuring they represent their constituents, and promoting honesty and efficiency in our government.

Or they can be used to make Buzzfeed style personality quizzes. That's what I did.

The minutes from this council's meetings have now reached the point where every pair of councillors has disagreed on at least one interesting vote, which lets us clearly differentiate between them. By presenting some of these votes, we can narrow down a few key motions that separate all the councillors, and present them in a Classification Chart. Since that's not as fun as a quiz, though, here it is in quiz format.

Share widely, and tell me who you got! (It may take a second to load)


Monday, April 22, 2019

Alberta 2019 Election Post-mortem

Well that was fun!

How did I do?

For more than a year now I've been tracking Alberta election polls with the hope of developing a reasonably accurate prediction model. Overall, I'm happy to report that the party I predicted in the lead won in 80 out of 87 races, and my riding qualifiers broke out as follows:

  • "Solid" lead: 65/65 (100%)
  • "Likely" lead: 12/15 (80%)
  • "Lean" lead: 2/5 (40%)
  • "Toss-up" edge: 1/2 (50%)
I think this is a decent proof of concept, small "lean" sample size notwithstanding, and I want to talk a bit about what went right and what went wrong, and how I can improve if I want to keep doing this sort of thing.

First of all, the polls leading up to election day didn't turn out to be too accurate. Take a look at the province and regional splits:







Edmonton was remarkably accurate, Calgary was close, but the rest of the province and the top line results were off significantly. This is possibly a cause for concern, as it could suggest that my model was taking inaccurate data as inputs but then claiming credit for an accurate output, which it wasn't designed to do.

The NDP ended up underperforming relative to their polling numbers, and likely the only reason this didn't mess up too many election prediction models is that they underperformed mostly in areas like rural Alberta, where they were predicted to lose anyway. If the polls had been that wrong about the NDP in, say, Edmonton, the predictions could have been far worse.

Similarly, my model and others like it likely wouldn't have fared too well if the NDP had overperformed their polling rather than underperformed. The same amount of polling error as actually occurred, applied in the other direction, could have had the NDP win the popular vote across the province.

My takeaway from this is that I need to adjust my topline polling tracker. Right now it runs under the implicit assumption that errors in individual polls will cancel each other out. This seemed reasonable given that polls are produced by different companies with different methods. That led to my full Alberta tracker having a narrow confidence interval for the NDP in particular, though, as several polls in a row provided the same result. If I instead assume that at least part of the polling error is correlated between polls, perhaps due to something beyond the pollsters' control, then the final result from election night would still have been a surprise, but far less of one. Certainly something I'll take into account next time.
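A minimal sketch of why correlated error matters, assuming (hypothetically) that each poll's error is the sum of a shared component common to every poll and an independent component unique to that poll. Averaging more polls shrinks only the independent part, so the interval stops narrowing once the shared error dominates. The 3-point and 2-point error sizes are illustrative placeholders, not values fitted by my tracker.

```python
import math

def poll_average_sd(n_polls: int, sd_independent: float, sd_shared: float = 0.0) -> float:
    """Standard deviation of the error in a simple average of n polls.

    Assumed error model: shared component (identical in every poll) plus an
    independent component per poll. Averaging only divides down the latter.
    """
    return math.sqrt(sd_shared ** 2 + sd_independent ** 2 / n_polls)

# Hypothetical sizes: 3-point independent error per poll, 2-point shared error.
for n in (1, 5, 20):
    independent_only = poll_average_sd(n, 3.0)
    with_shared = poll_average_sd(n, 3.0, sd_shared=2.0)
    # 1.96 standard deviations gives a ~95% interval under a normal approximation
    print(f"{n:>2} polls: independent-only ±{1.96 * independent_only:.1f}, "
          f"with shared error ±{1.96 * with_shared:.1f}")
```

With independent errors only, twenty agreeing polls look far more certain than one; with a shared component, the interval can never shrink below the shared error no matter how many polls agree.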

Other Metrics


Overall on a riding-by-riding level, I had an error of 6.4% vote share. That's not superb, but also not far from what my testing beforehand suggested, and was factored into my uncertainty. Comparing my final projection to actual results on election night doesn't look too bad:


If we ignore the Alberta Party and the Liberals, this leads to an overall R-squared value of 0.79, which I consider respectable. It's handy to ignore the low-polling parties because their vote shares don't have much spread across ridings, and would skew the coefficient of determination calculation.
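For reference, a minimal sketch of one common definition of R-squared (1 minus the residual sum of squares over the total sum of squares around the mean actual result), with the low-polling parties filtered out first. The riding rows are made-up placeholders, not 2019 results, and the exact variant used for the 0.79 figure isn't stated here.

```python
def r_squared(projected, actual):
    """Coefficient of determination of projected vs. actual vote shares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(projected, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical (riding, party, projected %, actual %) rows, for illustration only
rows = [
    ("Riding 1", "UCP", 55.0, 60.0),
    ("Riding 1", "NDP", 35.0, 30.0),
    ("Riding 1", "Liberal", 3.0, 2.0),
    ("Riding 2", "UCP", 40.0, 38.0),
    ("Riding 2", "NDP", 50.0, 54.0),
    ("Riding 2", "Alberta Party", 6.0, 5.0),
]

# Keep only the parties whose support varies meaningfully between ridings
kept = [(p, a) for _, party, p, a in rows if party in {"UCP", "NDP"}]
projected, actual = zip(*kept)
print(f"R-squared: {r_squared(projected, actual):.2f}")
```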

Very fortunately for me, if I input the final actual regional results as though they were a poll result, my model does improve. This is a good hint that my model is behaving decently, especially so since this hasn't been the case with all other forecasters.


With the correct Calgary, Edmonton, and Rural results input as large polls, my model improved to 83/87 seats correctly predicted and an R-squared for party support per seat of 0.91. Very encouraging - too bad the polls weren't more correct!

Finally, I also provided expected odds of winning each seat for each party. It's one thing to count a prediction as a success if you give it 100% odds of winning and it comes true, but how does one properly score oneself in a case like Calgary-Mountain View, where I gave the Liberals (10.8%), UCP (16.2%), and NDP (73%) different odds of winning, and only one (NDP) did?

In this case I've scored each riding using a Brier score. A score of 0 means a perfect prediction (100% to the winner and 0% to all losers), a score of 1.0 means a perfectly wrong prediction (100% to one of the losers), and, working through the same formula, a complete four-way coin toss scores 0.19 (I only predicted the four parties represented in the debate).
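A minimal sketch of a per-riding multi-party Brier score. The exact normalization isn't spelled out above, so this assumes the squared errors are averaged over the four parties, which reproduces the 0.19 coin-toss figure. The Calgary-Mountain View odds come from the post, with the Alberta Party assumed at 0% since the three listed odds already sum to 100%.

```python
def brier_score(forecast: dict[str, float], winner: str) -> float:
    """Squared error between forecast odds and the outcome, averaged over parties.

    0 is a perfect forecast. Averaging (rather than summing) over the four
    parties is an assumption chosen to match the 0.19 coin-toss figure.
    """
    return sum(
        (p - (1.0 if party == winner else 0.0)) ** 2 for party, p in forecast.items()
    ) / len(forecast)

coin_toss = {"UCP": 0.25, "NDP": 0.25, "Alberta Party": 0.25, "Liberal": 0.25}
print(round(brier_score(coin_toss, "NDP"), 4))  # 0.1875

# Calgary-Mountain View odds from the post; Alberta Party assumed at 0%
cmv = {"Liberal": 0.108, "UCP": 0.162, "NDP": 0.73, "Alberta Party": 0.0}
print(round(brier_score(cmv, "NDP"), 4))  # 0.0277
```

An overall score would then be the mean of these per-riding scores across all 87 ridings.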

Overall, I scored a 0.027, which is considerably better than just guessing. It's hard to get an intuitive sense of what that score really means, but it's mathematically the same as assigning an 83.5% chance of something happening and having it come true. Not a bad prediction, but there's room to be sharpened.
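The 83.5% comparison can be checked by treating the score as the squared error of a single yes/no forecast that came true, (1 - p)^2 = score, and solving for p. This inversion is a sketch of that framing, not a general property of multi-party Brier scores.

```python
import math

def equivalent_probability(score: float) -> float:
    """Probability p whose single correct yes/no call has squared error `score`."""
    # (1 - p) ** 2 == score  =>  p == 1 - sqrt(score)
    return 1.0 - math.sqrt(score)

print(f"{equivalent_probability(0.027):.1%}")
```

For a score of exactly 0.027 this gives roughly 83.6%; small differences from 83.5% come down to how the score was rounded.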

How did I stack up?

So like I said, there were a lot of us predicting the election this time around. I've tried to find as many as I can, and I apologize profoundly if I've missed anyone. I've only included forecasts that had either a vote breakdown per seat or anticipated odds of winning each seat for comparison purposes.

I've reported on three main measures (seat accuracy, R-squared per seat, and prediction Brier score), and I'll present as many of those for each forecaster as I was able to determine. Different forecasters win in different categories, so there isn't a clear picture of which one of us is the "best", and I'll mostly leave room here for interpretation:



I'm not claiming to be the second best, but it's important to note that being best in one measure doesn't necessarily mean best overall. There are also harder-to-evaluate measures in play here - for instance VisualizedPolitics and TooClosetoCall allow you to input poll values and see how their projections react, and both improved when given more accurate data (VisualizedPolitics also got to 83 seats correctly predicted, though still with a low R-squared value).

338Canada can probably rightly claim to have been the strongest this time around, but given the polling errors we were faced with, I think it'll take several more elections to determine if anyone is consistently getting a significant edge. This isn't the first time we've compared ourselves to each other, and I think it's an important exercise in evaluating our own models and whether there's a need for more.

Thursday, October 25, 2018

London Instant Runoff Breakdown

London (Ontario) just had its first election using instant-runoff balloting. As I've mentioned before, I'm very interested in different forms of electoral reform, so as a new resident of London I was intrigued as to how the vote would work out.

London's system is a bit unusual inasmuch as voters can only rank their first three choices, but otherwise it follows a pretty classic instant-runoff system. Many of the races produced first-round winners, and therefore don't leave a lot of room for fun analysis, but some of them went deeper, and I thought it might be fun to show how they progressed in a Sankey diagram!
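A minimal sketch of an instant-runoff count with ballots capped at three rankings, showing how ballots become exhausted once all of their ranked candidates are gone. The winning threshold (a majority of ballots still in play) and the alphabetical tie-break for last place are assumptions for illustration, not London's official counting rules, and the ballots are hypothetical.

```python
from collections import Counter

def instant_runoff(ballots):
    """Count ranked ballots (up to three names each) round by round.

    Returns (winner, rounds), where each round records the tally for every
    continuing candidate and the number of exhausted ballots.
    """
    candidates = {name for ballot in ballots for name in ballot}
    rounds = []
    while True:
        tally = Counter({name: 0 for name in candidates})
        exhausted = 0
        for ballot in ballots:
            # Each ballot counts for its highest-ranked candidate still in the race
            choice = next((name for name in ballot if name in candidates), None)
            if choice is None:
                exhausted += 1  # every candidate on this ballot has been eliminated
            else:
                tally[choice] += 1
        rounds.append({"tally": dict(tally), "exhausted": exhausted})
        leader, top = max(tally.items(), key=lambda item: item[1])
        # Assumed threshold: a majority of the ballots still in play
        if 2 * top > sum(tally.values()) or len(candidates) == 1:
            return leader, rounds
        # Eliminate last place; ties broken alphabetically (an arbitrary choice)
        loser = min(sorted(candidates), key=lambda name: tally[name])
        candidates.remove(loser)

# Hypothetical ballots: 5 rank only A, 4 rank B then C, 2 rank C then B, 1 ranks only C
ballots = [("A",)] * 5 + [("B", "C")] * 4 + [("C", "B")] * 2 + [("C",)]
winner, rounds = instant_runoff(ballots)
for number, rnd in enumerate(rounds, start=1):
    print(number, rnd["tally"], "exhausted:", rnd["exhausted"])
print("winner:", winner)
```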

First of all, here's Ward 5 (my ward!):


As with all of the following, the leader in the first round ultimately ended up winning. Because voters can rank at most three candidates, the number of exhausted votes tends to grow quite quickly after the third round. Interesting patterns include the large number of Clarke supporters moving to Cassidy, and the relatively large number of Knott supporters preferring Warden over Cassidy at the end.
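The flows in a Sankey diagram like this one are just counts of where each eliminated candidate's ballots go next. A minimal sketch, assuming ballot-level rankings and a known elimination order are available (both hypothetical here):

```python
from collections import Counter

def transfer_flows(ballots, elimination_order):
    """Tally where each eliminated candidate's ballots move next.

    Returns a Counter keyed by (round, from_candidate, to_candidate), where
    to_candidate is "Exhausted" if the ballot ranks no one still in the race.
    """
    remaining = {name for ballot in ballots for name in ballot}
    flows = Counter()
    for round_no, loser in enumerate(elimination_order, start=1):
        before = set(remaining)
        remaining.discard(loser)
        for ballot in ballots:
            # Who held this ballot just before the elimination?
            holder = next((name for name in ballot if name in before), None)
            if holder == loser:
                destination = next((name for name in ballot if name in remaining), "Exhausted")
                flows[(round_no, loser, destination)] += 1
    return flows

# Hypothetical ballots; C is eliminated first
ballots = [("A",)] * 5 + [("B", "C")] * 4 + [("C", "B")] * 2 + [("C",)]
flows = transfer_flows(ballots, ["C"])
for (round_no, source, target), count in sorted(flows.items()):
    print(f"round {round_no}: {source} -> {target}: {count}")
```

Each (source, target, count) triple maps directly onto one band of the diagram.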

Ward 8
This race ended closer than it began, and likely didn't see any change in leader throughout the race due to the lack of strong trends in down-ballot rankings. 


Ward 9
This race ended quite quickly, with Hopkins getting more than 50% of the vote by the third round after preferential support from Charlebois' supporters.

Ward 12


Similar to Ward 9 - disproportionate support from Mohamed's voters to Peloza secured a win in the fourth round.
Ward 13
One of the tighter races of the election. Kayabaga drew large support from Warren and Hughes supporters, whereas Fyfe-Millar drew more support from Wilbee and Lundquist voters.

Ward 14


Pretty straightforward - along with being the top first choice, Hillier was the preferred alternate for both Tipping and Swalwell's voters leading to a more secure finish than start.


Mayor


This one was far more lopsided than all the others. In the early rounds of voting there was a small amount of jostling for positions 7-9 in the rankings, but apart from that no real changes occurred until Cheng's elimination. No abnormally strong trends in down-ticket voting occurred, though, so Holder held on through to the end.

The city clerk has promised more detailed information to come out soon, so stay tuned for further analysis!

Monday, September 17, 2018

London City Council

Wow it's been a while since my last post. My apologies!

A principal reason for this is that I've moved - I'm no longer an Edmontonian, and am now a Londoner! London, Ontario, that is. I almost certainly won't stop posting about Edmonton, but I will be increasing my Ontario content.

London is currently in the midst of a civic election, so like any good new resident of a city, my first thought was to learn as much about the current council as I can so that I can make as informed a decision as possible. London's open data is pretty good, but their votes and proceedings aren't organized quite as well as Edmonton's are.

Nonetheless, with the votes and proceedings that are available, I thought to take a look at council relationships in London in a similar way to how I did in Edmonton two years ago.

Unanimous votes aren't interesting, so I've focused this analysis on the 638 non-unanimous roll call votes recorded in meeting minutes. First of all, let's take a look at how often each pair of councillors agrees:



Matt Brown is the mayor, and currently enjoys at least 70% agreement with 11 out of 15 councillors, which isn't too shabby. In general, there appears to be a mild bloc of six people (Brown through Park) who all agree quite strongly with each other, another similar bloc (Park through Hubert) who do the same, and then a handful of councillors who seem to go their own way.
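A minimal sketch of how pairwise agreement rates like these can be computed: drop unanimous votes, then for each pair count how often they voted alike among the votes where both were present. The data layout (one dict per roll call, absent members omitted) and the placeholder names A, B, C are assumptions, not London's actual open data format.

```python
from itertools import combinations

def agreement_rates(votes):
    """Share of non-unanimous votes, with both present, where two members voted alike.

    votes: list of dicts mapping member -> "Y" or "N" (absent/recused members omitted).
    """
    contested = [v for v in votes if len(set(v.values())) > 1]
    members = sorted({m for v in contested for m in v})
    rates = {}
    for a, b in combinations(members, 2):
        shared = [v for v in contested if a in v and b in v]
        if shared:
            rates[(a, b)] = sum(v[a] == v[b] for v in shared) / len(shared)
    return rates

# Hypothetical roll calls
votes = [
    {"A": "Y", "B": "Y", "C": "N"},
    {"A": "Y", "B": "Y", "C": "Y"},  # unanimous, so ignored
    {"A": "N", "B": "Y", "C": "N"},
    {"A": "Y", "C": "N"},            # B absent
]
for pair, rate in agreement_rates(votes).items():
    print(pair, f"{rate:.0%}")
```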

Another sign of consensus-building on city council is how often each member of council ends up on the winning side of a vote. Again, looking only at non-unanimous votes:


The mayor has been on the losing side of 51 of the 610 votes in which he was present and not recused, which suggests a reasonable level of consensus building (though not quite as high as Iveson in Edmonton).
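A minimal sketch of the win/loss tally, assuming each roll call records its outcome alongside the individual ballots (a hypothetical layout). Absences and recusals are skipped, as are unanimous votes.

```python
def losing_record(votes, member):
    """Return (votes lost, non-unanimous votes taken part in) for one member.

    votes: list of {"result": "Y" or "N", "ballots": {member: "Y" or "N"}},
    with absent or recused members left out of "ballots".
    """
    lost = present = 0
    for vote in votes:
        ballots = vote["ballots"]
        if member not in ballots or len(set(ballots.values())) < 2:
            continue  # absent/recused, or a unanimous vote
        present += 1
        lost += ballots[member] != vote["result"]
    return lost, present

# Hypothetical roll calls
votes = [
    {"result": "Y", "ballots": {"A": "Y", "B": "N", "C": "Y"}},
    {"result": "N", "ballots": {"A": "Y", "B": "N", "C": "N"}},
    {"result": "Y", "ballots": {"A": "Y", "B": "Y"}},  # unanimous
    {"result": "Y", "ballots": {"B": "Y", "C": "N"}},  # A absent
]
print(losing_record(votes, "A"))

# The mayor's figures from the post: 51 losses in 610 votes
print(f"On the winning side {1 - 51 / 610:.1%} of the time")  # 91.6%
```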

If we plot a graph of councillors, and connect them only if they agree at least 67% of the time, we get the following:


The cut-off here was chosen in order to include councillor Turner while still highlighting differences in agreement rates. Unsurprisingly, councillors Turner, Helmer, and Squire are relative outsiders, with a strong cluster of the six councillors mentioned before in the center. Also, this type of graph is incredibly satisfying to play with - enjoy at your own risk!
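A minimal sketch of the thresholded graph, plus one way to find the highest cut-off that still leaves every member with at least one connection (the role the 67% cut-off plays for councillor Turner): it is the smallest of each member's best agreement rate. The rates and names are hypothetical, and this isn't necessarily how the 67% figure was chosen.

```python
def threshold_edges(rates, cutoff=0.67):
    """Pairs of members who agree at least `cutoff` of the time."""
    return sorted(pair for pair, rate in rates.items() if rate >= cutoff)

def largest_cutoff_without_isolates(rates):
    """Highest cutoff at which every member still has at least one edge."""
    best = {}
    for (a, b), rate in rates.items():
        best[a] = max(best.get(a, 0.0), rate)
        best[b] = max(best.get(b, 0.0), rate)
    # Each member needs cutoff <= their own best rate, so take the minimum
    return min(best.values())

# Hypothetical pairwise agreement rates
rates = {("A", "B"): 0.90, ("A", "C"): 0.60, ("B", "C"): 0.70}
print(threshold_edges(rates))
print(largest_cutoff_without_isolates(rates))
```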

While showing relative outsiders, this plot doesn't really demonstrate any significant voting blocs. Another way to present the same data is to only connect members of council to whoever they agree with the most often. Doing that results in the following:




Here we get a more interesting structure. Nearly as many people agree most often with councillor Zaifman as with Mayor Brown, though there are no separated islands of voting blocs. Only one pair of council members are each other's most frequent ally: Matt Brown and Maureen Cassidy, an observation that is provided without further commentary.
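A minimal sketch of the "most frequent ally" graph: each member points to whoever they agree with most, and a mutual pair is one where both point at each other. Ties go to whichever pair is seen first, which is an arbitrary choice; the rates are hypothetical.

```python
def best_partners(rates):
    """Map each member to the member they agree with most often."""
    best = {}
    for (a, b), rate in rates.items():
        for member, other in ((a, b), (b, a)):
            if member not in best or rate > best[member][1]:
                best[member] = (other, rate)
    return {member: partner for member, (partner, _) in best.items()}

def mutual_pairs(partners):
    """Pairs whose members are each other's best partner."""
    return sorted({tuple(sorted((m, p))) for m, p in partners.items() if partners.get(p) == m})

# Hypothetical pairwise agreement rates
rates = {("A", "B"): 0.90, ("A", "C"): 0.80, ("B", "C"): 0.60}
partners = best_partners(rates)
print(partners)
print(mutual_pairs(partners))
```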

The last way I'll look at voting patterns is to scale them using a variant of NOMINATE. This method was developed for analyzing US Congress voting patterns, and can assign voting members to a political spectrum without needing to know what the bills being voted on were. For more information, this link is a fascinating read.


Obviously a city council is going to be less partisan than a parliamentary system, but the relative placement of councillors on the graph correlates with how often they agree or disagree with each other, as well as an approximate alignment on issues. I'll detail how this was developed in a subsequent post, but the short version is that each vote is also given a numerical position, and each councillor is assigned a probability of voting yes or no depending on how much closer they sit to the "yes" position than to the "no" position. This is then trained against the actual vote data, and thousands of iterations of machine learning later we get this distribution.
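A minimal sketch of the idea, using a simplified logistic spatial model rather than NOMINATE itself (NOMINATE uses Gaussian utilities and a more careful estimation procedure). If a councillor at position x compares squared distances to a vote's "no" point n and "yes" point y, then (x - n)^2 - (x - y)^2 = 2(y - n)x + (n^2 - y^2), which is linear in x, so P(yes) can be written as sigmoid(a_j * x_i + b_j). The learning rate, penalty, iteration count, and roll-call matrix are all illustrative assumptions.

```python
import numpy as np

def fit_spatial_model(votes, n_iter=5000, lr=0.02, l2=0.01, seed=0):
    """Fit one-dimensional positions to a roll-call matrix by gradient ascent.

    votes: (members x roll calls) array, 1 = yes, 0 = no, np.nan = absent.
    Model: P(yes) = sigmoid(a_j * x_i + b_j).
    """
    rng = np.random.default_rng(seed)
    n_members, n_votes = votes.shape
    observed = ~np.isnan(votes)
    y = np.nan_to_num(votes)                   # absent entries become 0 but are masked out
    x = rng.normal(scale=0.1, size=n_members)  # member positions
    a = rng.normal(scale=0.1, size=n_votes)    # how sharply each vote divides the spectrum
    b = np.zeros(n_votes)                      # where along the spectrum the split falls
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(np.outer(x, a) + b)))
        g = (y - p) * observed                 # log-likelihood gradient w.r.t. each logit
        grad_x = g @ a - l2 * x                # small L2 penalty pins down the scale
        grad_a = g.T @ x - l2 * a
        grad_b = g.sum(axis=0)
        x += lr * grad_x
        a += lr * grad_a
        b += lr * grad_b
    return (x - x.mean()) / x.std()            # standardized; the axis direction is arbitrary

# Hypothetical roll calls: members 0 and 1 usually vote together, as do 2 and 3
votes = np.array([
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 1, np.nan, 1],
], dtype=float)
print(np.round(fit_spatial_model(votes), 2))
```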

Hopefully this has been an interesting glimpse into London city council. Have a fun election!

Friday, June 8, 2018

Ontario Election Wrap-up

The 2018 Ontario General Election is over, and if your team won then congratulations to you!

Over the last month or so I've been tracking the election polls and testing out a few different ideas in order to improve a general model that I'll end up using for the upcoming Alberta election. Of course, I wasn't the only person doing this, and I was able to find at least six other sites tracking and projecting alongside.

But who did the best? Can we learn anything specific about which models produce more reliable results?

First of all, we can look at seat projections. As far as I could tell by mid-day June 7th, this was the seat projection distribution between the seven of us:



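| Party | CBC | Too Close to Call | QC125 | Lispop | Teddy on Politics | Calculated Politics | Extreme Enginerding | Average | Actual |
|---|---|---|---|---|---|---|---|---|---|
| PC | 78 | 74 | 70 | 69 | 60 | 71 | 70 | 70.3 | 76 |
| NDP | 45 | 46 | 47 | 50 | 55 | 44 | 45 | 47.4 | 40 |
| LIB | 1 | 3 | 6 | 4 | 8 | 8 | 9 | 5.6 | 7 |
| GRN | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0.7 | 1 |
| OTH | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |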


Ranking these by the root sum of squares difference from the actual results, we get:

  1. Calculated Politics (diff: 6.48). Their method involved seat-by-seat projections, suggesting a regional breakdown that seemed to work pretty well for them!
  2. Too Close to Call (diff: 7.48). They also provided seat-by-seat projections, and had regional factors involved to project those. Also, most handily, their simulator was interactive, but putting the correct values into it actually made their predictions slightly worse (still second place at 7.87 though).
  3. (Tie: CBC and Me) (diff: 8.12). We ended up with the same predictions for the NDP, but CBC was way under for the Liberals and I was quite a bit under for the PCs. My model didn't involve individual seat projections and instead just approximated historical trends for seat ranges based on party vote share, so that's a win for simplicity I suppose.
  4. QC125 (diff: 9.27). Another site with seat-by-seat projections. The actual seats fell well within their expected ranges, but were all off by a little bit. I'm unsure how they came up with the seat vote projections.
  5. Average (diff: 9.48). In this case, the wisdom of the crowds didn't pan out. 
  6. Lispop (diff: 12.57). Ostensibly they used a regional swing model similar to mine, so I'm not quite sure where the difference comes from here. It looks like they anticipated a much higher NDP voter base than actually materialized.
  7. Teddy on Politics (diff: 21.95). It seems like Teddy paid more attention to leader favorability numbers than most of the rest of us, and that seems to have tilted the seat distribution against him. His was the only model to predict a minority government.
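The ranking above can be reproduced from the seat table: for each forecaster, take the square root of the sum of squared seat differences from the actual result. The "Average" entry here uses the unrounded mean of the seven projections, which is what matches the 9.48 figure.

```python
import math

PARTIES = ("PC", "NDP", "LIB", "GRN", "OTH")
ACTUAL = {"PC": 76, "NDP": 40, "LIB": 7, "GRN": 1, "OTH": 0}
PROJECTIONS = {  # seat projections from the table, in PARTIES order
    "CBC": (78, 45, 1, 0, 0),
    "Too Close to Call": (74, 46, 3, 1, 0),
    "QC125": (70, 47, 6, 1, 0),
    "Lispop": (69, 50, 4, 1, 0),
    "Teddy on Politics": (60, 55, 8, 1, 0),
    "Calculated Politics": (71, 44, 8, 1, 0),
    "Extreme Enginerding": (70, 45, 9, 0, 0),
}

def rss_diff(projection, actual=ACTUAL):
    """Root of the summed squared seat differences across all parties."""
    return math.sqrt(sum((p - actual[party]) ** 2 for party, p in zip(PARTIES, projection)))

# Unrounded mean of the seven projections, party by party
average = tuple(sum(column) / len(PROJECTIONS) for column in zip(*PROJECTIONS.values()))
scores = {name: rss_diff(p) for name, p in PROJECTIONS.items()}
scores["Average"] = rss_diff(average)
for name, diff in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {diff:.2f}")
```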
For most of the models, the seat projections came directly from the popular vote estimates. If we take a look at those, we get:




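| Party | CBC | Too Close to Call | QC125 | Lispop | Teddy on Politics | Calculated Politics | Extreme Enginerding | Average | Actual |
|---|---|---|---|---|---|---|---|---|---|
| PC | 38.7 | 37.9 | 37.8 | 38 | 37.9 | 38.4 | 39.8 | 38.4 | 40.5 |
| NDP | 35.5 | 36 | 36.1 | 37 | 36.8 | 36.1 | 35.9 | 35.9 | 33.6 |
| LIB | 19.6 | 19.8 | 19.7 | 19 | 20.9 | 19.5 | 19.6 | 19.7 | 19.6 |
| GRN | 4.9 | 4.6 | 5 | ? | 4.5 | 4.6 | 5.2 | 4.8 | 4.6 |
| OTH | 1.3 | 1.7 | 1.4 | ? | 0 | 1.5 | 1.3 | 1.4 | 1.8 |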

Ranking these again by the same criteria we get:

  1. Me! (diff: 1.15) 
  2. CBC (diff: 2.69)
  3. Average (diff: 3.21) This is a better example of the group as a whole performing better than most individual members. This also probably makes sense as these numbers would have come mostly from the same pool of publicly available polls with a small amount of interpretation for trends and recency, as opposed to a large amount of interpretation as in the case with seat projections.
  4. Calculated Politics (diff: 3.29)
  5. Too Close to Call (diff: 3.56)
  6. QC125 (diff: 3.73)
  7. Lispop (diff: ~4.3) Note that Lispop didn't list their prediction for the Green Party vote total, despite projecting them to win a seat.
  8. Teddy on Politics (diff: 4.37)
Overall I'm really pleased with how I did, and I've learned a few tricks to use in upcoming elections. Next up will probably be Québec, hopefully with the same group of people, and we can see if this was a fluke for me or not!

Finally, here's my seat model with the actual results input as though they were one final gigantic poll at the end. Using these correct values would have resulted in the model being the most accurate seat projection of them all (diff: 4.24), which is an encouraging sign that the model itself was sound!


See you next election!

Tuesday, April 24, 2018

Alberta Electoral Districts

A few months ago, the Alberta Electoral Boundaries Commission released its report with recommendations on how to redistrict the province for the 2019 election. As I discussed before, this is an important process that occurs every eight to ten years, and is necessary for keeping the provincial electoral boundaries up to date with current population distributions.

As a quick aside, I'd like to thank everyone who, after reading my post on redistributing using the shortest splitline algorithm, actually wrote in to the commission to tell them to do that. Thanks guys!

Redistricting is always a hot topic, as it can lead to accusations of tampering or gerrymandering by those in power. In Alberta the process is ostensibly done by an arm's-length body, and as such when the results were unveiled in October the complaints from the parties not in power were pretty tame. The major effect of the redistricting was to merge rural ridings in such a way that three more urban ridings were created.

In 2015, the poll by poll results for Alberta looked like this:



Here, each poll is shaded a darker colour if the winning party took 50% or more of the vote. It's pretty fun to zoom around in it!

These polls were fitted to the 2015 riding boundaries, and if we break them out then add the votes back together according to the new 2019 boundaries, we can get a sense of what the outcome for future elections might look like. The process isn't perfect, as not all polls fit precisely into each new riding, but ultimately this is how the 2015 election is likely to have looked under the 2019 redistricting:
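A minimal sketch of the re-aggregation step: assign each 2015 poll to a 2019 riding, sum the votes by riding, and take the top party in each. It assumes every poll falls entirely inside one new riding, which isn't always the case; the poll IDs, riding names, and vote counts are hypothetical.

```python
from collections import Counter, defaultdict

def reaggregate(poll_votes, new_riding_of):
    """Sum poll-level votes into new ridings and pick each riding's winner."""
    totals = defaultdict(Counter)
    for poll_id, votes in poll_votes.items():
        # Assumes the whole poll lies inside a single new riding
        totals[new_riding_of[poll_id]].update(votes)
    winners = {riding: tally.most_common(1)[0][0] for riding, tally in totals.items()}
    return winners, totals

# Hypothetical polls and riding assignments
poll_votes = {
    "poll-1": {"NDP": 300, "Wildrose": 200},
    "poll-2": {"NDP": 100, "Wildrose": 250},
    "poll-3": {"PC": 400, "NDP": 100},
}
new_riding_of = {"poll-1": "Riding X", "poll-2": "Riding X", "poll-3": "Riding Y"}
winners, totals = reaggregate(poll_votes, new_riding_of)
print(winners)
print(Counter(winners.values()))  # seat count by party
```

Polls that straddle a new boundary would need to be split, for example in proportion to population, which is where the imperfection mentioned above comes in.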




This map is coloured the same way as the poll map above.

The impact this would have had on each party is:

  • The total seats won by the NDP wouldn't have changed at 54
  • The total seats won by the Wildrose would have decreased from 21 to 20
  • The total seats won by the PCs would have increased from 10* to 12
  • The Alberta Party would have stayed at 1 seat
  • The Liberals wouldn't have won any
The next election is a little over a year away, and these will be the ridings to be determined in that election. Stay tuned as I work to better develop my seat projection model and poll tracker over the next year!

Monday, October 23, 2017

Edmonton Election 2017

Another election has come and gone, and apart from a handful of new faces the biggest news is all the new stats! Let's take a look:

First of all, turnout was abysmal. A total of 194,826 people voted, resulting in a voter turnout of 31.5%. The best (blue) and worst (red) areas of the city in terms of voter turnout are shown here:




The colouring of the map is a bit funky since the mean and median are rather far apart, but it gives a decent impression of what happened. In general, it looks like neighborhoods around the river valley voted more often than neighborhoods away from it, which is interesting. The massive difference between the high (66.9%) and low (9.3%) turnout is absolutely astounding to me, and might suggest fairly significant challenges with connecting with voters in certain areas (especially if they can't see the river, apparently...).

Voter turnout can also be measured in a few other ways, including attrition along the ballot. For instance, of everyone who voted, 1.5% neglected to vote for a mayoral candidate, and 1.9% neglected to vote for any council candidate. 26.3% of voters picked a Catholic schools ballot vs. 66.6% Public ballots, and even then 6.5% of Catholics and 9.8% of Publics didn't end up voting for a school trustee anyway. Oddly enough, the total number of Catholic + Public voters doesn't equal the total number of voters, so I'm not entirely sure what the remaining 7.1% of voters did for school board...


Lighter colours represent 'under votes', or people who didn't make a pick for that particular round of voting.
Don Iveson was re-elected mayor with a solid victory. His support levels in Edmonton aren't dissimilar from last election, and are shown here (darker colours meaning higher support).






Iveson's support in general seems very solid in the center of the city, and a bit weaker in the north and southeast than the rest of the city. All that being said, his support ranged from 59.5-85.9% so he has a strong mandate from every part of the city.

Finally, similar to last election, I've taken a look at which councillors' support correlates most or least with the mayor's. Last year, it turned out that a general pattern emerged where the councillors whose support most often correlated with high mayoral support also generally agreed with the mayor on votes. This year, the correlations between councillors and the mayor are:


I'd say this supports the theory from last election - last term, McKeen, Esslinger, Knack, Walters, and Henderson all voted alongside the mayor on more than 80% of non-unanimous votes, while Banga, Caterina, and Nickel (76%, 75%, and 46%, respectively) agreed with the mayor less frequently. While the mayor has had a strong track record of gaining majority support for non-unanimous bills, it does seem as though the candidates who do better in polls where the mayor does worse tend, on average, to disagree with him more often.
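A minimal sketch of the kind of calculation behind these correlations: a Pearson correlation, across voting subdivisions, between the mayor's vote share and a given councillor's vote share. That the correlation is Pearson and computed poll by poll is an assumption, and the shares below are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical vote shares (%) in four polls within one ward
mayor = [60.0, 70.0, 80.0, 85.0]
councillor = [30.0, 40.0, 45.0, 55.0]
print(f"correlation with mayor: {pearson(mayor, councillor):.2f}")
```

A councillor whose support rises where the mayor's does gets a value near +1; one who does better where the mayor does worse gets a negative value.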

That suggests that perhaps this council will be a little bit closer in voting record than the last one - the four new councillors all showed up in the middle of the pack for mayoral correlations, so likely either they are wildcards for agreement with the mayor, or as new candidates their reputation hasn't yet been tested. Only time will tell!