Wednesday, September 25, 2013

Edmonton Election: Donors

I like stats, and I like elections, so I figured I'd write a bit on some of the statistics coming out of the Edmonton election so far (yes, I know we haven't even had official candidates for two whole days yet but bear with me, this will be fun!).

Under Edmonton election bylaws, candidates are required to make their donor lists publicly available following the results of an election. This year, these stats won't be posted until March 1, 2014, but of course candidates are free to do so whenever they wish.

The information from last election is already available, and is fairly interesting in and of itself. Take, for example, the donation results from the two largest 2010 campaigns, Stephen Mandel and David Dorward. Here's a profile of the donations they received:



For this graph, as you move along to the right with increasing donation values added in, you can see the total sum of all contributions go up, all the way to the maximum allowable donation of $5,000. What you end up getting is a fairly smooth profile until around the $3,000-$4,000 range, where all of a sudden people figure if they're in for a penny they may as well be in for five thousand dollars, and you get a MASSIVE spike at the $5,000 donation end.

Up until the maximum donations, Mandel had almost three times as much money as Dorward, but Dorward ended up bringing in the big guns and amassed $85,000 extra in the $5,000 denominations, bringing their final totals much closer together (but of course, in the end Mandel still beat him by quite a large vote margin...).

These graphs are nice and complete because every single donation is accounted for in the declarations by the candidates. Because of the relatively predictable nature of the graphs, the total donations can be easily approximated by breaking the donation amounts into $1,000 chunks, and multiplying the number of donors in each chunk by the average value of that chunk, like so:

Mandel:

Donation Range    Average Donation   Donors   $ Expected   $ Actual
$100-$1,000       $550               245      134,750      115,533
$1,000-$2,000     $1,500             22       33,000       33,353
$2,000-$3,000     $2,500             12       30,000       32,550
$3,000-$4,000     $3,500             5        17,500       18,200
$4,000-$5,000     $5,000*            84       420,000      420,000
Total                                368      635,250      619,636

Error: 2.52%

Dorward:

Donation Range    Average Donation   Donors   $ Expected   $ Actual
$100-$1,000       $550               59       32,450       36,320
$1,000-$2,000     $1,500             12       18,000       21,500
$2,000-$3,000     $2,500             4        10,000       10,500
$3,000-$4,000     $3,500             0        0            0
$4,000-$5,000     $5,000*            101      505,000      505,000
Total                                176      565,450      573,320

Error: 1.37%

*Expected donations in the $4,000-$5,000 category are taken to be $5,000, based on the profile shown before.
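
The chunk arithmetic in the two tables above can be reproduced with a few lines of Python. This is only a sketch of the method as described: the numbers are copied from the tables, while the function names and the choice to measure error relative to the actual total are my own assumptions.

```python
# Each bracket: (assumed average donation, number of donors, actual dollars raised).
# Values are copied from the 2010 tables; the top bracket uses $5,000 per the footnote.
mandel = [
    (550, 245, 115_533),
    (1_500, 22, 33_353),
    (2_500, 12, 32_550),
    (3_500, 5, 18_200),
    (5_000, 84, 420_000),
]
dorward = [
    (550, 59, 36_320),
    (1_500, 12, 21_500),
    (2_500, 4, 10_500),
    (3_500, 0, 0),
    (5_000, 101, 505_000),
]


def expected_total(brackets):
    """Sum of (average donation x donor count) over all brackets."""
    return sum(avg * donors for avg, donors, _ in brackets)


def actual_total(brackets):
    """Sum of the dollars actually declared in each bracket."""
    return sum(actual for _, _, actual in brackets)


def percent_error(brackets):
    """Absolute gap between expected and actual, as a percentage of actual."""
    expected = expected_total(brackets)
    actual = actual_total(brackets)
    return abs(expected - actual) / actual * 100


for name, brackets in (("Mandel", mandel), ("Dorward", dorward)):
    print(name, expected_total(brackets), actual_total(brackets),
          round(percent_error(brackets), 2))
```
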

Unfortunately, donations that are less than $100 aren't broken down by donor, but Mandel and Dorward received a total of $8,548 and $2,020 in such donations, respectively. This method of breaking down the donations into categories appears to be very accurate at predicting the total amount candidates received in donations, which is handy because of what you're about to read next!

Two of this year's candidates for Mayor, Don Iveson and Karen Leibovici, have within the last week released some information on their donors. Good for them - they certainly didn't have to do it yet, but it's a nice sign that candidates who pledge to be accountable have already gotten the Ball o' Transparency rolling. Their lists aren't quite as broken down as Mandel's or Dorward's from last election, and instead we are given a list of donors and broad categories that they fit into donation-wise. Again, donations of less than $100 aren't listed.

If we take the number of people in each category and try to back-calculate the expected fundraising values (using the previous methodology), we can do a more in-depth comparison between the two candidates, and maybe get a glimpse at the sort of donations they tend to receive. It might look something like this:

Iveson:

Donation Range    Average Donation   Donors   $ Expected   $ Actual
$10-$100          $55                230      12,650       13,893
$100-$500         $300               120      36,000       ??
$500-$2,000       $1,250             40       50,000       ??
$2,000-$3,500     $2,750             19       52,250       ??
$3,500-$5,000     $5,000*            33       165,000      ??
Total                                442      315,900      318,772

Error: 0.90%

It looks like the break-into-categories model is surprisingly accurate for Don Iveson's campaign. The sum for the <$100 category was the only category total that was given, but even that amount was pretty much in line with what you'd expect.

Leibovici's results are a bit different, though:

Leibovici:

Donation Range    Average Donation   Donors   $ Expected   $ Actual
$25-$100          $62.50             ??       ??           ??
$100-$1,000       $550               94       51,700       ??
$1,000-$3,000     $2,000             23       46,000       ??
$3,000-$5,000     $5,000*            44       220,000      ??
Total                                ??       317,700      365,000

Missing: $47,300

The results seem mostly fine, I suppose - at first there's not a lot to really compare it to. What's interesting is that $47,300 figure at the end. I suppose it isn't actually missing, per se, and presumably it mostly belongs to the $25-$100 category (Leibovici's website indicated that $25 was the lowest donation they'd received).

What's fun is that, if it is all from the low-donation category, we'd expect a whopping 756 individual people to have donated in that category, if the same model for handling categories that worked so well for Iveson, Dorward, and Mandel is to work here. This is pretty extreme, to say the least. There are four general explanations for what could be causing this discrepancy:

  • Karen Leibovici has found a lot of small-time donors (who apparently haven't donated in this manner to mayoral candidates before). Perhaps this is the first sign of a truly novel strategy?
  • The donation profile for the Leibovici campaign is absolutely wonky, and consists mostly of $1,000, $3,000, and $5,000 donations. This seems tremendously unlikely.
  • More likely, the quoted figure of $365,000 is either not from the same date as the list, or the list as published is incomplete. This seems plausible since the list was published on September 19th but was titled "September 16th", so perhaps ~$40,000 or so isn't reported on the list, with $365,000 being all donations as of the 19th.
  • Something nefarious is afoot. (Yes, this is usually my first assumption when a model of mine doesn't accurately predict real life...)
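
For what it's worth, the implied donor count behind the first explanation is just a back-calculation: subtract the expected dollars in the listed categories from the quoted total, then divide by the assumed average for the $25-$100 category. A quick sketch, where the variable names are mine and the $62.50 average is the same midpoint assumption used in the table:

```python
reported_total = 365_000                       # total quoted by the campaign
expected_listed = 51_700 + 46_000 + 220_000    # expected dollars from the three listed categories
assumed_small_average = 62.5                   # assumed midpoint of the $25-$100 range

# Dollars the listed categories can't account for under the model
missing = reported_total - expected_listed

# Whole number of small donors needed to cover the gap (rounded down)
implied_small_donors = int(missing / assumed_small_average)

print(missing, implied_small_donors)
```
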
Assuming Leibovici's donors are precisely as they've been presented and follow a similar model as other mayoral candidates, we get this profile:


Again, this graph has less data to work with, so it's far less complete than the results from 2010, but still appears to show that Leibovici gets much more support from large donors (at the $5,000 maximum) than Iveson so far. But with the election only having officially begun this week, none of this really matters, I suppose!

Stay tuned for the next post, where I talk about polling. Yay!

Friday, September 20, 2013

Bieber Fever

Newspapers had a field day last year following the publication of a paper out of the University of Ottawa that discussed Bieber Fever. Some articles included:
Of all of these, I'm most disappointed in the CBC - I often pay attention to the CBC, and it's discouraging to know that they might be just as wrong about other things as they are about this.

What's going on here? A professor from the University of Ottawa published a paper where a new model was developed to look at Bieber Fever, and the paper does indeed include the quote "It follows that Bieber Fever is extremely infectious, even more than measles, which is currently one of the most infectious diseases. Bieber Fever may therefore be the most infectious disease of our time." Oh my god, those newspapers must be right! Science has confirmed our worst fears! This must be backed by hard facts and empirical evidence!

Well, no. The paper appears to be a chapter in a book that examines diseases through various mathematical models. Each of the papers in the book (four of which are written by the Bieber Fever author) takes on a different disease and models it, then examines mathematically the effects of different approaches to the disease, like pulse vaccination or changes in infection or relapse rate. I can't comment on the quality of the other chapters in the book, but they seem to be well-developed and certainly based on real diseases. 

The Bieber Fever paper is a little bit different though. First of all, it is clearly written in a tongue-in-cheek manner that I think flew over the heads of most major newspapers. The humor and sarcasm actually make it quite an entertaining read, and if the piece was written as a humorous look into a creative way of adapting a disease model (which is my suspicion), then it could certainly be a fun case study for biology or math students. It is definitely not something worth raising alarms over in newspapers, though, as the model's predictions aren't validated against any actual statistics and its math is misleading, which is what allows the paper to draw the ridiculous comparison to measles that grabbed newspaper attention.

Mathematical disease modelling is a pretty cool field. The most basic model that can be developed is an SIR model - a population is divided up into three groups (Susceptible, Infected, and Removed), and people move through the groups depending on disease parameters and the size of the groups at a given time. For instance, if a lot of people are Infected, the chance of a healthy Susceptible person getting infected is quite high (perhaps due to lots of people sneezing on them), but as more people are Removed (happily by recovery and immunity, or sadly by death), it may become harder for the disease to propagate. 



In this model, βIS represents the rate that healthy people become sick - effectively, it is the chance that in a given time a Susceptible person will encounter an Infected person, multiplied by the chance that that encounter will transmit the disease. On the other end, γI represents the rate at which sick people become healthy, effectively the number of Infected people divided by how long it takes them to get healthy (or die, I suppose).

As long as the rate of people becoming sick (βIS) is larger than the rate at which people are recovering (γI), the disease will reach an epidemic of some type - otherwise it will quickly die out. For simple models, the ratio of these rates is known as the Basic Reproduction Number (R0) of a disease, and corresponds to the number of new infections a sick person will cause. This is pretty easy to visualize - if the ratio R0 is bigger than 1, then by the time someone recovers from their illness they’ll have spread it to at least one more person, and the disease will grow. If you're unlikely to make someone else sick when you fall ill, the disease’s R0 will be less than 1, and the disease will go away without much of an outbreak.
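
To make the SIR bookkeeping concrete, here is a minimal Python sketch of the model described above, stepped forward with a plain Euler scheme. It is not the code behind any graph in this post: the function name, the step size, and the example parameters are all assumptions chosen for illustration, and R0 is taken as β times the initial population divided by γ.

```python
def simulate_sir(beta, gamma, s0, i0, removed0=0.0, days=100, dt=0.01):
    """Step the basic SIR equations forward with a simple Euler scheme.

        dS/dt = -beta * I * S
        dI/dt =  beta * I * S - gamma * I
        dR/dt =  gamma * I

    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(removed0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * i * s * dt   # Susceptible -> Infected
        new_removals = gamma * i * dt        # Infected -> Removed
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((step * dt, s, i, r))
    return history


# Hypothetical example: 1,000 people, a 5-day illness (gamma = 0.2 per day).
population = 1000
gamma = 0.2
growing = simulate_sir(beta=2.5 * gamma / population, gamma=gamma, s0=999, i0=1)   # R0 = 2.5
fizzling = simulate_sir(beta=0.5 * gamma / population, gamma=gamma, s0=999, i0=1)  # R0 = 0.5

peak_growing = max(i for _, _, i, _ in growing)
peak_fizzling = max(i for _, _, i, _ in fizzling)
print(round(peak_growing), peak_fizzling)
```

With R0 above 1 the Infected group swells well past its starting size before burning out; with R0 below 1 it only ever shrinks from the first sick person.
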

For reference, the flu typically has an R0 of 2-3, HIV is around 2-5, smallpox is 5-7, and measles is 12-18. In other words, measles is so infectious, and you have it for long enough, that each person who catches it is expected to transmit it to between twelve and eighteen others before they either recover or die.

Frightening stuff. Fortunately, analyses of diseases with these mathematical models show that as long as a certain proportion of a population is immunized by vaccine, epidemics can be avoided. That proportion is (1-1/R0) - so a typical flu needs 60% immunization to prevent an outbreak, and measles needs over 90%. If you're still unsure about getting a flu shot, just remember that if a population doesn't hit ~60% immunity, things are much worse for those who don't have the vaccine or who are otherwise susceptible.
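
The (1-1/R0) threshold is easy to check for the R0 ranges quoted above. A small sketch, where the function name is mine and the flu value of 2.5 is simply the midpoint of the 2-3 range:

```python
def herd_immunity_threshold(r0):
    """Fraction of the population that must be immune to prevent an outbreak.

    Uses the simple-model formula 1 - 1/R0; a disease with R0 <= 1
    dies out on its own, so no immunization is needed to stop it.
    """
    if r0 <= 1:
        return 0.0
    return 1 - 1 / r0


for name, r0 in (("flu", 2.5), ("measles, low end", 12), ("measles, high end", 18)):
    print(f"{name}: {herd_immunity_threshold(r0):.0%}")
```
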

The Bieber paper develops a more complicated mathematical disease model. It looks something like this:


The author, Robert Smith? (not a typo), proposed a model where media effects have a large impact on the disease. Positive media (P in the picture) can increase the rate at which healthy people become Bieber-infected, and can also make recovered individuals susceptible to re-infection, and Negative media can heal the sick or immunize the susceptible (how miraculous).

Using the numbers that Smith? has in his paper, the spread of Bieber Fever in a typical school of 1,500 students would look something like this:



After about 2 months, the system reaches an equilibrium with about 85% of people being Bieber Fanatics. The paper makes a couple of assumptions: first of all, people are assumed to "grow out" of Bieber Fever after a period of two years. People are also expected to interact with everyone else in the population at least once a month, and have a transmission rate of 1/1500. This means that the average infected person will infect 1 person a month for 24 months, giving Bieber Fever an R0 of 24.

SWEET MOTHER OF GOD IT'S WORSE THAN MEASLES!?!?

Not even a little bit! The transmission rate is absolutely just assumed out of nowhere - no stats, evidence, or explanation given. Similarly, the length of the disease is made up, with the explanation "But let’s be honest, we all know which one it really is, don’t we?" (Smith?, p. 7). Essentially, the author had a calculation where three numbers had to be assumed and multiplied together, and newspapers are surprised that the answer to the multiplication was high. Even the mechanics of the positive and negative media effects are questionable, though the model developed in the paper could help provide insight into other diseases with relapse mechanisms.
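
Here is that multiplication spelled out, as a rough sketch. The three inputs are my reading of the assumptions described above (everyone in a 1,500-student school encountered once a month, a 1/1500 transmission chance per encounter, and a 24-month infection); the function name is hypothetical.

```python
from fractions import Fraction


def basic_reproduction_number(contacts_per_month, transmission_chance, months_infected):
    """R0 as contacts per month x chance of transmission per contact x duration."""
    return contacts_per_month * transmission_chance * months_infected


# The paper's assumed values, as summarized in this post
bieber_r0 = basic_reproduction_number(1500, Fraction(1, 1500), 24)

# Assume people grow out of it in one year instead of two, and R0 halves
shorter_r0 = basic_reproduction_number(1500, Fraction(1, 1500), 12)

print(bieber_r0, shorter_r0)
```

Change any one of the three assumed inputs and the headline number moves right along with it.
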

The paper is cute, clever, and provides a mathematical analysis of a convoluted set of differential equations - for all of these things it serves a nice purpose as a tongue-in-cheek entry into a textbook examining mathematical modelling of infectious diseases. But newspapers blowing the result of an unfounded set of assumptions out of proportion and reporting it as "Science Confirms!" will always annoy me to no end.

One last thing. This is what a graph would look like if the same school was hit with measles:



Now that's an epidemic - three people sick can infect up to 1,200 in less than a week. Remember this when deciding whether or not to immunize your baby.