Wednesday, December 18, 2013

Why I Love the Henday

When I was choosing where to get my apartment, my major concerns were cost, decent neighborhood, and access to the LRT for work. That was pretty much it, and as a result I ended up in (what I consider to be) a pretty great location down by the Century Park LRT station.

I soon realized that, while LRT access was great for getting downtown and to sports games, living that far south was fairly inconvenient for getting around the rest of town. I was using the Henday ring road a lot more than I had at my old home, even just to get to other places within Edmonton.

So I decided to take a look at just how efficient the road system is in Edmonton, and how much the Henday played a role in my life. First of all, here's a map showing travel times for someone who lives downtown:


Living smack-dab in the middle of the city definitely has its advantages in terms of minimizing driving times (note: this assumes no traffic, which is fairly unrealistic downtown for much of the day...). Pretty much anywhere between the Whitemud and the Yellowhead is accessible within 15 minutes by car, and 54% of the city's area is accessible within 20 minutes. The Sherwood Park Freeway and the Whitemud really open the city up to the east, too.

On the other hand, here is what a similar map looks like for me:


Though the overall coverage is comparable (53% of the city's area is still accessible within 20 minutes), the covered area is very different. This is hardly surprising, of course - sticking someone out at the edge of a city ought to increase travel times. What's really cool, though, is that it takes less time to get to the exact opposite side of town than it does to get downtown, even though it's twice as far away. You can even see the effect of the Henday around St. Albert, where a thin band of green colouring hugs the highway.

The real benefit of the Henday is revealed when I plot the same map, but instead avoiding the use of the Henday if at all possible:


Yikes. Pretty much the only easily-traveled areas of the city are those south of or connected to the Whitemud. Now only 39% of the city can be accessed within 20 minutes, with some areas taking up to 45, and the Cameron Heights neighborhood is pretty much completely lost to me, even though it's fairly close (as the Henday was the closest bridge to it).

Taking the Henday can reduce travel times for me by up to 35%. That's why I love the Henday.

Tuesday, November 12, 2013

Iveson's Friends

So right before the Edmonton election results came out last week, I was extremely excited to run a full statistical analysis on them. I was really eager to see if the effects of signs and flyering could be quantified.

Instead, our new mayor broke a record getting elected, and won handily. It's much harder to do an analysis when he won every single poll (if we ignore hospital and special ballots). If we compare his results to Mandel's results in 2010, we can see that he took over Mandel's support base very neatly, and then made massive gains in the north and east:


In fact, the only areas where Iveson 2013 really seemed to lose relative to Mandel 2010 were in Karen Leibovici's ward, in the southwest of the city just north of the river. No such effect is really noticed in Kerry Diotte's ward.

Voter turnout was, again, disappointing this year. Here's how it compares to last election:

So if an analysis of which campaign variables are most important in an election is now difficult (because, let's face it, no factor seemed to do the other candidates much good), what can we do?

I decided instead to take a look at how each voting subdivision voted. First of all, I looked at voter turnout and compared that to total Iveson support:


Two things to notice here: first of all, there's a very slight upward trend, which is common for election winners (after all, if the voting subdivisions with lots of voters didn't like you, you wouldn't be likely to win). Also, the fact that this trend is only slight is a good indicator that the election wasn't rigged. There hasn't been much suspicion that the election was rigged (as far as I know), but a similar analysis of Russian election data suggests that certain trends in graphs like this can indicate fishy behavior.

Much more fascinating, though, is looking at the correlation between councillor support and Iveson support in each ward. Each ward is composed of between 15 and 19 voting subdivisions, and it's interesting to see where support for the mayor and councillor lines up, and where it doesn't. Here are two examples:

In Ward 2, there's a decently strong positive correlation between Esslinger's support and Iveson's, while in Ward 7 there's a similarly strong but negative correlation between Caterina and Iveson.
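For anyone wanting to reproduce this kind of ward-level comparison, here's a sketch of the correlation calculation. The per-subdivision vote shares below are invented for illustration - they are not the actual poll results:

```python
# Sketch: Pearson correlation between a councillor's and the mayor's
# vote share across a ward's voting subdivisions.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical per-subdivision vote shares (fractions of ballots cast)
councillor_share = [0.52, 0.48, 0.61, 0.55, 0.43, 0.58]
mayor_share      = [0.60, 0.55, 0.70, 0.66, 0.50, 0.68]

r = pearson_r(councillor_share, mayor_share)  # strong positive correlation
```

Run this once per ward, with one (councillor share, mayor share) pair per voting subdivision, and you get the tables below.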

Now, I know that people often get all up in a fuss whenever they hear about a "correlation", and I'd be hesitant to draw conclusions from this if it weren't for how interesting the same analysis of the 2010 results was. Take a look at this:

In fact, if we put the correlation coefficients into a table we get:

Councillor Correlation
Henderson 0.95
Krushell 0.93
Leibovici 0.86
Iveson 0.74
Batty 0.70
Gibbons 0.65
Anderson 0.47
Sohi 0.40
Sloan* -0.45
Diotte -0.64
Loken* -0.64
Caterina* -0.91
*=Voted against Mandel on arena deal. 
Colours are more or less arbitrary.


This is very interesting, seeing as three of the bottom four councillors were the three who ultimately disagreed with the mayor on the arena deal. Essentially what we're seeing here is that the neighborhoods that really liked Mandel didn't like Caterina, and vice versa. The opposite happened with Henderson - wherever Mandel was popular so was he.

So is it plausible that councillors whose support correlates well with the mayor's are more likely to get along with him? Sure. I'd be wary of using it to predict how councillors will vote on major issues, though, as all it really indicates is how voters reacted to election promises.

Here's the full list for 2013 though! If Esslinger and Henderson tend to work well with Iveson, and Nickel and Caterina don't, just remember that math said it first!

Councillor Correlation
Esslinger 0.74
Henderson 0.71
McKeen 0.71
Knack 0.60
Sohi 0.40
Anderson 0.20
Loken 0.13
Oshry -0.04
Gibbons -0.13
Nickel -0.26
Walters -0.35
Caterina -0.57

Friday, October 11, 2013

Fake It 'til you Make It

As of October 11th, the cumulative twitter mentions of the Edmonton election were as follows:
  • Don Iveson: 44.2%
  • Kerry Diotte: 32.4%
  • Karen Leibovici: 20.9%
  • Josh Semotiuk: 2.4%
  • Kristine Acielo/Gordon Ward: <0.1%
Mark Blevis, who has been tracking twitter mentions, is performing this analysis to check whether twitter mentions are useful in predicting the outcome of an election. Personally I'm not convinced (especially since the theory failed in the recent Nova Scotia elections), but I love the spirit behind tracking political statistics. The theory that social media engagement correlates to voter engagement certainly isn't without merit, though I'm sure many other factors are also important.

On the other hand, here are the current number of twitter followers for each candidate (as of 11:00 am October 11th):

If we ignore Gordon Ward for a second, there's a very strong (R²=0.96) correlation between followers and mentions. This makes a great deal of sense, and all evidence points to a certain proportion of each candidate's followers being engaged in the election discussions on twitter.
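For the curious, here's roughly how a fit like that works - a least-squares line and its R². The follower counts below are placeholders rather than the real numbers:

```python
# Sketch: least-squares fit of mention share vs. follower count, plus R^2.

def fit_r2(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical (followers, mention share %) pairs for four candidates
followers = [9000, 6500, 4200, 600]
mentions  = [44.2, 32.4, 20.9, 2.4]

slope, intercept, r2 = fit_r2(followers, mentions)
```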

But what on earth is going on with Gordon Ward?

He's only had twitter since just after nomination day and already has over 6,300 followers. Virtually nobody mentions him on twitter, though, and the majority of his posts have barely any interaction with his followers (very few retweets or favorites, for instance).

Take a quick look at his followers, and you might discover a pattern - for the vast majority of them, the ratio of accounts they follow to accounts following them is extremely high, many over 25:1. Many of Mr. Ward's followers have fewer than 10 tweets and are not from Canada. Scroll through a couple of his followers and you'll see what I mean. Needless to say, these are not the accounts of engaged Edmonton voters.

Don Iveson wasn't wrong when he called an election a "communications exercise" - a candidate's ideas surely aren't worth anything if nobody ever gets to hear them. Unfortunately, this often results in election candidates resorting to attacks or half-baked publicity stunts to gain the spotlight. Voters often simply don't have the time to pore over every candidate's platform in detail, and will often only pay attention to candidates who they perceive to be popular.

Just like any other communications or marketing problem, the ability to create a 'buzz' around yourself is key to winning an election - only once people think you're worth their attention will they care about your ideas. In book sales, for instance, it's not uncommon for new authors to hire firms to buy up thousands of copies of their books in order to briefly appear on best-sellers lists. Showing up on a best-seller list gives the author added credentials, gets the book noticed by consumers, and will likely drive future sales.

Books and election candidates are remarkably similar inasmuch as both are often judged by their covers (though dissimilar in that people actually enjoy movies based on books). Much like publishers buying up copies of their own books, candidates can use online services to inflate their social media presence.

If you want to boost your twitter account by 5,000 followers, you could pay $30 and get them within a week. Or maybe 25,000 Youtube views for $100 here. Perhaps 500 Facebook likes for $42? Just like seeing a book on a best-seller list makes people more likely to pay attention to it, having a lot of twitter followers or Facebook fans gives the impression that a candidate is credible and popular, and can hypothetically be valuable in kickstarting a successful campaign. If Gordon Ward had used a service like this (hypothetically, of course), he should probably ask for his money back - being followed by an army of 6,000 zombie twitter accounts doesn't seem to have gained him much momentum so far in this election.

After I pointed out the zombie twitter account horde, it was mentioned to me that similar shenanigans may be occurring on Facebook. Take, for example, these two candidates from Ward 11:


For reference, here are some mayoral candidate charts:





The grey lines in the chart represent the sum of all new likes over the previous week. For instance, from the week of September 2-8, Mike Nickel received 453 new likes. Impressive. However, from the week of September 3-9, he received 0. The implication here is that, on or around September 2nd, Mike Nickel all of a sudden got ~450 new likes on his page, and then didn't get any more until the lead-up to the nomination day. The graph for Mujahid Chak is similar, with approximately 580 new likes occurring right at the end of August. On the other hand, the values for mayoral candidates fluctuate a bit around nomination day, but don't show any of the sudden changes or plateaus of these other two candidates.
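The grey-line metric itself is easy to reproduce: a trailing 7-day sum over daily new likes. Here's a sketch with an invented daily series that shows the tell-tale spike-then-plateau signature:

```python
# For each day, sum the new likes gained over the trailing 7 days
# (partial window for the first few days).

def trailing_week(daily):
    return [sum(daily[max(0, i - 6):i + 1]) for i in range(len(daily))]

# 14 hypothetical days: one big burst on day 3, then nothing
daily_new_likes = [2, 1, 0, 450, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
weekly = trailing_week(daily_new_likes)
# A burst shows up as a jump in the trailing sum that persists for
# exactly 7 days and then drops back to ~0.
```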

Now maybe this was a case of incredible luck for both candidates. Surely there's a chance that they aren't cheating and buying Facebook likes. Perhaps Mr. Nickel had a tremendously successful Facebook hangout and convinced a bunch of people to like him all at once, or maybe Mr. Chak only started his page on August 30 to an incredible amount of fanfare (and subsequently was ignored by the general Facebook community, judging by the "people talking about this" metric...).

I would love to give them the benefit of the doubt. Really I would. Except that Facebook Graph Search is a powerful tool, and lets you take a glimpse into the fans of Mike Nickel. It's not available for everyone yet, so here are some screenshots from a search of Mike Nickel likers (Mujahid Chak's results are very similar):


Admittedly, the first few pages of results are pretty clean - lots of Edmonton citizens, fairly legitimate-looking profiles, etc. After about page 4, though, the proportion of people from Edmonton drops significantly. Apparently Mr. Nickel is supported by people from California, Buenos Aires, Uruguay, Turkey, Tunisia, and Vietnam. Broad support base indeed - in fact, of the people who like Mike Nickel and listed their location, 89.1% of them were listed as living outside of Edmonton. Mujahid Chak's supporters with listed locations were even worse, with 92.9% living outside Edmonton.

I of course am open to an innocent explanation for the international social media popularity of some of Edmonton's election candidates, and I can of course sympathize with the desire to be noticed during an election. Though there's nothing illegal or necessarily improper about artificially inflating your social media presence, I personally find the practice to be deplorable.

Note: I did a quick check of most high-profile election candidates for this analysis, but not as in-depth (as they seemed fine). If you happen to notice any others with irregularities, please let me know!

Wednesday, September 25, 2013

Edmonton Election: Donors

I like stats, and I like elections, so I figured I'd write a bit on some of the statistics coming out of the Edmonton election so far (yes, I know we haven't even had official candidates for two whole days yet but bear with me, this will be fun!).

Under Edmonton election bylaws, candidates are required to make their donor lists publicly available following the results of an election. This year, these stats won't be posted until March 1, 2014, but of course candidates are free to do so whenever they wish.

The information from last election is already available, and is fairly interesting in and of itself. Take, for example, the donation results from the two largest 2010 campaigns, Stephen Mandel and David Dorward. Here's a profile of the donations they received:



For this graph, as you move to the right, donations of increasing value are added in and the running total of all contributions goes up, all the way to the maximum allowable donation of $5,000. What you end up getting is a fairly smooth profile until around the $3,000-$4,000 range, where all of a sudden people figure that if they're in for a penny they may as well be in for five thousand dollars, and you get a MASSIVE spike at the $5,000 end.

Up until the maximum donations, Mandel had almost three times as much money as Dorward, but Dorward ended up bringing in the big guns and amassed $85,000 extra in the $5,000 denominations, bringing their final totals much closer together (but of course, in the end Mandel still beat him by quite a large vote margin...).

These graphs are nice and complete because every single donation is accounted for in the declarations by the candidates. Because of the relatively predictable nature of the graphs, the total donations can be easily approximated by breaking the donation amounts into $1,000 chunks, and multiplying them by the average value of each chunk, like so:

Mandel:

Donation Range  Average Donation  Donors  $ Expected  $ Actual
$100-$1,000     $550              245     134,750     115,533
$1,000-$2,000   $1,500            22      33,000      33,353
$2,000-$3,000   $2,500            12      30,000      32,550
$3,000-$4,000   $3,500            5       17,500      18,200
$4,000-$5,000   $5,000*           84      420,000     420,000
Total                             368     635,250     619,636

Error: 2.52%

Dorward:

Donation Range  Average Donation  Donors  $ Expected  $ Actual
$100-$1,000     $550              59      32,450      36,320
$1,000-$2,000   $1,500            12      18,000      21,500
$2,000-$3,000   $2,500            4       10,000      10,500
$3,000-$4,000   $3,500            0       0           0
$4,000-$5,000   $5,000*           101     505,000     505,000
Total                             176     565,450     573,320

Error: 1.37%

*Expected donations in the $4,000-$5,000 category are taken to be $5,000, based on the profile shown before.

Unfortunately, donations that are less than $100 aren't broken down by donor, but Mandel and Dorward received $8,548 and $2,020, respectively. This method of breaking down the donations into categories appears to be very accurate at predicting the total amount candidates received in donations, which is handy because of what you're about to read next!
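The whole estimation method fits in a few lines. Here's a sketch of it applied to Mandel's numbers from the table above:

```python
# Binning estimate: multiply the donor count in each bracket by a
# representative donation. The top bracket uses $5,000, since the
# profile shows donors there cluster at the maximum.

brackets = [  # (representative donation, number of donors)
    (550, 245),    # $100-$1,000
    (1500, 22),    # $1,000-$2,000
    (2500, 12),    # $2,000-$3,000
    (3500, 5),     # $3,000-$4,000
    (5000, 84),    # $4,000-$5,000, assumed to all be $5,000
]

expected = sum(avg * n for avg, n in brackets)   # 635,250
actual = 619_636                                 # declared total
error = abs(expected - actual) / actual          # ~2.5%
```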

Two of this year's candidates for Mayor, Don Iveson and Karen Leibovici, have within the last week released some information on their donors. Good for them - they certainly didn't have to do it yet, but it's a nice sign that candidates who pledge to be accountable have already gotten the Ball o' Transparency rolling. Their lists aren't quite as broken down as Mandel's or Dorward's from last election, and instead we are given a list of donors and broad categories that they fit into donation-wise. Again, donations of less than $100 aren't listed.

If we take the number of people in each category and try to back-calculate the expected fundraising values (using the previous methodology), we can do a more in-depth comparison between the two candidates, and maybe get a glimpse at the sort of donations they tend to receive. It might look something like this:

Iveson:

Donation Range  Average Donation  Donors  $ Expected  $ Actual
$10-$100        $55               230     12,650      13,893
$100-$500       $300              120     36,000      ??
$500-$2,000     $1,250            40      50,000      ??
$2,000-$3,500   $2,750            19      52,250      ??
$3,500-$5,000   $5,000*           33      165,000     ??
Total                             442     315,900     318,772

Error: 0.90%

It looks like the break-into-categories model for Don Iveson's campaign is surprisingly accurate. The final sum for the <$100 category was the only one that was given, but even that amount was pretty much in line with what you'd expect.

Leibovici's results are a bit different, though:

Leibovici:

Donation Range  Average Donation  Donors  $ Expected  $ Actual
$25-$100        $62.50            ??      ??          ??
$100-$1,000     $550              94      51,700      ??
$1,000-$3,000   $2,000            23      46,000      ??
$3,000-$5,000   $5,000*           44      220,000     ??
Total                             ??      317,700     365,000

Missing: 47,300

The results seem mostly fine, I suppose - at first there's not a lot to really compare it to. What's interesting is that $47,300 figure at the end. I suppose it isn't actually missing, per se, and presumably it mostly belongs to the $25-$100 category (Leibovici's website indicated that $25 was the lowest donation they'd received).

What's fun is that, if it is all from the low-donation category, we'd expect a whopping 756 individual people to have donated in that category, if the same model that worked so well for Iveson, Dorward, and Mandel is to work here. This is pretty extreme, to say the least. There are four general explanations for what could be causing this discrepancy:

  •  Karen Leibovici has found a lot of small-time donors (who apparently haven't donated in this manner to mayoral candidates before). Perhaps this is the first sign of a truly novel strategy?
  • The donation profile for the Leibovici campaign is absolutely wonky, and consists mostly of $1,000, $3,000, and $5,000 donations. This seems tremendously unlikely.
  • More likely, the quoted figure of $365,000 is either not from the same date as the list, or the published list is incomplete. This seems plausible since the list was published on September 19th but was titled "September 16th", so perhaps ~$40,000 or so isn't reported on the list, with $365,000 being all donations as of the 19th.
  • Something nefarious is afoot. (Yes, this is usually my first assumption when a model of mine doesn't accurately predict real life...)
Assuming Leibovici's donors are precisely as they've been presented and follow a similar model as other mayoral candidates, we get this profile:


Again, this graph has less data to work with, so it's far less complete than the results from 2010, but still appears to show that Leibovici gets much more support from large donors (at the $5,000 maximum) than Iveson so far. But with the election only having officially begun this week, none of this really matters, I suppose!

Stay tuned for the next post, where I talk about polling. Yay!

Friday, September 20, 2013

Bieber Fever

Newspapers had a heyday last year following the publication of a paper out of the University of Ottawa that discussed Bieber Fever. Some articles included:
Of all of these, I'm most disappointed in the CBC - I often pay attention to the CBC, and it's discouraging to know that they might be just as wrong about other things as they are about this.

What's going on here? A professor from the University of Ottawa published a paper where a new model was developed to look at Bieber Fever, and the paper does indeed include the quote "It follows that Bieber Fever is extremely infectious, even more than measles, which is currently one of the most infectious diseases. Bieber Fever may therefore be the most infectious disease of our time." Oh my god, those newspapers must be right! Science has confirmed our worst fears! This must be backed by hard facts and empirical evidence!

Well, no. The paper appears to be a chapter in a book that examines diseases through various mathematical models. Each of the papers in the book (four of which are written by the Bieber Fever author) takes on a different disease and models it, then examines mathematically the effects of different approaches to the disease, like pulse vaccination or changes in infection or relapse rate. I can't comment on the quality of the other chapters in the book, but they seem to be well-developed and certainly based on real diseases. 

The Bieber Fever paper is a little bit different, though. First of all, it is clearly written in a tongue-in-cheek manner that I think flew over the heads of most major newspapers. The humor and sarcasm actually make it quite an entertaining read, and if the piece was written as a humorous look into a creative way of adapting a disease model (which is my suspicion), then it could certainly be a fun case study for biology or math students. It is definitely not something worth raising alarms over in newspapers, though, as the model's predictions aren't validated against any actual statistics and its math is misleading, allowing the author to draw the ridiculous comparison to measles that grabbed newspaper attention.

Mathematical disease modelling is a pretty cool field. The most basic model that can be developed is an SIR model - a population is divided up into three groups (Susceptible, Infected, and Removed), and people move through the groups depending on disease parameters and the size of the groups at a given time. For instance, if a lot of people are Infected, the chance of a healthy Susceptible person getting infected is quite high (perhaps due to lots of people sneezing on them), but as more people are Removed (happily by recovery and immunity, or sadly by death), it may become harder for the disease to propagate. 



In this model, βIS represents the rate that healthy people become sick - effectively, it is the chance that in a given time a Susceptible person will encounter an Infected person, multiplied by the chance that that encounter will transmit the disease. On the other end, γI represents the rate at which sick people become healthy, effectively the number of Infected people divided by how long it takes them to get healthy (or die, I suppose).

As long as the rate of people becoming sick (βIS) is larger than the rate people are recovering (γI), then the disease will reach an epidemic of some type - otherwise it will quickly die out. For simple models, the ratio of these rates is known as the Basic Reproduction Number (R0) of a disease, and correlates to the number of new diseases a sick person will cause. This is pretty easy to visualize - if the ratio R0 is bigger than 1, then by the time someone recovers from their illness they’ll have spread it to at least one more person, and the disease will grow. If you're unlikely to make someone else sick when you fall ill, the disease’s R0 will be less than 1, and the disease will go away without much of an outbreak. 
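As a sketch of how those βIS and γI terms play out, here's a minimal SIR simulation with simple Euler time steps. The population size and rates are made up (chosen to give a flu-like R0 of 2.5), not taken from any real outbreak:

```python
# Minimal SIR sketch matching the beta*I*S and gamma*I terms above.
# Illustrative parameters: R0 = beta*N/gamma = 2.5.

N = 1000          # population
beta = 0.5 / N    # transmission rate per contact pair
gamma = 0.2       # recovery rate (1/gamma = 5 days infectious)
S, I, R = N - 1.0, 1.0, 0.0
dt = 0.1

for _ in range(int(200 / dt)):        # simulate 200 days
    new_inf = beta * I * S * dt       # Susceptible -> Infected
    new_rec = gamma * I * dt          # Infected -> Removed
    S -= new_inf
    I += new_inf - new_rec
    R += new_rec

# With R0 > 1 the epidemic takes off, and most of the
# population ends up in the Removed group.
```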

For reference, the flu typically has an R0 of 2-3, HIV is around 2-5, smallpox is 5-7, and measles is 12-18. Measles was so infectious, and lasted so long, that each person who got it was expected to transmit it to between twelve and eighteen people before recovering or dying.

Frightening stuff. Fortunately, analyses of diseases with these mathematical models show that as long as a certain proportion of a population is immunized by vaccine, epidemics can be avoided. The proportion needed is (1-1/R0) - so a typical flu needs 60% immunization to prevent an outbreak, and measles needs over 90%. If you're still unsure about getting a flu shot, just remember that if a population doesn't hit ~60% immunity, everyone who doesn't have the vaccine, or who is otherwise susceptible, is very much worse off.
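That threshold is a one-line calculation. Here it is for rough, representative R0 values of the diseases quoted above (exact figures vary by study):

```python
# Herd-immunity threshold 1 - 1/R0 for a few illustrative R0 values.
thresholds = {name: 1 - 1 / r0 for name, r0 in
              [("flu", 2.5), ("smallpox", 6), ("measles", 15)]}
# flu -> 60%, smallpox -> ~83%, measles -> ~93%
```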

The Bieber paper develops a more complicated mathematical disease model. It looks something like this:


The author, Robert Smith? (not a typo), proposed a model where media effects have a large impact on the disease. Positive media (P in the picture) can increase the rate at which healthy people become Bieber-infected, and can also make recovered individuals susceptible to re-infection, and Negative media can heal the sick or immunize the susceptible (how miraculous).

Using the numbers that Smith? has in his paper, the spread of Bieber Fever in a typical school of 1,500 students would look something like this:



After about 2 months, the system reaches an equilibrium with about 85% of people being Bieber Fanatics. The paper makes a couple of assumptions: first of all, people are assumed to "grow out" of Bieber Fever after a period of two years. People are also expected to interact with everyone else in the population at least once a month, and have a transmission rate of 1/1500. This means that the average infected person will infect 1 person a month for 24 months, giving Bieber Fever an R0 of 24.
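That R0 of 24 is nothing more than the product of the paper's three assumed numbers:

```python
# R0 from the paper's assumptions: contacts per month x transmission
# chance per contact x months infectious. All three are simply assumed.

contacts_per_month = 1500   # everyone in the school, once a month
transmission = 1 / 1500     # assumed chance of infection per contact
months_infected = 24        # "grows out of it" after two years

# ~1 new infection per month, for 24 months
R0 = contacts_per_month * transmission * months_infected
```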

SWEET MOTHER OF GOD IT'S WORSE THAN MEASLES!?!?

Not even a little bit! The transmission rate is simply assumed out of nowhere - no stats, evidence, or explanation given. Similarly, the length of the disease is made up, with the explanation "But let's be honest, we all know which one it really is, don't we?" (Smith?, p. 7). Essentially, the author assumed three numbers and multiplied them together, and newspapers are surprised that the product was high. Even the mechanics of the positive and negative media effects are questionable, though the model could help provide insight into other diseases with relapse mechanisms.

The paper is cute, clever, and provides a mathematical analysis of a convoluted set of differential equations - for all of these things it serves a nice purpose as a tongue-in-cheek entry into a textbook examining mathematical modelling of infectious diseases. But newspapers taking essentially the result of an unfounded set of assumptions out of proportion and reporting them as "Science Confirms!" will always annoy me to no end.

One last thing. This is what a graph would look like if the same school was hit with measles:



Now that's an epidemic - three people sick can infect up to 1,200 in less than a week. Remember this when deciding whether or not to immunize your baby.

Friday, August 30, 2013

Extreme Rock, Paper, Scissors

Few games have captured the hearts and minds of the public as Rock, Paper, Scissors.

Ok, that's maybe not even remotely true. RPS is one of the most basic games on earth that, barring any possible psychological factors, is essentially a glorified coin toss. That hasn't stopped fantastic organizations like the World RPS Society from existing (their 'responsibility code' includes a recommendation not to use Rock Paper Scissors for life-threatening decisions. Good call.), and some people certainly take the game very seriously.

I'm also going to take it very seriously, albeit in a completely different direction. I came across a post on my new favorite blog last week, DataGenetics, where the author created and examined a sort of iterative 'team-based' game of RPS. In his game, people in a group are assigned to always make a certain play, and are then drawn into random pairings for a showdown. The winners go back into the pool, and the losers are eliminated, until only one 'team' of players remains.

This becomes an interesting examination that, in a way, can sort of approximate population dynamics. Imagine an island where wild rocks, papers, and scissors ran around in their natural habitat: the natural prey of rocks was scissors (they'd viciously bend the blades before eating them), the natural prey of scissors was paper (the blades would tear into the paper, velociraptor-style), and the natural prey of paper was rocks (who would get... covered? By far the lamest of the RPS triad). If the populations were fairly equal, the island would stay at a fairly steady equilibrium. But as soon as you remove, say, scissors, the rock population would be catastrophically destroyed by all the unchecked paper roaming around... covering them.

Alright, so it isn't a perfect analogy, but I hope you get the point. As this is a non-transitive food cycle (instead of a mostly transitive food chain), the dynamics are a little bit different than what you might expect in real life, but population dynamics in simple predator-prey models really are rather fascinating.

DataGenetics' results were really cool, though, so I decided to see if I could reproduce them and take them further. The first model he made featured 10 rock players, 10 paper players, and a varying number of scissors players. What do you think would happen as the population of starting scissors players decreases? Surprisingly, their odds of winning actually increase. In fact, as long as there is at least one scissors player, their odds of winning the whole thing range from 33% to 60%, with a peak when the starting scenario is 10 rock, 10 paper, and 4 scissors.



My results came from running 10,000 game simulations per data point, and almost perfectly match up with the results from DataGenetics, so I'm reasonably confident in them.
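For anyone wanting to check my numbers, here's a stripped-down sketch of the kind of simulation I ran. The pairing and tie-handling details are my own choices, so results may differ slightly from DataGenetics':

```python
import random

# Elimination game: random pairings, the loser is removed, ties (same
# play vs. same play) go back into the pool, until one team remains.

BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def play_game(counts):
    pool = [play for play, n in counts.items() for _ in range(n)]
    while len(set(pool)) > 1:
        random.shuffle(pool)
        a, b = pool.pop(), pool.pop()
        if BEATS[a] == b:
            pool.append(a)       # a wins, b eliminated
        elif BEATS[b] == a:
            pool.append(b)       # b wins, a eliminated
        else:
            pool += [a, b]       # tie: both survive
    return pool[0]

def win_odds(counts, trials=2000):
    wins = {play: 0 for play in counts}
    for _ in range(trials):
        wins[play_game(counts)] += 1
    return {play: w / trials for play, w in wins.items()}
```

Calling `win_odds({"rock": 10, "paper": 10, "scissors": 4})` reproduces the peak scenario described above, with scissors winning well over half the time.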

What's going on here is actually really cool. Reducing the number of scissors at the outset means that initial pairings are more likely to be between rocks and papers - which paper will (illogically, by lying on the rock) win. As the game progresses, the pool is more likely to become mostly paper and scissors, which is an easy scenario for scissors to win. This means that unless scissors get unlucky and have a lot of early pairings against rocks, they have much better than even odds of winning the game outright. As DataGenetics put it, this is a great example of the expression "The enemy of my enemy is my friend."

Here's a cool alternative view:


If you run a similar scenario, but with 50 rocks, 50 papers, and a variable number of scissors, the results are even more extreme - scissors' best chance of winning is when they start with 17 players against their opponents' 50 each, where they have a 91% chance of winning the whole game.


The reversal of the trend between rocks and scissors here is also pretty fascinating. At larger numbers of initial players, paper's odds of winning are much more sensitive to changes in the number of scissors than rock's are, until the number of scissors becomes drastically low. It's also worth pointing out that my results here deviate a bit from DataGenetics' after 25 scissors players, even though I was trying to do fundamentally the same thing as he was. I have no idea who's correct, so I guess it's time for a nerd show-down...

I said I was going to take this a step further, and the nerdiest way of taking Rock, Paper, Scissors further is to change it to Rock, Paper, Scissors, Lizard, Spock.

 
Good thing the rules to normal RPS are easy, because adding two new options adds seven new combinations to remember. As the Big Bang Theory puts it:
 
  • Scissors cut paper,
  • Paper covers rock,
  • Rock crushes lizard,
  • Lizard poisons Spock,
  • Spock smashes scissors,
  • Scissors decapitates lizard,
  • Lizard eats paper,
  • Paper disproves Spock,
  • Spock vaporizes rock, and (as it always has)
  • Rock breaks scissors
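One neat thing about those ten rules: encoded as a relation, it's easy to check that every option beats exactly two others and loses to exactly two, which is what keeps the game balanced.

```python
BEATS = {
    'scissors': {'paper', 'lizard'},
    'paper':    {'rock', 'spock'},
    'rock':     {'lizard', 'scissors'},
    'lizard':   {'spock', 'paper'},
    'spock':    {'scissors', 'rock'},
}

for option, prey in BEATS.items():
    predators = {p for p, beaten in BEATS.items() if option in beaten}
    assert len(prey) == 2 and len(predators) == 2
    assert not (prey & predators)  # nothing both beats and loses to the same option
print("balanced:", sum(len(v) for v in BEATS.values()), "rules")
```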


So what if we ran the same game experiment, but with 10 each of the five options? If we kept scissors as the changing variable to be consistent, we get the following:



Now this is just cool. Instead of scissors getting a bonus by losing players from the start, scissors are barely affected at all until their numbers get small enough. While the starting number of scissors is between 4 and 10, they're still doing about average with a 20% chance of winning, after which their chances plummet.

    What's far more fascinating is the other players - poor Spock gets mutilated! Spock's chances of winning are always less than scissors' until scissors has 0 starting players. Why?

Most likely it's because, even though Spock smashes scissors, scissors is the only player that can kill both of Spock's predators (lizard and paper). As the chances of both of these getting killed decrease, so too do Spock's chances of winning the game. Meanwhile, lizard is having a great time. Its one big predator, scissors, is dwindling in numbers, meaning it will more likely have to face paper or Spock, which it's of course fine with.

    If we bump up the number of starting players to 50 again, we see the true dominance of rock, and again scissors tends to suffer very little (in fact, they're second-most likely to win up until they start off with only 20).

     
    Very cool indeed, especially if you're rock. Again, poor Spock gets decimated, but in general the behaviors of the other players are largely similar to the previous example, but at a much reduced scale to give way to rock.

    In reality there's virtually no practical application to any of this, except to perhaps point out the unanticipated consequences that may arise when you remove an element of a balanced ecosystem. Population dynamics in the wild certainly don't follow such simple rules, but it's definitely not unheard-of for the addition or removal of a small part of the population of a species to have massive ramifications on other species, and the fact that this can be modeled with math and Rock, Paper, Scissors is pretty cool.

    Thursday, August 15, 2013

    Barenaked Lady Odds

    I'm a fan of (the) Barenaked Ladies. Please feel free to interpret that in any context you wish.

    The band's most recent single, Odds Are, is respectably catchy (alright, I'll admit, I had it on repeat for a couple plays before I decided to write this). The lyrics are also moderately clever, and talk about some of my favorite things (like odds and probability. Yay!).

    Here's a look through some of the lyrics they toss out, and the actual stats behind them:


    Struck by lightning, sounds pretty frightening/ But you know the chances are so small

The United States of America records an average of 22.8 million cloud-to-ground lightning strikes a year, or approximately one lightning bolt per 14 ‘muricans. Out of all of this lightning, only an average of 34.9 fatalities occur per year. This puts any given American’s odds of getting killed by lightning in a given year at around 0.000011% (much higher if you live in the south and/or play golf, though, so watch out). In fact, you would have to live for about 9,000 years before you’d even have a 0.1% chance of getting killed by lightning. Considering how easy it is to avoid as well (don’t stand near tall things during a thunderstorm), this really isn't something to be all that paranoid about.
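Those figures check out arithmetically (the population number below is my rough 2013 estimate):

```python
us_population = 316e6               # rough 2013 US population (my assumption)
lightning_deaths_per_year = 34.9

p_year = lightning_deaths_per_year / us_population
print(f"annual odds of death: {p_year:.7%}")    # about 0.0000110%

years_to_point_one_percent = 0.001 / p_year
print(f"{years_to_point_one_percent:,.0f} years to reach a 0.1% chance")  # ~9,000
```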


    Stuck by a bee sting, nothing but a bee thing/ Better chance you’re gonna buy it at the mall

    Surprisingly, deaths by bee stings are about 70% more common than death by lightning (0.000019% a year). Bee stings are apparently safe enough that the average human can handle 10 stings per pound of bodyweight, meaning that unless you’re deathly allergic you could easily take on over a thousand of the little critters. I guess if you are allergic it might be time to invest in an epi-pen? As for mall deaths, it turns out that the stats on those are pretty hard to find, though I’m sure it’s not terrifically different than the going death rate from walking. I’d recommend avoiding Black Friday in the US, though, since that leads to outright murders…
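The bee arithmetic works out similarly; at 10 stings per pound, even a fairly light adult could survive well over a thousand stings (the body weight below is a hypothetical example, not a figure from any source):

```python
p_bee = 1.9e-7          # ~0.000019% annual death odds from bee stings
p_lightning = 1.1e-7    # ~0.000011% from lightning, for comparison
print(f"bees are {p_bee / p_lightning - 1:.0%} deadlier")   # roughly 70% more

stings_per_pound = 10   # reported tolerance for the average human
body_weight_lb = 130    # hypothetical light adult
print(stings_per_pound * body_weight_lb, "survivable stings")
```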



    But it’s a twenty-three or four-to-one/ That you can fall in love by the end of this song

This line is confusing. They’re either saying that the odds are 23:1 (95.8%) or 4:1 (80%) that you, the listener, can fall in love by the end of the song. I've listened to it enough times without changing my love status (just to catch the lyrics) to say that, statistically speaking, I'm sure they're wrong.
    Surveys have shown that the average couple will drop the L-word after 14 dates, and that on average they’ll manage two dates a week. If this song was on the upper end of popular and played, say, 3 times a day on a radio station, you could reasonably hear it 150 times between meeting someone and falling in love with them (meaning 99.3% of the time you hear the song you wouldn't have fallen in love). Then again, the odds of the song playing the instant that you fall in love would only be 0.63% on a given day (at three randomly spaced plays of 3:01 each). This is on the upper end, of course – only 80% of people say they fall in love at some point in their 20s, and 33% of people settle down with their first love, so realistically the numbers are way lower than that. There’s no way I’d take 23:1 odds that you’ll fall in love by the end of the song.
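A back-of-envelope version of that calculation (the 3:01 runtime and the 3-plays-a-day radio rotation are the assumptions stated above):

```python
dates_to_love = 14        # average dates before the L-word (survey figure)
dates_per_week = 2
courtship_days = dates_to_love / dates_per_week * 7    # 49 days

plays_per_day = 3         # assumed rotation for a popular single
print(courtship_days * plays_per_day)                  # ~147 chances to hear it

song_seconds = 3 * 60 + 1                              # 3:01 runtime
p_playing_at_the_moment = plays_per_day * song_seconds / 86400
print(f"{p_playing_at_the_moment:.2%}")                # ~0.63% on the big day
```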


    Hit by the A-Train, crashed in an airplane/ I wouldn't recommend either one



I wouldn't recommend either one either. Apparently in 2011, 146 New Yorkers were hit by a subway train (we can pretend it was the A train), with 47 of them dying. If you live in New York, your odds of subwaying to death are 0.00057% (about 51 times higher than getting killed by lightning – watch out!).
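For the curious, that works out from the city's population (the 8.2 million figure is my approximation for 2011):

```python
nyc_population = 8.2e6      # approximate 2011 NYC population (my assumption)
subway_deaths = 47          # fatalities out of 146 people struck in 2011

p_subway = subway_deaths / nyc_population
print(f"annual odds: {p_subway:.5%}")       # about 0.00057%

p_lightning = 1.1e-7
print(round(p_subway / p_lightning), "times the lightning risk")
```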


    Your chances of dying on a single airline flight on a major world airline are 0.000021%. Of course, most trips are round trips, but even if you were to take two round-trips a year your risk would only be 0.000085%. And that’s just with the average of all major airlines – the top half of airlines are four times less risky per flight, leading to the fun conclusion that taking two round-trip flights a year has pretty much exactly the same risk of death as from bees.
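At probabilities this tiny, four flights is essentially four times the per-flight risk (the 1-in-4.7-million figure is the oft-quoted industry average I'm assuming here):

```python
p_flight = 1 / 4.7e6      # assumed per-flight fatality odds, ~0.000021%
flights_per_year = 4      # two round trips

p_year = 1 - (1 - p_flight) ** flights_per_year
print(f"{p_year:.6%}")    # about 0.000085%, essentially 4 * p_flight
```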


    Killed by a Great White or a meteorite/ I guess there ain't no way to go that’s fun

    Fatal shark attacks are actually ridiculously uncommon – in the United States the chance of getting killed by a shark is less than 0.0000004%. In fact, people in New York are bitten 10 times more each year by other people than by sharks. This is probably not something to worry about.

Nobody has ever been confirmed killed by a meteorite. That isn't to say that they can’t be dangerous (that Russian one certainly injured a lot of people), but with only approximately 500 meteorites hitting the earth every year, it isn't likely something to ever worry about either. Though if it happened, it wouldn't be fun – BNL is right on that count.


    Odds are we gonna be alright for another night


On average, 150,000 people die every day. This puts your odds of surviving until the end of the song at around 99.999996% (much higher than your odds of falling in love by the end of the song). The odds of you being “alright for another night” (here I assume "alright" means “still alive”) are about 99.9979% on average.

    However, approximately two thirds of those daily deaths are age-related, and in industrialized countries the proportion of age-related deaths is up to 90%. If we factor that in, and you’re part of the young and hip demographic this blog strives to cater to, your odds of surviving another night are about 99.9998%.
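Putting the survival numbers together (the world population figure is my rough 2013 estimate):

```python
world_population = 7.1e9      # rough 2013 world population (my assumption)
deaths_per_day = 150_000

p_die_day = deaths_per_day / world_population
print(f"alright for another night: {1 - p_die_day:.4%}")      # ~99.9979%

song_fraction = (3 * 60 + 1) / 86400                          # 3:01 of a day
print(f"surviving the song: {1 - p_die_day * song_fraction:.6%}")

p_die_day_young = p_die_day * 0.1    # ~90% of deaths are age-related
print(f"young and hip: {1 - p_die_day_young:.4%}")            # ~99.9998%
```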

    Summary:

    Odds of you dying by any cause mentioned in the song this year: ~0.0006%*
    Odds of you being alright for another night: ~99.9998%

    *Only if you live in New York

    Thursday, August 8, 2013

    Credit Card Math

    Credit cards are an important part of our society [citation needed], and come with an almost unlimited variety of options and perks between them. Some cards have low interest rates, others give you fancy rewards; some cards require you to have a high income, but others are free. The choices really are nearly limitless.

    One of my favorite perks offered by credit cards is the ability to get cash back as a fraction of the amount you spend using the card (most likely this is actually a clever ploy to get people to put lots of money on their cards in order to run up large amounts of interest, but we'll ignore that for now). Getting straight cash can be easier to deal with than travel rewards unless you happen to use the same Miles system, and provides you with a lovely sum of money to play with usually once a year.

    But not all cards are created equal. If you take a quick look at the terms and conditions of some of the major cash-back cards that are offered, you might get a graph that looks something like this (please note: Scotiabank's cards offered different rates for different purchases, such as groceries, gas, etc. As a result, they weren't included as it made direct unambiguous comparisons impossible.):



    Whoa. My mistake. I set the axes equal to each other, and it almost made it look like the cash you'd get back was very nearly insignificant in the scheme of things. Silly me, that can't possibly be true...



    Ok, whew, that's much better. A few things to note right off the bat:


    • The graph includes annual fees, which explains why some cards start off negative
    • The analysis assumes that the only important differences between cards are the cash back rewards and annual fees, and ignores differences in interest rates, maximum credit limits, minimum income requirements, etc.
    • At some point in the future, CIBC's thesaurus is likely to run out of cool sounding words to add on to credit card names
    The clear winner here is the Capital One card, which provides a free 1.0% cash back system which can be withdrawn from at any time, plus an additional 50% of the accrued cash back once a year (for an equivalent 1.5% cash back system). If we ignore it, though, there is actually a very neat transition between the different cards based on how much you are likely to spend each year:

    If you spend:

$0-7,500: Get the TD Cash Back MasterCard. No annual fee and 0.75% cash back is a pretty decent deal, all in all. Take it and run! This is way better than its cousin, the TD Gold Elite Visa, which offers 1% but has a $79 annual fee. This range works out to credit card bills of less than $625/month, and if you feel you pay more than that on average, why not try...

$7,500-24,000: The CIBC Dividend Visa. This one also has no annual fee, but its pay structure is tiered (0.25% up to the first $1,500, 0.50% for the next $1,500, and 1% after that), so it takes a while to catch up to the TD card from before. This card is the most profitable for quite a range, though! In the interest of fairness, it is worth mentioning that the RBC Visa Cash Back card is very close and works out to only $0.25 less cash back a year over this range, as the card is a constant 1% reward but costs a $19 annual fee. Its rewards, however, are capped after $25,000 spending per year, so if you're unsure of your spending you're still better to stick with the CIBC card. And if you're going outside of that range anyway, you may as well consider...

$24,000-35,300: The BMO CashBack World MasterCard. (Note how the card names get longer at higher price ranges?) This card offers an aggressive 1.25% cash back, but it's offset by a $79 annual fee. If you feel like even that isn't enough to put on a card, go for either...

$35,300-50,000: The CIBC Dividend Infinite or Dividend Unlimited World Elite Visa. (Seriously, how ridiculous are these names?) These cards are tied over this range, with identical tiered reward schemes and $79 annual fees. Unfortunately, the Dividend Infinite isn't all that infinite, and its rewards are capped after $50,000 of spending, at which point...

    $50,000-94,000: The Unlimited World Elite takes over solo in all its shining glory, until...

    $94,000 and beyond: The BMO CashBack catches right back up. It's able to do this because its rewards run at a higher rate than the highest tier of the Unlimited World Elite. After this, there's really no stopping it (apart from, of course, the aforementioned Capital One card, which is laughing at all these other cards from the finish line). 

    Really, though, if you're making enough money to be able to put $94,000 on a single credit card every year, you probably don't particularly care about which card offers you a handful of dollars more than another one.
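The crossover points above fall out of comparing the annual reward curves directly. Here's a sketch for the first two transitions, with rates and fees as I read them from the 2013 terms (a sanity check, not financial advice):

```python
def td_cash_back(spend):            # 0.75%, no annual fee
    return 0.0075 * spend

def cibc_dividend(spend):           # tiered 0.25% / 0.50% / 1%, no annual fee
    reward = 0.0025 * min(spend, 1500)
    reward += 0.0050 * min(max(spend - 1500, 0), 1500)
    reward += 0.0100 * max(spend - 3000, 0)
    return reward

def bmo_cashback_world(spend):      # flat 1.25%, $79 annual fee
    return 0.0125 * spend - 79

# scan for where each card overtakes the previous one
for spend in range(100, 40_000, 100):
    if cibc_dividend(spend) >= td_cash_back(spend) - 1e-9:
        print(f"CIBC overtakes TD near ${spend:,}")   # ~$7,500
        break
for spend in range(100, 40_000, 100):
    if bmo_cashback_world(spend) >= cibc_dividend(spend) - 1e-9:
        print(f"BMO overtakes CIBC near ${spend:,}")  # ~$24,100
        break
```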

    Tuesday, July 30, 2013

    In Defense of Fluoride

    Recently, the #yegvote twitter feed has been occupied with a lot of discussion about water fluoridation, particularly from and against mayoral candidate Curtis Penner (including him tweeting a bibliography of 70 consecutive journal articles early Thursday morning). As has been pointed out, a lot of the twitter debate has devolved into personal attacks on both sides, but the question of water fluoridation is important and worth discussing on its merits alone.

    First of all, some fun fluoridation facts!
    • low concentrations of fluoride in your mouth reduces the rate at which your enamel breaks down,
    • fluoride is often naturally present in water all over the world in different concentrations,
    • about 5% of the world's population has fluoride added to the water supply at low concentrations (including Edmonton), and
    • different studies have shown that the presence of fluoride in water can reduce cavities by between 27 and 40% relative to regular brushing

    The ideal concentration for fluoride in water appears to be somewhere between 0.5-1.0 milligrams per liter. This is lower than the natural levels found in lots of communities, and in many parts of the world fluoride concentrations are reduced or even eliminated before being pumped into municipal water supplies.
     
Mr. Penner's platform for mayor includes a lengthy paragraph against water fluoridation, with the first argument referencing the Material Safety Data Sheet for the chemical that's being used in Edmonton, hydrofluosilicic acid. His claim appears to be that since this chemical is listed as corrosive and dangerously reactive, we shouldn't have anything to do with it. This is at best a red herring, and not an argument at all - at the high concentrations at which the chemical is stored, very nearly anything is poisonous. We add chlorine to our water supply explicitly to kill living organisms, and its MSDS warnings are even more severe than fluoride's. This isn't to say that the concentrated version of the chemical is ok to drink, but the final product in our taps is millions of times less concentrated than the solution the data sheet refers to.
     
Dropping evil-sounding chemical names and referring to alarming MSDS data sheets can't form an argument in and of itself. People regularly consume citric acid and acetic acid, whose MSDS data sheets name them as flammable and corrosive and include pages of toxicity warnings, but we enjoy them as orange juice and vinegar. The fact of the matter is that data sheets have to cover all possibilities and naturally make anything sound evil - even the data sheet for plain old boring water has lethal dosage information.
     
Mr. Penner then goes on to reference this Harvard meta-analysis on the effects of fluoride on children. His claim is that the study indicates that "children who do not drink fluoride have a 20% better chance of having high intelligence, whereas those who do drink fluoride have a 9% better chance of developing mental retardation." Oddly enough, the words "mental retardation" and the term "20%" don't show up in the journal article at all. In fact, they write instead that their "results support the possibility of adverse effects of fluoride exposures on children's neurodevelopment." Their major finding, actually, is that children who had "high exposure" to fluoride had an IQ that was 0.45 points lower than reference children.
     
What constituted "high exposure"? Turns out it ranged from about 3-12 mg/L for most studies. Their reference points - the populations the studies considered not exposed to fluoride - were between 0.34-2.35 mg/L. In fact, some of the reference healthy populations were drinking water that was two to three times more fluoridated than what would ever be allowed in Edmonton's water. An overwhelming majority of the over 70 studies tweeted by Mr. Penner that supposedly support his position deal with high concentrations of naturally-occurring fluoride in India or China, not the low concentrations carefully monitored in Canada. So while it most likely is true that high concentrations of fluoride can cause adverse effects, it's also true that increasing your intake of almost any substance by a factor of 5 to 10 over recommended levels would be similarly poisonous.
     
    It has also been pointed out that Calgary stopped water fluoridation in 2011. This is true - and already has dentists raising alarms about increases in cavities (and don't forget, dentists get less work if people have fewer cavities. They must be really concerned...).
     
    With studies showing the positive effects of low concentrations of fluoride, and other studies showing adverse effects only at levels significantly higher than Edmonton's, the scientific argument for those opposing the current fluoride program doesn't seem that strong. The remaining argument is one of policy - is it morally acceptable to add a substance to the water with the goal of treating an entire population?
     
    While this is mostly a matter of personal opinion, it is interesting to consider the city’s role and responsibilities in providing water to the public. There is a very defined cost vs. benefit analysis to be made when treating water. No treatment and you'll kill people, add some filters and you'll kill a bunch less - but after a certain point there are massively diminishing returns with regards to how expensive it will be to make water just a little bit safer to drink, with the extreme end being an economically devastating plan to provide everyone with pure distilled water (which, actually, maybe isn't all that ideal after all).
     
If the factors behind water treatment were only economical in nature, then the benefit of saving families hundreds of dollars a year on dental work far outweighs the cost of fluoridation at less than a dollar per person. Fluoridation provides a blanket benefit to everyone who drinks tap water, providing unique aid to those who cannot afford regular dental work or perhaps even other sources of fluoride like toothpaste.
     
Water fluoridation has been included with vaccinations and family planning among the ten greatest public health achievements by the Centers for Disease Control and Prevention, and is supported by major health and dental organizations. As mentioned before, the up to 40% decrease in cavities from fluoridated water is in addition to any benefit already achieved from brushing your teeth. Any policy that so easily and effectively provides this much assistance against such a preventable condition as cavities is surely worth keeping around.
     
    As long as the fluoride program that Edmonton currently uses continues to be beneficial, and the low concentrations that are used continue to not be dangerous, the program should not be stopped.

    Wednesday, June 26, 2013

    NHL 2013 Wrap-up

    So the Stanley Cup has been awarded. Congrats to the Chicago Blackhawks!

    As I previously mentioned, I've been running a model to try to predict the NHL finals for a few years, and I've been running it again this year. The model outputs looked like this over the last few weeks:

    Again, as the time through the playoffs progresses (x-axis), the probability of a given team winning is displayed as the height of the team's bar (so that at any time all teams' probabilities add up to 100%). Originally I had predicted the Senators had the highest chance of winning, with the Bruins in second and the Blackhawks in third. Fortunately enough, two out of those top three actually made it somewhere...

    What's important though is to check if I'm anywhere near accurate. It's convenient, I suppose, to say that my top teams all did fairly well, but how well did I actually do? Take a look at this (format stolen straight-up from Wikipedia):

Eastern Conference Quarterfinals:
• (1) Pittsburgh Penguins (64.2%) 4 vs. (8) New York Islanders (35.8%) 2
• (2) Montreal Canadiens (15.9%) 1 vs. (7) Ottawa Senators (84.1%) 4
• (3) Washington Capitals (37.9%) 3 vs. (6) New York Rangers (62.1%) 4
• (4) Boston Bruins (76.7%) 4 vs. (5) Toronto Maple Leafs (23.3%) 3

Eastern Conference Semifinals (pairings are re-seeded after the first round):
• (1) Pittsburgh Penguins (22.8%) 4 vs. (7) Ottawa Senators (77.2%) 1
• (4) Boston Bruins (59.8%) 4 vs. (6) New York Rangers (40.2%) 1

Eastern Conference Finals:
• (1) Pittsburgh Penguins (31.7%) 0 vs. (4) Boston Bruins (68.3%) 4

Western Conference Quarterfinals:
• (1) Chicago Blackhawks (77.0%) 4 vs. (8) Minnesota Wild (23.0%) 1
• (2) Anaheim Ducks (40.9%) 3 vs. (7) Detroit Red Wings (59.1%) 4
• (3) Vancouver Canucks (37.8%) 0 vs. (6) San Jose Sharks (62.2%) 4
• (4) St. Louis Blues (43.1%) 2 vs. (5) Los Angeles Kings (56.9%) 4

Western Conference Semifinals:
• (1) Chicago Blackhawks (58.2%) 4 vs. (7) Detroit Red Wings (41.8%) 3
• (5) Los Angeles Kings (34.3%) 4 vs. (6) San Jose Sharks (65.7%) 3

Western Conference Finals:
• (1) Chicago Blackhawks (70.3%) 4 vs. (5) Los Angeles Kings (29.7%) 1

Stanley Cup Finals:
• (E4) Boston Bruins (52.4%) 2 vs. (W1) Chicago Blackhawks (47.6%) 4
    In each bracket, the percentages are the probabilities assigned by my model. A first glance analysis of this bracket shows that in 12/15 of the series, the teams I assigned the highest odds of winning to ended up winning (exceptions are Kings v. Sharks, Penguins v. Senators, and awkwardly Bruins v. Blackhawks).

If someone was flipping a coin, they would expect to predict 12/15 or better only about 1.76% of the time. This percentage is known as a p-value, and a standard convention in, say, medical experiments, is to use a value of 5% for determining if the results are significant or just arose as a matter of chance. Regardless of whether this is a reasonable standard or not (it could allow 5% of study results to be due to chance, and is a debate worth having), a "conventional" medical study with a p-value of 1.76% would likely be accepted, leading me to humbly suggest that my model this year was statistically significantly better than chance when it comes to predicting the outcomes of series (though not necessarily the playoffs as a whole).
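The 1.76% figure is just the binomial tail probability of getting 12 or more heads in 15 fair coin flips:

```python
from math import comb

flips, correct = 15, 12
p_value = sum(comb(flips, k) for k in range(correct, flips + 1)) / 2 ** flips
print(f"{p_value:.2%}")   # 1.76%
```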

    Another way of measuring the accuracy is by using a Brier score (similar to what I had in my previous post) to actually measure the accuracy of the probabilities assigned. It's all well and good to say the teams with the best odds often won, but how good were the odds that I assigned?

    The average Brier score for all 15 series was 0.8182 out of 1, where a 50/50 guess for any given series would give a value of 0.75. My results are basically the equivalent of assigning a value of 57% at the outset to every team that actually ends up winning (as opposed to a random value of 50%) - in other words perhaps not definitively accurate and clairvoyant, but still likely significant.
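Here the score for each series is computed (as best I can reconstruct my own scheme) as 1 minus the squared error of the probability assigned to the eventual winner, so a 50/50 guess scores 0.75 and a perfect call scores 1. Working backwards from the 0.8182 average gives the 57% equivalent:

```python
from math import sqrt

def series_score(p_winner):
    """1 minus the squared error of the probability given to the actual winner."""
    return 1 - (1 - p_winner) ** 2

assert series_score(0.5) == 0.75   # the coin-flip baseline

# invert the average score to find the equivalent constant probability
avg_score = 0.8182
equivalent_p = 1 - sqrt(1 - avg_score)
print(f"{equivalent_p:.0%}")       # ~57%
```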

    One last thing to try is to plot the overall Brier score over the entirety of the playoffs, and compare that to random chance. It would look like this:


That's actually not so good. Compared to last year, there's virtually no difference between the two. At least in the parts where the two diverge (mid-May, for instance), the model sits above chance, but everywhere else is about even. The similarity between the two suggests that there weren't a lot of cases where series ended in remarkable against-the-odds comebacks, though when one did occur (Blackhawks v. Red Wings, for instance) my model predicted it better than chance did.

    What's encouraging about all of this is that between all of the different ways of analyzing the accuracy, they converge on suggesting that the parameters that my model uses are reasonable and significant. Also, if I had bet on the first round, I would have made a killing. Too bad...