
Monday, May 11, 2015

Reuniting the Alberta Right

After last week's Alberta election, several of Alberta's political pundits expressed frustration that the splitting of the vote on the right may have allowed for the NDP success that we saw on election night. Danielle Smith, for instance, said:


She has a bit of a point - for all the hype of the NDP surge during the campaign, the NDP still won their strong majority government with less than half of the popular vote, and the combined popular vote of the two 'right-of-centre' parties could easily have beaten them.

Overall, the Wildrose Party ended up with far more seats than the PCs, even though they got 53,000 fewer votes (all this sounds like a set-up for a discussion on proportional voting systems, but I'll save that for later). Though the PC dynasty has ended for now, they certainly aren't lacking in a core voter base, and I wouldn't say they're definitely out of the game just yet.

But to those who are lamenting the splitting of the right side of the political spectrum, what's the most efficient way to reunite these two parties? If the right is to take control again, would it be easier to have the PC supporters move over to the Wildrose, or vice versa?

Let's check. I looked at the results for each riding from last week's election, and checked what the results would have been for each seat if a certain percentage of PC support moved to the Wildrose, or vice versa. First of all, let's see what happens if we increase the percentage of PC voters who move over to the Wildrose:


What this is telling us is that if 23.1% of PC supporters in each riding had instead voted Wildrose, there would have been enough to completely eliminate the PC presence in the legislature. If 35.8% of PC supporters had moved to the Wildrose, it would have been enough to take seats from the NDP and result in a majority of seats. A full reunification of the right would have resulted in 59 total seats, with 26 remaining for the NDP. In both cases, the Liberal and Alberta Party MLAs each won their seats with more votes than the combined PC/Wildrose total, so those seats are considered immune to this reunification effort.
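If you want to replicate this kind of check yourself, the calculation is straightforward. Below is a minimal sketch of the vote-transfer test, assuming the riding-by-riding results are available as simple party-to-votes tallies; the party labels and numbers shown are invented for illustration, not the real 2015 results.

```python
# A minimal sketch of the vote-transfer check, assuming riding-by-riding results
# are available as simple party-to-votes tallies. The numbers below are invented
# for illustration, not the real 2015 results.
def seats_after_transfer(ridings, from_party, to_party, fraction):
    """Seat count per party if `fraction` of from_party's vote in every riding
    had gone to to_party instead."""
    seats = {}
    for votes in ridings:
        adjusted = dict(votes)
        moved = adjusted.get(from_party, 0) * fraction
        adjusted[from_party] = adjusted.get(from_party, 0) - moved
        adjusted[to_party] = adjusted.get(to_party, 0) + moved
        winner = max(adjusted, key=adjusted.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats

ridings = [
    {"NDP": 9000, "PC": 6000, "WRP": 5500, "LIB": 1200},   # made-up riding
    {"NDP": 7000, "PC": 4000, "WRP": 8000, "LIB": 900},    # made-up riding
]
print(seats_after_transfer(ridings, "PC", "WRP", 0.358))
```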


On the other hand, it would have taken 30.3% of Wildrose supporters flocking back to the PCs in order to result in no Wildrose MLAs elected, and a 31.4% defection rate in order for the right to take control of a majority government.

Which one of these scenarios is most likely is a more nuanced question. Because of how poorly distributed the PC vote was between ridings, it's much easier for the Wildrose to absorb all of the PC seats (23.1% of PC support is only 95,393 voters across the province, for instance) than it is for the PC to absorb the Wildrose seats. If the goal is to reunite the right and regain control of the legislature, though, it may still be easier for the PCs to try to woo Wildrose voters - 31.4% of the Wildrose support is only 113,072 voters, and would have gotten the right back in power.

Overall, this means that a swing one way or another of about 100,000 right-leaning voters could have made all the difference in stopping the NDP from getting elected. Considering that this represents less than 8% of all voters from the last election, the possibility of a resurgence of the Alberta right is certainly not out of the question. The NDP has four years in power now to make good on their promises from the last election and retain their support, otherwise they may be in a bit of trouble during the next election.

Tuesday, September 2, 2014

Edmonton's Census Correlations

Back in May, a lovely website went viral that listed a number of spurious correlations between unrelated sets of data. It was loads of fun to read, and a lovely reminder that correlation doesn't imply causation.

Edmonton's 2014 census data was released last week, in a glorious Christmas-like occasion for people like me who are into that sort of thing. The census asked a couple fun questions and broke the results down by neighborhood, and I originally figured it might be a fun idea to comb through the data for ridiculous correlations like the Spurious Correlations website.

Unfortunately nothing super ridiculous stood out. Regardless, take a look at some of the more fun findings from the Census that maybe haven't been picked up on by other sources:
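If you want to run the same kind of correlation trawl yourself, here's a rough sketch of how it might look, assuming the census results have been exported to a CSV with one row per neighbourhood and the answers expressed as percentage columns; the file name and column prefix are made up for illustration.

```python
# A rough sketch of the correlation trawl, assuming the census results have been
# exported to a CSV with one row per neighbourhood and the answers expressed as
# percentage columns. The file name and column prefix are hypothetical.
import pandas as pd

df = pd.read_csv("edmonton_census_2014.csv")             # hypothetical export
pct_cols = [c for c in df.columns if c.startswith("pct_")]

corr = df[pct_cols].corr()

# Flatten the correlation matrix and list the strongest pairs (ignoring the
# diagonal and duplicate orderings).
pairs = corr.stack().reset_index()
pairs.columns = ["a", "b", "r"]
pairs = pairs[pairs["a"] < pairs["b"]]
print(pairs.reindex(pairs["r"].abs().sort_values(ascending=False).index).head(10))
```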

Married people don't like renting



I mean, really, nobody really likes renting, but it seems like married people especially don't.

Low apartments make you lazier



In general, living in an apartment correlates with transit alternatives that aren't driving, but people in high-rise apartments walk to work way more than people in shorter apartments. Sure, this is maybe because most of the people who walk to work live downtown and that's where the high-rises are, but it's more fun to think that short apartments compel people to bus...

This fun graph



Basically, as neighborhood populations change, people's jobs change too. For instance, the most common time to have a family member in preschool is when you have people in your house under age 5 (duh), but the second most common is when you have people aged 35-40. That double-peak pattern gets shifted over by 10 years and flattened out for grade 7 kids.

Other moderately interesting (but less pretty to graph) correlations include:


  • Full-time workers like driving their own cars, but only really post-secondary students bother consistently taking transit to work
  • People who've been in their house a long time tend to pay attention to the newspaper and radio more for their city info, but people who've been there for less than 3 years seem to prefer the city website
  • People who go to Catholic school seem to like driving more 
  • People working part-time are more likely to have lived in their house for more than 5 years than people working full-time (but less likely than if there are high school kids in the house!)
  • 25 to 40 year olds tend to move around the most; after that they seem to stick in the same house for a while

Friday, August 15, 2014

The Great Oven Mitt Review of 2014

So I've got some cool friends.

A little background: when I first moved into my apartment (one year ago today exactly!), I only remembered to buy oven mitts at the last minute. I grabbed the quickest and cheapest one I could, which was a single oven glove appropriately called the "Ove' Glove." The night I moved in, we made pizzas and the Ove' Glove was the source of much entertainment and complaints, as it appeared to be a very good conductor of heat instead of an insulator. Whoops.

To remedy this, two weeks ago at my birthday party my friend Cassandra decided it would be a great gag gift idea for everyone to bring oven mitts for me. As a result, I am now in possession of 16 oven mitts, ranging in colours, sizes, and materials.

So on this, the anniversary of me moving in, I've decided to work with these oven mitts the way that I know best: test them and write a report.

My set-up was pretty basic - I made a system to hold the oven mitt a fixed distance away from a medium-heat element, stuck a meat thermometer inside, and took temperature readings for up to ten minutes. A check without oven mitts showed that this setup subjected the oven mitts to a temperature of approximately 70 degrees Celsius.

Test Setup (highly technical)
 So without further ado, I present to you my rankings for oven mitts from worst to best. Starting with:

#9 Blue Mitts of Death


  • Brand: Dollar Store
  • Value: $3
I suppose at this point it's worth pointing out exactly how I'm ranking them. First and foremost, I'm looking at how long it takes the mitts to actually burn you. According to this source, 55 degrees is hot enough to give second-degree burns after 17 seconds. Held against the 70-degree heat source, the inside of this mitt passed 55 degrees in under 5 minutes - so you'd burn your hand in less than 5 minutes using this oven mitt. Sure, that's not how normal people use these, but hey - you gotta compare them somehow. Since these have the highest potential for burning, I rate them the worst.
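For the curious, the "time to burn" figures throughout this post can be read off the measured temperature curves by interpolating to find when the inside of a mitt crosses the 55-degree threshold. Here's a minimal sketch of that calculation; the readings below are invented, not my actual measurements.

```python
# A sketch of how the "time to burn" figures can be read off the temperature
# curves: interpolate the measurements to find when the inside of the mitt
# crosses the 55-degree threshold. The readings below are invented.
import numpy as np

times = np.arange(0, 10.5, 0.5)          # minutes since the start of the test
temps = 22 + 4.0 * times                 # hypothetical inside-the-mitt readings (deg C)

BURN_C = 55.0
if temps.max() >= BURN_C:
    burn_minutes = np.interp(BURN_C, temps, times)   # temps must be increasing
    print(f"would burn at about {int(burn_minutes)}:{int(burn_minutes % 1 * 60):02d}")
else:
    print("never reaches the burn threshold during the test")
```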

#8 Pink, Flowery, and Painful



  • Brand: Dollar Store
  • Value: $2.50
Though these are by far the prettiest, they're also quite deadly. If my hand had been in them for the experiment, I would have gotten a burn about 5:20 into the test. Not nice. It's also worth pointing out that throughout all tests, these oven mitts got the closest to 70 degrees (67.6 after 10 minutes). Yikes!

#7 Black Cuisinart


  • Brand: Cuisinart
  • Value: $15.95
Hilariously, I got these as a Christmas present from my parents before any of these birthday shenanigans went down. Sadly, they're also apparently the type of oven mitt that likes to burn your hand off. Their redeeming factor is that they only increased in temperature 1.2 degrees within the first 30 seconds of the test, which is more than enough time for most oven extractions. Would likely have burned my hand about 5:40 into the test.

#6 Green Silicone

  • Brand: Ming Wo (?)
  • Value: $9.99
Man, these silicone ones look so fancy, but they really like burning your hands to crisps. This is very similar to #7, in that it has one of the lowest heat gradients at first, but by 7 minutes into the test it would have made you very unhappy. I'm sure there's some materials science point to be made here, but that would involve actual science.

#5 Languages of Pain


  • Brand: Dollar Store
  • Value: $2.50
Awesomely enough, I got two pairs of these for my birthday. These were pretty decent for a language lesson, but woulda burned your hands at about 7:20 into the test. An excellent example of Dollar Store quality oven mitts holding their own against their expensive counterparts though...

Here are pretty graphs of the worst five oven mitts:

Again, the pink and blue oven mitts both had high initial rates of heat pickup, and ended up with the highest final temperatures (the ranking order is a bit different from the graph because I tested the pink one on a colder day. I know, terribly unscientific of me...). The silicone mitts did much better for the first two minutes, but then took on heat at a similar rate to everyone else. Tsk tsk.

The rest of the mitts happily didn't ever hit 55 degrees within their tests, so I'll rank them based on their total heat gain over the 10 minutes:

#4 The Ove' Glove


  • Brand: No clue
  • Value: $18.99
In a stunning come-from-behind near-podium finish, the Ove' Glove turns out to be a contender! And if you don't believe me, check out this totally awesome super cool consumer video (sarcasm). The Ove' Glove gained heat at an average rate of 2.95 degrees per minute - not shabby!

#3 The Alien

  • Brand: Dollar Store
  • Value: $2
Put this sucker on your hand and you've got a great alien chestburster puppet! Alternatively, use it to take hot things out of an oven and not burn yourself. By far the best bang for the buck - somehow it combines silicone and fabric into a decent oven mitt, gaining an average of 2.82 degrees per minute.

#2 Better Barbeque

  • Brand: CTG Brands
  • Value: Weight in gold?
Wowza. This one is hefty, basically goes up to my elbow, and keeps the heat out well, gaining only an average of 2.66 degrees per minute. Very nice. These also won the contest for lowest heat pick-up in the first minute, and didn't even register a temperature change until 30 seconds into the test.

#1 President's Choice

  • Brand: PC
  • Value: 7 unicorn hairs?
These guys were the bomb, only gaining 2.61 degrees per minute. They're also flexible enough to use regularly, unlike the silicone ones. 
Graph of the top 4 oven mitts:


Again, some very smooth curves here. The CTG oven mitt had by far the steadiest heat increase, but lost out to the PC mitt over the full length of the test. I know that my ranking has been more-or-less arbitrary this whole time, but I'm comfortable with declaring the CTG mitt to be my favorite (because really, who uses a mitt for 10 minutes at a time?).

Thanks again to everyone for pitching in on the oven mitt present. I hope I've used them in an appropriate manner! 

Friday, January 10, 2014

Fun with Evolution

Darwin's theory of evolution suggests that random variation can lead to non-random change in species, where individual organisms that are marginally better suited to their environment have a better chance of surviving and passing their winning genes on to the next generation. From this extremely simple idea, repeated countless times, we get the massive diversity and complexity of life.

While it's pretty cool and poetic and stuff, what's particularly fascinating is how it's been adopted into computing science. An entire branch of problem-solving algorithms exists that attempts to recreate evolution, and when run on a large scale these algorithms can actually be effective at solving incredibly complex problems.

Evolutionary algorithms are particularly well suited to problems where the connections between variables aren't well understood, and computing shortcuts can't be taken, but where programmers know in general what they're looking for. They are also prone to several issues: they're slow, and they tend to latch onto solutions that are pretty good, but not the best (also known as local maxima).

In the simplest terms, computer scientists will create a "gene pool" full of randomly-determined organisms, with the random genes corresponding to variables to be optimized. If the problem is well-defined, each organism can be evaluated and ranked based on how good of a solution they are, and then (similar to real life), the fittest will get together and have baby organisms, with genes from both parents. The process can be repeated as long as desired until a suitable solution is found.

Like real evolution, the non-random selection of the fittest, with random variation in their genes and starting condition, is expected to eventually produce organisms that are well-suited to their environment, but in this case being well-suited means they are an optimized solution to a problem.

In the real world, designers have used evolutionary algorithms to optimize extremely complex designs like car engines or wind turbine blades, but for fun I decided to do a fairly simple test on Excel as a proof of concept to myself. Instead of doing anything useful or fancy, I decided to try to get Excel to draw for me. Specifically, I wanted it to say "HI" to me (hey, I yell at it enough, figured it deserved a chance to yell back).

In order to do this, I politely asked Excel to create 100 random organisms, with each 'organism' defined as a 24-'gene' series of random values. This resulted in 100 random assortments of 6 rectangles. These 'organisms' could look something like this:


And I wanted to give them the goal of overlapping to trace this:


In order to do that, each of my original 100 'organisms' were given a fitness value based on whether they covered the target areas, but lost points based on how much they covered non-target areas. Then they all got a chance to breed like rabbits, but the fittest ones (with the highest scores) had a better chance of breeding than the others.

The actual breeding process is indistinguishable from zookeepers breeding uncooperative endangered animals - I stuck two randomly (but not equitably) chosen organisms in a room and made them watch videos of other bits of data 'doing it' until something popped out. In Excel terms, though, each of the 24 genes was examined in turn, and each child gene had an even chance of coming from either parent. In order to freshen up the gene pool, each gene also had a 5% chance of mutating.

Breeding continued until I got 100 new little babies, at which point I killed off the old stock and started over. In each generation, the fittest have a disproportionate chance of passing on their genes to the next generation, with the idea being that hopefully each generation is stronger than the previous until the goal is met.
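For anyone who'd rather read the loop than my zookeeper analogy, here's a minimal re-implementation sketch of the same idea in Python (the real thing lived in an Excel/VBA spreadsheet): 100 organisms, 24 genes read as 6 rectangles, uniform crossover, a 5% mutation rate, and fitness-weighted parent selection. The target shape and the scoring function are simplified stand-ins, not the exact spreadsheet formulas.

```python
# A minimal re-sketch of the loop described above (the real thing lived in an
# Excel/VBA spreadsheet). 100 organisms, 24 genes read as 6 rectangles of
# (x, y, width, height), uniform crossover, 5% mutation, and fitness-weighted
# parent selection. The target shape and the scoring are simplified stand-ins.
import random

def random_organism():
    return [random.randint(0, 39) for _ in range(24)]

# Stand-in target: two vertical strokes on a 40x20 grid (not the real "HI" mask).
TARGET = {(x, y) for x in range(2, 6) for y in range(2, 18)}
TARGET |= {(x, y) for x in range(30, 34) for y in range(2, 18)}

def covered(org):
    cells = set()
    for i in range(0, 24, 4):
        x, y, w, h = org[i:i + 4]
        cells |= {(x + dx, y + dy) for dx in range(w % 10 + 1) for dy in range(h % 10 + 1)}
    return cells

def fitness(org):
    cells = covered(org)
    return len(cells & TARGET) - len(cells - TARGET)   # reward target, punish spillover

def breed(a, b):
    child = [random.choice(pair) for pair in zip(a, b)]            # uniform crossover
    return [random.randint(0, 39) if random.random() < 0.05 else g for g in child]

pool = [random_organism() for _ in range(100)]
for generation in range(200):
    scores = [max(fitness(o), 1) for o in pool]                    # keep weights positive
    pool = [breed(*random.choices(pool, weights=scores, k=2)) for _ in range(100)]

print("best score after 200 generations:", max(fitness(o) for o in pool))
```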

These are the results of the first test, after 1,000 generations:


To use technical terminology, this is extremely unimpressive. Two of the rectangles form the I on the right, the lower right leg of the H is there, but a bunch of dead space in the H is taken up on the top, and one rectangle has wandered off completely (there were supposed to be 6...). Don't even get me started on the green rectangle, I think he's shy.

What was encouraging about this, though, was that the average scores for the population did tend to grow each generation, though they ended up plateauing after about 500 generations:


This is what I meant before by 'local maximum' solutions. In order for the score to improve, the purple rectangle would have to change by an amount that's more than what I've allowed mutations alone to cover. Also, during the required changes, it's likely that the scores would decrease (as the purple rectangle abandons those upper H legs), which is resisted from an evolutionary point of view. This actually parallels evolution quite well in that once animals are adapted to a situation they stop changing rapidly (even though they may not be perfectly optimized), until external pressures force them to need to adapt again.

The best way to improve solutions from evolutionary algorithms is to increase the sample size (get more genes in there!) and number of generations, but since those both involved forcing my computer to make funny noises, I decided instead to try a brand new set of 100 random organisms. After 1000 generations, I got:


That's... sort of better, actually. At least all six rectangles made it onto the screen, and all three vertical lines are definitely there. Again, though, in order to get this one to perfectly spell the word, the little purple rectangle would have required a tremendously lucky mutation, which was discouraging enough that instead I decided to try one last new group of 100. Here's their final result:


Actually, that's not bad. I have no idea what the little purple guy (why's it always purple?) is doing, but he's pretty much out of the way, and all the major parts of the word "HI" are definitely covered. Not bad for random numbers in Excel, eh?

In case you really wanted to play around with this spreadsheet, I've put a version of it here. It's already been seeded with random numbers, but if you open it as a macro-enabled spreadsheet, every time you hit "Ctrl+q", a new generation will form (Ctrl+w for 10 new generations, Ctrl+e for 100, but that'll take a bit of time...). Enjoy!

Friday, September 20, 2013

Bieber Fever

Newspapers had a heyday last year following the publication of a paper out of the University of Ottawa that discussed Bieber Fever. Some articles included:
Of all of these, I'm most disappointed in the CBC - I often pay attention to the CBC, and it's discouraging to know that they might be equally as wrong about other things as they are about this.

What's going on here? A professor from the University of Ottawa published a paper where a new model was developed to look at Bieber Fever, and the paper does indeed include the quote "It follows that Bieber Fever is extremely infectious, even more than measles, which is currently one of the most infectious diseases. Bieber Fever may therefore be the most infectious disease of our time." Oh my god, those newspapers must be right! Science has confirmed our worst fears! This must be backed by hard facts and empirical evidence!

Well, no. The paper appears to be a chapter in a book that examines diseases through various mathematical models. Each of the papers in the book (four of which are written by the Bieber Fever author) takes on a different disease and models it, then examines mathematically the effects of different approaches to the disease, like pulse vaccination or changes in infection or relapse rate. I can't comment on the quality of the other chapters in the book, but they seem to be well-developed and certainly based on real diseases. 

The Bieber Fever paper is a little bit different though. First of all, it is clearly written in a tongue-in-cheek manner that I think flew over the heads of most major newspapers. The humor and sarcasm actually make it quite an entertaining read, and if the piece was written as a humorous look into a creative way of adapting a disease model (which is my suspicion), then it could certainly be a fun case study for biology or math students. It is definitely not something worth raising alarms over in newspapers, though, as the model's predictions aren't validated against any actual statistics and its math is misleading, allowing them to draw this ridiculous comparison to measles that grabbed newspaper attention.

Mathematical disease modelling is a pretty cool field. The most basic model that can be developed is an SIR model - a population is divided up into three groups (Susceptible, Infected, and Removed), and people move through the groups depending on disease parameters and the size of the groups at a given time. For instance, if a lot of people are Infected, the chance of a healthy Susceptible person getting infected is quite high (perhaps due to lots of people sneezing on them), but as more people are Removed (happily by recovery and immunity, or sadly by death), it may become harder for the disease to propagate. 



In this model, βIS represents the rate that healthy people become sick - effectively, it is the chance that in a given time a Susceptible person will encounter an Infected person, multiplied by the chance that that encounter will transmit the disease. On the other end, γI represents the rate at which sick people become healthy, effectively the number of Infected people divided by how long it takes them to get healthy (or die, I suppose).

As long as the rate of people becoming sick (βIS) is larger than the rate people are recovering (γI), then the disease will reach an epidemic of some type - otherwise it will quickly die out. For simple models, the ratio of these rates is known as the Basic Reproduction Number (R0) of a disease, and corresponds to the number of new infections a sick person will cause. This is pretty easy to visualize - if the ratio R0 is bigger than 1, then by the time someone recovers from their illness they'll have spread it to at least one more person, and the disease will grow. If you're unlikely to make someone else sick when you fall ill, the disease's R0 will be less than 1, and the disease will go away without much of an outbreak.
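To see that threshold behaviour in action, here is a minimal numerical sketch of the basic SIR model described above, stepped forward in small time increments. The parameter values are illustrative choices of mine, not numbers from the paper.

```python
# A minimal numerical sketch of the basic SIR model above, stepped forward with
# small time increments. Parameter values are illustrative, not from the paper.
def sir(beta, gamma, s0, i0, days, dt=0.1):
    s, i, r = s0, i0, 0.0
    for _ in range(int(days / dt)):
        new_infections = beta * i * s * dt     # the beta*I*S term
        new_recoveries = gamma * i * dt        # the gamma*I term
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

N = 1500.0
gamma = 1.0 / 8.0                        # assume roughly an 8-day illness
for R0 in (0.8, 2.5, 15.0):              # fizzle, flu-like, measles-like
    beta = R0 * gamma / N                # chosen so that beta*N/gamma = R0
    s, i, r = sir(beta, gamma, s0=N - 3, i0=3, days=120)
    print(f"R0 = {R0:>4}: {r:6.0f} total infected, {s:6.0f} never infected")
```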

For reference, the flu typically has an R0 of 2-3, HIV is around 2-5, Smallpox is 5-7, and Measles is 12-18. Measles is so infectious, and you stay contagious for so long, that each person who caught it was expected to transmit it to between twelve and eighteen others before recovering or dying.

Frightening stuff. Fortunately, analyses of diseases with these mathematical models show that as long as a certain proportion of a population is immunized by vaccine, epidemics can be avoided. The proportion needed is (1-1/R0) - so a typical flu needs 60% immunization to prevent an outbreak, and measles needs over 90%. If you're still unsure about getting a flu shot, just remember that if a population doesn't hit ~60% immunity, everyone who doesn't have the vaccine or who is otherwise susceptible is very much worse off.

The Bieber paper develops a more complicated mathematical disease model. It looks something like this:


The author, Robert Smith? (not a typo), proposed a model where media effects have a large impact on the disease. Positive media (P in the picture) can increase the rate at which healthy people become Bieber-infected, and can also make recovered individuals susceptible to re-infection, and Negative media can heal the sick or immunize the susceptible (how miraculous).

Using the numbers that Smith? has in his paper, the spread of Bieber Fever in a typical school of 1,500 students would look something like this:



After about 2 months, the system reaches an equilibrium with about 85% of people being Bieber Fanatics. The paper makes a couple of assumptions: first of all, people are assumed to "grow out" of Bieber Fever after a period of two years. People are also expected to interact with everyone else in the population at least once a month, and have a transmission rate of 1/1500. This means that the average infected person will infect 1 person a month for 24 months, giving Bieber Fever an R0 of 24.

SWEET MOTHER OF GOD IT'S WORSE THAN MEASLES!?!?

Not even a little bit! The transmission rate is absolutely just assumed out of nowhere - no stats, evidence, or explanation given. Similarly, the length of the disease is made up, with the explanation "But let's be honest, we all know which one it really is, don't we?" (Smith?, p. 7). Essentially, the authors set up a calculation in which three assumed numbers get multiplied together, and newspapers were surprised that the product was high. Even the mechanics of the positive and negative media effects are questionable, though the model they developed could help provide insight into other diseases with relapse mechanisms.

The paper is cute, clever, and provides a mathematical analysis of a convoluted set of differential equations - for all of these things it serves a nice purpose as a tongue-in-cheek entry in a textbook examining mathematical modelling of infectious diseases. But newspapers blowing the result of an unfounded set of assumptions out of proportion and reporting it as "Science Confirms!" will always annoy me to no end.

One last thing. This is what a graph would look like if the same school was hit with measles:



Now that's an epidemic - three people sick can infect up to 1,200 in less than a week. Remember this when deciding whether or not to immunize your baby.

Friday, August 30, 2013

Extreme Rock, Paper, Scissors

Few games have captured the hearts and minds of the public like Rock, Paper, Scissors.

Ok, that's maybe not even remotely true. RPS is one of the most basic games on earth that, barring any possible psychological factors, is essentially a glorified coin toss. That hasn't stopped fantastic organizations like the World RPS Society from existing (their 'responsibility code' includes a recommendation not to use Rock Paper Scissors for life-threatening decisions. Good call.), and some people certainly take the game very seriously.

I'm also going to take it very seriously, albeit in a completely different direction. I came across a post on my new favorite blog last week, DataGenetics, where the author created and examined a sort of iterative 'team-based' game of RPS. In his game, people in a group are assigned to always make a certain play, and are then drawn into random pairings for a show-down. The winners go back into the pool, and the losers get eliminated, until only one 'team' of players remains.
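Here's a rough sketch of how that elimination game can be simulated, written in Python for illustration. The tie-handling (both players simply go back into the pool) is one plausible reading of the rules, not necessarily exactly what DataGenetics did.

```python
# A rough simulation of the elimination game, written in Python for illustration.
# Players are assigned a fixed throw, random pairs face off, losers are removed,
# and ties simply put both players back in the pool (one plausible reading of
# the rules - not necessarily exactly what DataGenetics did).
import random

BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def play_game(counts):
    pool = [throw for throw, n in counts.items() for _ in range(n)]
    while len(set(pool)) > 1:
        random.shuffle(pool)
        a, b = pool.pop(), pool.pop()
        if BEATS[a] == b:
            pool.append(a)
        elif BEATS[b] == a:
            pool.append(b)
        else:                                  # tie: both survive
            pool += [a, b]
    return pool[0]

def win_rates(counts, trials=10_000):
    wins = {throw: 0 for throw in counts}
    for _ in range(trials):
        wins[play_game(counts)] += 1
    return {throw: w / trials for throw, w in wins.items()}

print(win_rates({"rock": 10, "paper": 10, "scissors": 4}))
```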

This becomes an interesting examination that, in a way, can sort of approximate population dynamics. If we had an island where wild rocks, papers, and scissors ran around in their natural habitat, where the natural prey of rocks were scissors (they'd viciously bend the blades before eating them), the natural prey of scissors were paper (the blades would tear into the paper velociraptor-style) , and the natural prey of paper was rocks (who would get... covered? By far the lamest of the RPS triad), then if the populations were fairly equal the island would stay at a fairly steady equilibrium. As soon as you remove, say, scissors, the rock population would be catastrophically destroyed by all the unchecked paper roaming around... covering them.

Alright, so it isn't a perfect analogy, but I hope you get the point. As this is a non-transitive food cycle (instead of a mostly transitive food chain), the dynamics are a little bit different than what you might expect in real life, but population dynamics in simple predator-prey models really are rather fascinating.

DataGenetics' results were really cool, though, so I decided to see if I could reproduce them and take them further. The first model he made featured 10 rock players, 10 paper players, and a varying number of  scissors players. What do you think would happen as the population of starting scissors players decreases? Surprisingly, their odds of winning actually increase. In fact, as long as there is at least one scissors player, their odds of winning the whole thing range from 33% to 60%, with a peak when the starting scenario is 10 rock, 10 paper, and 4 scissors.



My results came from running 10,000 game simulations per data point, and almost perfectly match up with the results from DataGenetics, so I'm reasonably confident in them.

What's going on here is actually really cool. Reducing the number of scissors at the outset means that initial pairings between players are more likely to be between rocks and paper - which paper will win (illogically, by laying on the rock). As the game progresses, it is more likely to become mostly paper and scissors, which is an easy scenario for the scissors to win. This means that unless scissors get unlucky and have a lot of early pairings against rocks, they have much better than even odds of winning the game outright. As DataGenetics put it, this is a great example of the expression "The enemy of my enemy is my friend."

Here's a cool alternative view:


If you run a similar scenario, but with 50 rocks, 50 papers, and a variable number of scissors, the results are even more extreme - scissors' best chance of winning is when they start with 17 players against their opponents' 50 each, where they have a 91% chance of winning the whole game.


The reversal of the trend between rocks and scissors here is also pretty fascinating. At larger numbers of initial players, paper's odds of winning are much more sensitive to changes in the number of scissors than rock's, until the number of scissors becomes drastically low. It's also worth pointing out that my results here deviate a bit from DataGenetics after 25 scissors players, even though I was trying to do fundamentally the same thing as he was. I have no idea who's correct, so I guess it's time for a nerd show-down...

I said I was going to take this a step further, and the nerdiest way of taking Rock, Paper, Scissors further is to change it to Rock, Paper, Scissors, Lizard, Spock.

 
Good thing the rules to normal RPS are easy, because adding two new options adds seven new combinations to remember. As the Big Bang Theory puts it:
 
  • Scissors cut paper,
  • Paper covers rock,
  • Rock crushes lizard,
  • Lizard poisons Spock,
  • Spock smashes scissors,
  • Scissors decapitates lizard,
  • Lizard eats paper,
  • Paper disproves Spock,
  • Spock vaporizes rock, and (as it always has)
  • Rock breaks scissors
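In simulation terms, the only thing that changes is the win table. Extending the earlier sketch, the ten rules above collapse into a map from each throw to the pair of throws it defeats:

```python
# The same simulation extends to the five-throw game by swapping in a win table
# built from the ten rules above: each throw maps to the pair of throws it beats.
BEATS_RPSLS = {
    "rock":     {"scissors", "lizard"},
    "paper":    {"rock", "spock"},
    "scissors": {"paper", "lizard"},
    "lizard":   {"spock", "paper"},
    "spock":    {"scissors", "rock"},
}

def winner(a, b):
    if b in BEATS_RPSLS[a]:
        return a
    if a in BEATS_RPSLS[b]:
        return b
    return None        # a tie (both players would go back into the pool)
```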


So what if we ran the same game experiment, but with 10 each of the five throws? If we keep scissors as the changing variable to be consistent, we get the following:



    Now this is just cool. Instead of scissors getting a bonus by losing players from the start, scissors are barely affected at all until their numbers get small enough. While the starting number of scissors is between 4-10, they're still doing about average with 20% chance of winning, after which their chances plummet.

    What's far more fascinating is the other players - poor Spock gets mutilated! Spock's chances of winning are always less than scissors' until scissors has 0 starting players. Why?

Most likely it's because, even though Spock smashes scissors, scissors is the only player that can kill both of Spock's predators (lizard and paper). As the chances of both of these getting killed decrease, so too do Spock's chances of winning the game. Meanwhile, lizard is having a great time. Its one big predator, scissors, is dwindling in numbers, meaning it will more likely face paper or Spock, which it is of course fine with.

    If we bump up the number of starting players to 50 again, we see the true dominance of rock, and again scissors tends to suffer very little (in fact, they're second-most likely to win up until they start off with only 20).

     
Very cool indeed, especially if you're rock. Again, poor Spock gets decimated, and in general the behaviors of the other players are largely similar to the previous example, just at a much reduced scale to give way to rock.

    In reality there's virtually no practical application to any of this, except to perhaps point out the unanticipated consequences that may arise when you remove an element of a balanced ecosystem. Population dynamics in the wild certainly don't follow such simple rules, but it's definitely not unheard-of for the addition or removal of a small part of the population of a species to have massive ramifications on other species, and the fact that this can be modeled with math and Rock, Paper, Scissors is pretty cool.

    Tuesday, July 30, 2013

    In Defense of Fluoride

    Recently, the #yegvote twitter feed has been occupied with a lot of discussion about water fluoridation, particularly from and against mayoral candidate Curtis Penner (including him tweeting a bibliography of 70 consecutive journal articles early Thursday morning). As has been pointed out, a lot of the twitter debate has devolved into personal attacks on both sides, but the question of water fluoridation is important and worth discussing on its merits alone.

    First of all, some fun fluoridation facts!
    • low concentrations of fluoride in your mouth reduces the rate at which your enamel breaks down,
    • fluoride is often naturally present in water all over the world in different concentrations,
    • about 5% of the world's population has fluoride added to the water supply at low concentrations (including Edmonton), and
    • different studies have shown that the presence of fluoride in water can reduce cavities by between 27 and 40% relative to regular brushing

    The ideal concentration for fluoride in water appears to be somewhere between 0.5-1.0 milligrams per liter. This is lower than the natural levels found in lots of communities, and in many parts of the world fluoride concentrations are reduced or even eliminated before being pumped into municipal water supplies.
     
Mr. Penner's platform for mayor includes a lengthy paragraph against water fluoridation, with the first argument referencing the Material Safety Data Sheet for the chemical that's being used in Edmonton, hydrofluosilicic acid. His claim appears to be that since this chemical is listed as corrosive and dangerously reactive, we shouldn't have anything to do with it. This is at best a red herring, and not an argument at all - at the high concentrations that the chemical is stored, very nearly anything is poisonous. We add chlorine to our water supply explicitly to kill living organisms, and its MSDS warnings are even more severe than fluoride's. This isn't to say that the concentrated version of the chemical is ok to drink, but the final product in our taps is millions of times less concentrated than the solution the data sheet refers to.
     
Dropping evil-sounding chemical names and referring to alarming MSDS data sheets can't form an argument in and of itself. People regularly consume citric acid and acetic acid, whose MSDS data sheets name them as flammable and corrosive and include pages of toxicity warnings, but we enjoy them as orange juice and vinegar. The fact of the matter is that data sheets have to cover all possibilities and naturally make anything sound evil - even the data sheet for plain old boring water has lethal dosage information.
     
Mr. Penner then goes on to reference this Harvard meta-analysis on the effects of fluoride on children. His claim is that the study indicates that "children who do not drink fluoride have a 20% better chance of having high intelligence, whereas those who do drink fluoride have a 9% better chance of developing mental retardation." Oddly enough, the words "mental retardation" and the term "20%" don't show up in the journal article at all. In fact, they write instead that their "results support the possibility of adverse effects of fluoride exposures on children's neurodevelopment." Their major finding, actually, is that children who had "high exposure" to fluoride had an IQ that was 0.45 points lower than reference children.
     
What constituted "high exposure"? Turns out it ranged from about 3-12 mg/L for most studies. Their reference points - the points the study considered not exposed to fluoride - were between 0.34-2.35 mg/L. In fact, some of the reference healthy populations were drinking water that was two to three times more fluoridated than what would ever be allowed in Edmonton's water. An overwhelming majority of the over 70 studies tweeted by Mr. Penner that supposedly support his position deal with high concentrations of naturally-occurring fluoride in India or China, not low concentrations carefully monitored in Canada. So while it most likely is true that high concentrations of fluoride can cause adverse effects, increasing your intake of almost any substance to 5 to 10 times the recommended level would be similarly harmful.
     
    It has also been pointed out that Calgary stopped water fluoridation in 2011. This is true - and already has dentists raising alarms about increases in cavities (and don't forget, dentists get less work if people have fewer cavities. They must be really concerned...).
     
    With studies showing the positive effects of low concentrations of fluoride, and other studies showing adverse effects only at levels significantly higher than Edmonton's, the scientific argument for those opposing the current fluoride program doesn't seem that strong. The remaining argument is one of policy - is it morally acceptable to add a substance to the water with the goal of treating an entire population?
     
    While this is mostly a matter of personal opinion, it is interesting to consider the city’s role and responsibilities in providing water to the public. There is a very defined cost vs. benefit analysis to be made when treating water. No treatment and you'll kill people, add some filters and you'll kill a bunch less - but after a certain point there are massively diminishing returns with regards to how expensive it will be to make water just a little bit safer to drink, with the extreme end being an economically devastating plan to provide everyone with pure distilled water (which, actually, maybe isn't all that ideal after all).
     
If the factors behind water treatment were only economic in nature, the benefit of saving families hundreds of dollars a year on dental work would far outweigh the cost of fluoride at less than a dollar per person. Fluoridation provides a blanket benefit to everyone who drinks tap water, providing unique aid to those who cannot afford regular dental work or perhaps even other sources of fluoride like toothpaste.
     
Water fluoridation has been included with vaccinations and family planning as among the ten greatest public health achievements by the Centers for Disease Control and Prevention, and is supported by major health and dental organizations. As mentioned before, the up to 40% decrease in cavities from fluoridated water is in addition to any benefit already achieved from brushing your teeth. Any policy that so easily and effectively provides this much assistance against such a preventable condition as cavities is surely worth keeping around.
     
    As long as the fluoride program that Edmonton currently uses continues to be beneficial, and the low concentrations that are used continue to not be dangerous, the program should not be stopped.

    Monday, June 10, 2013

    Traveling Drunkenman Problem

    The Traveling Salesman Problem is a classic in computing. Given a list of cities and distances between each city, what is the shortest path that lets a salesperson visit each city exactly once and return to the starting point?

It's a bit of a tough problem at high numbers of cities, because the number of possible routes grows factorially each time a new city is added. Because of that, the problem is useful in testing optimization processes that don't rely on simply checking all possible combinations by brute force.

    What's more fun is to adapt it in such a way as to optimize, say, your drinking habits. For instance, what's the shortest path to visit the 10 best pubs in Edmonton and return to where you started?

    In order to be somewhat objective, I used the Yelp list of the 10 highest rated pubs. Of course, when you're on a mathematically optimized pubcrawl, it's important to not drink and drive - so distances are converted into dollar figures following the City of Edmonton bylaws for taxis.
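As a sketch of the brute-force approach (my actual version lived in Excel, so this Python version is purely illustrative): fix the starting pub, try every ordering of the rest, and score each round trip with a table of one-way fares. The pub names and fares below are placeholders, not the real Yelp list or the bylaw fare calculation.

```python
# A sketch of the brute-force search: fix the starting pub, try every ordering
# of the rest, and score each round trip using a table of one-way fares. The
# pub names and fares are placeholders, not the real Yelp list or bylaw fares.
import random
from itertools import permutations

random.seed(1)
pubs = ["Pub A", "Pub B", "Pub C", "Pub D", "Pub E"]
fare = {(a, b): round(random.uniform(5, 20), 2)      # one-way fares, asymmetric
        for a in pubs for b in pubs if a != b}       # (think one-way streets)

def trip_cost(order):
    stops = [pubs[0], *order, pubs[0]]               # round trip from the first pub
    return sum(fare[(a, b)] for a, b in zip(stops, stops[1:]))

best = min(permutations(pubs[1:]), key=trip_cost)
print([pubs[0], *best, pubs[0]], "costs about $", round(trip_cost(best), 2))
```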

    2 pubs: $19.40 round trip

    If we start with the #1 and #2 pubs on the list, it would cost just under $20 to visit them both by taxi. No computer necessary - there's really only one way to get there and back (unless the taxi driver takes you on an expensive sight-seeing tour).

    3 pubs: $30.40-33.40 round trip

There are 6 different ways to do this trip once you've added #3 on the list, but it's not at all difficult to check out all the combinations for yourself and plan it without using a computer. Normally there wouldn't be any variation in the price (as you'd just be traveling along the same three sides of a triangle), but in the case of downtown Edmonton with all its one-way streets, some ways to drive between places end up being slightly more expensive than others.

    4: $35.40-39.40

    Things get a wee bit more fun here - 24 combinations to deal with, with 12 different paths between pubs to take into account.

    5: $39.60-43.60

    120 combinations, 20 paths between pubs. Still not too much variation between the paths yet.

    6: $42.60-58.80

    720 combinations, 30 paths between them. At this point there are relatively significant savings to be had by trying to optimize, but 720 combinations would be an awful lot to check by hand. The major reason why there is suddenly so much variation between different paths is that the 6th pub on the Yelp list is very close to the first. Paths that take advantage of this do well, and paths that ignore it  end up costing a lot more than necessary.

    7: $46.00-63.80

    5,040 combinations. Things are getting really exciting now.

    8: $49.40-72.60

    40,320 combinations formed from 56 different possible paths between the 8 pubs.

    9: $53.00-82.40

    362,880 combinations. Let's just say that's an awful lot of things to check by hand (also messes with Excel a wee bit...).

    10: $57.00-98.40

    3,628,800 combinations. Again, quite a bit of a burden on poor Excel. But what's cool is that by optimizing your path between 10 of Edmonton's most popular pubs, you can save more than $40 in taxi fees.

    This optimized pubcrawl through these pubs would look something like this:


What became readily obvious was the sheer amount of computing power necessary to check all of these. I have Excel files that are over half a gigabyte in size just to evaluate the nearly 4 million different combinations. Solutions to larger problems typically require other cool algorithms that I didn't feel like actually studying to learn.

    Enjoy! Hope to see you out on the nerdiest tour of Edmonton's finest pubs next weekend!

    Special thanks to Finbarr Timbers and Elizabeth Croteau for the idea, and Daniel Johns for helping me out of a tricky spot with VBA.

    Wednesday, May 22, 2013

    Les Poissons, Les Poissons

    (hee hee hee, haw haw haw)

In WWII, the British were getting regularly bombed by V-1 flying bombs and V-2 rockets. After a while, it turned out that certain neighborhoods of London were getting pelted a ton, while others weren't. Some people started to suspect that the Germans had exceptionally accurate bombing capabilities, and were avoiding neighborhoods where their spies lived.

    In order to figure out what was going on, British statistician R. D. Clarke broke the city into a grid and counted the number of hits per area. His results almost perfectly matched what you'd expect from random chance, following what is known as the Poisson Distribution.

The Poisson Distribution is really good for figuring out the likelihood of a given number of events occurring over a fixed time or space, based only on an expected number of the same events. Clarke took the total number of rocket hits and divided it by the number of grid squares, which gave an expected average of approximately 0.933 rockets per square. The Poisson Distribution for this suggests about 40% of London wouldn't ever actually be bombed, while almost 25% of London would get bombed two or more times, and this is exactly what happened. It turned out that the Nazis weren't accurate at all, and it was just random chance that determined who would get hit on any particular day.

    Now a happier topic: happy birthday!

    On the off-chance that it actually is your birthday and you're reading this, then this is quite the fortuitous event. Lucky me! Go have some cake instead of reading my blog, sillypants.

Birthdays are spread out over a long period of time, which makes them cool for demonstrating several fun statistics concepts. One of them is the always-fun birthday paradox, which says that it only takes a random group of 23 people before you'd expect at least a 50% chance that two of them share a birthday. This number probably seems absurdly low, but it's completely true. In fact, this is one of the many fun party games you can try that involve math!

    If you're a little bit like me, sometimes you like to wish your friends happy birthday, and facebook is a handy tool for reminding you to do that. If you're a lot like me, you might notice that there are extreme variations in the number of friends who have a birthday on any given day.

    Obviously if you have fewer than 365 Facebook friends you're guaranteed to have several days where nobody has birthdays. Once you get over 365 friends, though, you might expect that your friends' birthdays would even out, and that eventually you might run out of days with no birthdays.

    But as you may have noticed, some days tend to have way more birthdays than others. For instance, 7 of my friends have birthdays on January 28th, and 6 of my friends have birthdays on January 5th. Does this mean that a bunch of my friends' parents got frisky in April?

    Turns out... probably not at all. I have 546 Facebook friends who have their birthdays listed, meaning that on average I should expect 1.496 birthdays per day. Intuition would suggest I should get mostly 1-2 birthdays per day, with the occasional day with 0 and sometimes 3. Can the distribution of number of birthdays be predicted, though?

Yes! This sort of problem appears absolutely perfect for the Poisson Distribution. In fact, when I went through and jotted down the number of birthdays on each day of the year, this is what I got:


    It's impressive how close the two are. Also, maybe counter to what you'd expect, a full 20% of the year has no birthdays. My Facebook friends fit the Poisson Distribution with a Coefficient of Determination of 0.984.

    In reality, the fact that there are days where 6 or 7 of my friends have birthdays isn't exceptional and rare, but expected - it would actually be weird if there weren't any. In order to expect no days with 0 birthdays, you would have to have over 2,150 friends - expecting an average of almost 6 birthdays per day.
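For anyone who wants to check the math, here's a minimal sketch of the Poisson comparison using my friend count; the "over 2,150 friends" figure falls out of asking when the expected number of zero-birthday days drops below one.

```python
# A minimal sketch of the Poisson comparison: expected number of days with k
# birthdays among 546 friends, plus the friend count at which zero-birthday
# days should (on average) disappear.
import math

friends, days = 546, 365
lam = friends / days                     # ~1.496 expected birthdays per day

def poisson(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

for k in range(8):
    print(f"{k} birthdays: expected on about {days * poisson(k, lam):5.1f} days")

# Expected zero-birthday days = 365 * exp(-friends/365), which drops below one
# once friends > 365 * ln(365), i.e. roughly 2,154 friends.
print("friends needed:", round(days * math.log(days)))
```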

    So that's your fun statistics thought of the day - whenever you have a large enough random sample, what normally might seem like an outlier actually tends to confirm just how random it really is.

    Thursday, May 16, 2013

    Homeopathy: Worse than "Just Water"

    In 1796, German physician Samuel Hahnemann first proposed the idea of homeopathy. Based on the concept that "like cures like", homeopathic remedies attempt to cure symptoms suffered by patients by using highly diluted concentrations of a substance that would normally cause the same symptoms. However, these remedies are ineffective, wasteful, and can be damaging if they stop people from seeking real medical attention.

At first glance, the principle behind homeopathy may not seem that far-fetched. For instance, some vaccines are basically just preparations made from a non-life-threatening version of a disease in order to prepare your body to fight off that disease, and vaccines are great. Homeopathy, though, takes this concept - fighting an effect with its own cause - way too far, well past the point of plausibility.

    An actual example from the Canadian Society of Homeopaths is the use of onion juice as a remedy for hay fever. Onions cause runny noses and itchy eyes, and hay fever causes runny noses and itchy eyes. Homeopathy claims that since both of these cause the same symptoms, onion juice can also cure the symptoms of hay fever.

    Let me reiterate: homeopathy claims that using more of something that causes your problems will end up curing your problems. They literally claim that two wrongs make a right. Similarly, they claim that oysters can cure indigestion, arsenic stops diarrhea, and mercury can cure chronic pain.

    If homeopathy doesn't already seem ridiculous, it's about to. It's quite obviously not a good idea to ingest arsenic and mercury, and homeopathic remedies certainly wouldn't sell if they immediately killed the people who bought them. This is where the second major claim of homeopathy comes into play: the more a remedy is diluted, the more potent its healing powers will be.

While having the obvious advantage of avoiding killing people by directly poisoning them, the dilutions used in most homeopathic remedies don't help the plausibility of homeopathy as a practice. Homeopathic remedies are commonly prepared by performing a series of dilutions by a factor of 100, where the number of dilutions is referred to as the C number of the remedy. For example, they could take one millilitre of an ingredient, add it to 99 millilitres of water and shake it, then take one millilitre of that and add it to a new 99 millilitres of water, and get a 2C solution. The original number of dilutions proposed by Hahnemann was 30C - a series of thirty 1-in-100 dilutions. This is an equivalent ratio of one part active ingredient in 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 (1 novemdecillion) parts water.

For reference, a 6C homeopathic solution of salt water is supposed to help with irritability, but that's 40 billion times less concentrated than the salt in the ocean. Dissolving all the liquid that passes through an average 80-year-old over their lifetime into the combined water of all the world's lakes, oceans, and rivers gives a dilution of around 8C. What's intriguing is that if you started with a solution containing one mole of original material, at a dilution of 12C there's only a 60% chance of finding a single molecule of the active ingredient, and at a dilution of 13C, you're looking at pure water. A common over-the-counter homeopathic remedy at 30C isn't just essentially water - it is water.
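The arithmetic behind those numbers is simple enough to sketch: start from one mole of active ingredient (about 6x10^23 molecules) and divide by 100 for every C.

```python
# A sketch of the dilution arithmetic: start with one mole of active ingredient
# (about 6.02e23 molecules) and divide by 100 for every C dilution. Working in
# powers of ten avoids underflow for silly dilutions like 200C.
import math

AVOGADRO_LOG10 = math.log10(6.022e23)

def log10_molecules(c_dilutions):
    return AVOGADRO_LOG10 - 2 * c_dilutions      # each C divides the count by 100

for c in (6, 12, 13, 30, 200):
    print(f"{c:>3}C: expected molecules per dose ~ 10^{log10_molecules(c):.1f}")
# 12C works out to about 10^-0.2 (roughly 0.6 molecules), matching the ~60%
# chance quoted above; by 30C and 200C the expectation is indistinguishable
# from zero.
```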

The most popular homeopathic remedy sold is Oscillococcinum, a 200C dilution of duck liver that is supposed to help with the flu. There are approximately 10^80 atoms in the universe, so in order to have enough water to get one molecule of duck liver extract in a solution at this dilution, one would need 10^320 universes worth of water. Somehow this is still considered a very potent concentration. Oscillococcinum is labelled as consisting of 0.85 g sucrose and 0.15 g lactose - these are 100% sugar pills.

    Homeopathic practitioners can't easily argue with math, and most will readily admit that there is no active ingredient present in the remedies that they sell to patients. Instead, they claim that water has a "memory" and that the dilution and shaking process involved in preparing a homeopathic solution leaves an imprint on the water, thus changing its properties.

    There is absolutely no evidence for this. Logically, it also doesn't hold up. As has been pointed out by countless comedians and scientists, it doesn't make any sense that water could remember the tiny amounts of poison once dissolved in it, but forget all the sewage it's come in contact with. Both the ability for water to retain an imprint of a chemical, and its ability to selectively forget other chemicals, would violate basic fundamental laws of physics.

So homeopathy is based on a principle that doesn't make sense, at concentrations that are nonexistent. Yet it is still around. In the United Kingdom, the Royal Homeopathic Hospital has the Queen as their official patron, and in France homeopathic remedies like Oscillococcinum are among the most common treatments for the flu. Why?

    One leading theory is that any perceived benefits of homeopathic water are likely due to a mix of confirmation bias and the placebo effect. Studies have shown that just the act of going to a clean hospital-like environment, listening to a compassionate doctor (or someone perceived to be a doctor), and taking something that you believe to be a medicine can often help with certain conditions. Placebos can have a measurable effect on improving health in certain circumstances, and are well-enough studied that we know a fake needle is more effective than a fake capsule, which is in turn more effective than a fake pill.

    Confirmation bias likely also plays an important role in people's opinions on homeopathy. As an example, homeopaths describe a process known as 'homeopathic aggravation' - a temporary worsening of symptoms following a dose of a homeopathic remedy. When a patient takes the dose and then starts to feel worse, the homeopath can claim that's all part of the plan, reassuring the patient and convincing them that the remedy is working. Of course, an alternative explanation for homeopathic aggravation is self-evident - a patient notices they have a runny nose, takes some non-medicine, and then the rest of the flu hits and they claim that's the 'aggravation'. Well, no - that's just how diseases work when they aren't subject to actual medicine.

After hearing these arguments, a homeopath may ask to agree to disagree, claiming that since the remedies are so dilute they can't possibly have side effects, and since they bring comfort and maybe some benefit to the people who buy them, surely there's no harm in offering them as an alternative medicine.

    The harm comes when people spend money on homeopathic remedies, mistaking them for real medicine, instead of going to real doctors. The placebo effect can have noticeable effects on health, but isn't generally capable of curing cancer. Patients with serious medical conditions who forego proven treatments from real doctors put themselves at severe risks they otherwise wouldn't need to experience, much like those who choose to avoid vaccines or sunscreen.

    Homeopathy sold alongside real medicine, courtesy of your local Safeway.

Homeopathic remedies do not need to pass the same testing requirements as real medicine to be sold in Canada, but they can still be placed deceptively on a grocery store shelf beside real medicine as though they are equally valid options. Someone desperate for symptom relief who doesn't know any better could easily end up buying sugar pills and pure water, costing them money and delaying real treatment. This is where the real harm of homeopathy comes from.

    Drug regulations require strict testing for a reason, and if homeopathy cannot provide evidence that it works or at least a plausible working mechanism then it should not be portrayed and sold commercially as an equally viable form of treatment.