The Traveling Salesman Problem is a classic in computing. Given a list of cities and distances between each city, what is the shortest path that lets a salesperson visit each city exactly once and return to the starting point?
It's a bit of a tough problem at high numbers of cities, because the number of possible routes grows factorially each time a new city is added. Because of that, the problem is useful in testing optimization processes that don't rely on simply checking all possible combinations by brute force.
What's more fun is to adapt it in such a way as to optimize, say, your drinking habits. For instance, what's the shortest path to visit the 10 best pubs in Edmonton and return to where you started?
In order to be somewhat objective, I used the Yelp list of the 10 highest rated pubs. Of course, when you're on a mathematically optimized pubcrawl, it's important to not drink and drive - so distances are converted into dollar figures following the City of Edmonton bylaws for taxis.
2 pubs: $19.40 round trip
If we start with the #1 and #2 pubs on the list, it would cost just under $20 to visit them both by taxi. No computer necessary - there's really only one way to get there and back (unless the taxi driver takes you on an expensive sight-seeing tour).
3 pubs: $30.40-33.40 round trip
There are 6 different ways to do this trip once you've added #3 on the list, but it's not at all difficult to check out all the combinations for yourself and plan it without using a computer. Normally there wouldn't be any variation in the price (as you'd just be traveling along the same three sides of a triangle), but in the case of downtown Edmonton with all its one-way streets, some ways to drive between places end up being slightly more expensive than others.
4: $35.40-39.40
Things get a wee bit more fun here - 24 combinations to deal with, with 12 different paths between pubs to take into account.
5: $39.60-43.60
120 combinations, 20 paths between pubs. Still not too much variation between the paths yet.
6: $42.60-58.80
720 combinations, 30 paths between them. At this point there are relatively significant savings to be had by trying to optimize, but 720 combinations would be an awful lot to check by hand. The major reason why there is suddenly so much variation between different paths is that the 6th pub on the Yelp list is very close to the first. Paths that take advantage of this do well, and paths that ignore it end up costing a lot more than necessary.
7: $46.00-63.80
5,040 combinations. Things are getting really exciting now.
8: $49.40-72.60
40,320 combinations formed from 56 different possible paths between the 8 pubs.
9: $53.00-82.40
362,880 combinations. Let's just say that's an awful lot of things to check by hand (also messes with Excel a wee bit...).
10: $57.00-98.40
3,628,800 combinations. Again, quite a bit of a burden on poor Excel. But what's cool is that by optimizing your path between 10 of Edmonton's most popular pubs, you can save more than $40 in taxi fees.
This optimized pubcrawl through these pubs would look something like this:
What became readily obvious was the sheer amount of computing power necessary to check all of these. I have Excel files that are over half a gigabyte in size in order to calculate the roughly 3.6 million different combinations. Solutions to larger problems typically require other cool algorithms that I didn't feel like actually studying to learn.
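For anyone who would rather not melt Excel, here is a minimal brute-force sketch in Python. It is not the VBA used for this post. The fare matrix is made up purely for illustration (the real taxi fares aren't reproduced here), and it is deliberately asymmetric to mimic one-way streets. The function name `tour_costs` is hypothetical.

```python
from itertools import permutations

def tour_costs(fares):
    """Return the round-trip cost of every tour that starts and ends at pub 0.

    fares[i][j] is the taxi fare from pub i to pub j; it need not equal
    fares[j][i], which is how one-way streets show up in the numbers.
    """
    n = len(fares)
    costs = {}
    # Fixing the starting pub avoids counting rotations of the same loop,
    # so we check (n - 1)! tours instead of n! orderings.
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        cost = sum(fares[tour[i]][tour[(i + 1) % n]] for i in range(n))
        costs[tour] = cost
    return costs

# Hypothetical fares in dollars between 4 pubs (illustrative values only).
fares = [
    [0.0, 9.4, 12.0, 8.0],
    [10.0, 0.0, 7.0, 11.0],
    [12.5, 7.0, 0.0, 6.0],
    [8.0, 11.5, 6.5, 0.0],
]

costs = tour_costs(fares)
best = min(costs, key=costs.get)
worst = max(costs, key=costs.get)
print(f"cheapest tour {best}: ${costs[best]:.2f}")
print(f"priciest tour {worst}: ${costs[worst]:.2f}")
```

Fixing the starting pub is just a convenience. Counting every starting point separately gives the n! figures quoted above, but the cheapest and priciest loops come out the same either way.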
Enjoy! Hope to see you out on the nerdiest tour of Edmonton's finest pubs next weekend!
Special thanks to Finbarr Timbers and Elizabeth Croteau for the idea, and Daniel Johns for helping me out of a tricky spot with VBA.
Extreme Enginerding
Science, politics, and general geekiness, oh my!
Monday, June 10, 2013
Sunday, June 2, 2013
Beer Math
Over the last week I had the privilege of going to both MKT and Underground Tap and Grill in Edmonton. I had a very pleasant experience at both. They are both known for having large beer menus, so I took the opportunity to try out a couple of new beers.
Of course, one of the problems with such a large selection on tap is that it's easy to get overwhelmed by the sheer number of options. Underground, for instance, has 72 beers on tap - how are you supposed to pick a new one worth trying? A beer connoisseur would probably read the descriptions and ask pointed questions, but an extreme enginerd could just use math.
Yay!
Take a look at this menu. Each beer is offered at a different combination of volume, alcohol content, and cost.
Based on this, it's actually pretty straightforward to find which beer gives you the most alcohol for the lowest cost. In fact, it would look something like this:
First of all, that's quite the variation! You could get almost three times as drunk off of the Salmon Fly Honey Rye as the Pilsner for the same cost. My goodness - I would never advocate for such a thing.
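The ranking itself is a one-line calculation: ounces of pure alcohol per dollar. Here is a rough sketch. The three menu entries are placeholders with invented numbers, not the actual Underground menu, and `alcohol_per_dollar` is a hypothetical helper name.

```python
def alcohol_per_dollar(volume_oz, abv, price):
    """Ounces of pure alcohol you get for each dollar spent."""
    return volume_oz * abv / price

# (name, pour size in oz, alcohol by volume, price in dollars)
# Placeholder values for illustration only, not real menu data.
menu = [
    ("Beer A", 16, 0.045, 7.50),
    ("Beer B", 12, 0.090, 6.00),
    ("Beer C", 16, 0.060, 8.00),
]

# Sort from most to least alcohol per dollar.
ranked = sorted(menu, key=lambda beer: alcohol_per_dollar(*beer[1:]), reverse=True)
for name, volume_oz, abv, price in ranked:
    print(f"{name}: {alcohol_per_dollar(volume_oz, abv, price):.3f} oz alcohol per dollar")
```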
Of course, there are many factors that influence the cost of alcohol. One of them is the mandatory minimum alcohol prices legislated by the Government of Alberta. Their rules work out to something like this:
- Spirits/Liqueurs: $2.75/oz.
- Wine (by the glass): $0.35/oz.
- Draft Beer: $0.16/oz.
- Beer/Cider/Cooler (bottle or can): $2.75 each
And as a quick aside before we finish, according to the Alberta rules, wine is the cheapest way to get drunk at $2.92 per ounce of alcohol. At this rate, it would only take $28.26 worth of wine to kill me by alcohol poisoning. How cost effective!
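Here is a quick sketch of where a figure like $2.92 can come from. The minimum prices are the ones listed above, but the alcohol contents (12% for wine, 5% for beer, 40% for spirits) and the 12 oz bottle size are my own assumptions, so treat the output as a ballpark.

```python
# Minimum price per fluid ounce of drink, from the list above.
min_price_per_oz = {
    "spirits": 2.75,
    "wine": 0.35,
    "draft beer": 0.16,
    "bottled beer": 2.75 / 12,  # assumes a 12 oz bottle (assumption)
}

# Assumed alcohol by volume for a typical drink in each category (assumption).
assumed_abv = {
    "spirits": 0.40,
    "wine": 0.12,
    "draft beer": 0.05,
    "bottled beer": 0.05,
}

# Dollars per ounce of pure alcohol = price per ounce of drink / ABV.
cost_per_oz_alcohol = {
    drink: min_price_per_oz[drink] / assumed_abv[drink]
    for drink in min_price_per_oz
}

cheapest = min(cost_per_oz_alcohol, key=cost_per_oz_alcohol.get)
for drink, cost in sorted(cost_per_oz_alcohol.items(), key=lambda item: item[1]):
    print(f"{drink}: ${cost:.2f} per oz of alcohol")
```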
Labels:
Statistics
Wednesday, May 22, 2013
Les Poissons, Les Poissons
(hee hee hee, haw haw haw)
In WWII, the British were getting regularly bombed by V-1 and V-2 rockets. After a while, it turned out that certain neighborhoods of London were getting pelted a ton, while others weren't. Some people started to suspect that the Germans had exceptionally accurate bombing capabilities, and were avoiding neighborhoods where their spies lived.
In order to figure out what was going on, British statistician R. D. Clarke broke the city into a grid and counted the number of hits per area. His results almost perfectly matched what you'd expect from random chance, following what is known as the Poisson Distribution.
The Poisson Distribution is really good for figuring out the likelihood of a given number of events occurring over a fixed time or space, based only on an expected number of the same events. Clarke took the number of rocket hits and divided it by the number of squares in his grid, which suggested an expected average of approximately 0.933 rockets per grid square. The Poisson Distribution for this suggests about 40% of London wouldn't ever actually be bombed, while almost 25% of London would get bombed two or more times, and this is exactly what happened. It turned out that the Nazis weren't accurate at all, and it was just random chance that determined who would get hit on any particular day.
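Those percentages fall straight out of the Poisson formula, P(k) = e^(-λ) λ^k / k!, with λ = 0.933. A small sketch (the helper name `poisson_pmf` is mine):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected."""
    return exp(-lam) * lam**k / factorial(k)

lam = 0.933  # average rocket hits per grid square, from Clarke's count

p_none = poisson_pmf(0, lam)
p_one = poisson_pmf(1, lam)
p_two_or_more = 1 - p_none - p_one

print(f"never hit:        {p_none:.1%}")
print(f"hit exactly once: {p_one:.1%}")
print(f"hit twice+:       {p_two_or_more:.1%}")
```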
Now a happier topic: happy birthday!
On the off-chance that it actually is your birthday and you're reading this, then this is quite the fortuitous event. Lucky me! Go have some cake instead of reading my blog, sillypants.
Birthdays are spread out over a long period of time, which makes them cool for demonstrating several fun statistics concepts. One of them is the always-fun birthday paradox, which says that it only takes a random group of 23 people before you'd expect at least a 50% chance of having a shared birthday between any two of them. This number probably seems absurdly low, but it's completely true. In fact, this is one of the many fun party games you can try that involve math!
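If you don't believe the 23, it's easy to check: multiply together the chances that each new person misses every birthday already taken, then subtract from 1. This sketch assumes 365 equally likely birthdays and ignores leap years.

```python
def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday."""
    p_all_different = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_different *= (days - i) / days
    return 1 - p_all_different

for group_size in (10, 22, 23, 50):
    print(f"{group_size} people: {shared_birthday_probability(group_size):.1%}")
```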
If you're a little bit like me, sometimes you like to wish your friends happy birthday, and Facebook is a handy tool for reminding you to do that. If you're a lot like me, you might notice that there are extreme variations in the number of friends who have a birthday on any given day.
Obviously if you have fewer than 365 Facebook friends you're guaranteed to have days where nobody has a birthday. Once you get over 365 friends, though, you might expect that your friends' birthdays would even out, and that eventually you might run out of days with no birthdays.
But as you may have noticed, some days tend to have way more birthdays than others. For instance, 7 of my friends have birthdays on January 28th, and 6 of my friends have birthdays on January 5th. Does this mean that a bunch of my friends' parents got frisky in April?
Turns out... probably not at all. I have 546 Facebook friends who have their birthdays listed, meaning that on average I should expect 1.496 birthdays per day. Intuition would suggest I should get mostly 1-2 birthdays per day, with the occasional day with 0 and sometimes 3. Can the distribution of number of birthdays be predicted, though?
Yes! This sort of problem appears absolutely perfect for the Poisson Distribution. In fact, when I went through and jotted down the number of birthdays on each day of the year, this is what I got:
It's impressive how close the two are. Also, maybe counter to what you'd expect, a full 20% of the year has no birthdays. My Facebook friends fit the Poisson Distribution with a Coefficient of Determination of 0.984.
In reality, the fact that there are days where 6 or 7 of my friends have birthdays isn't exceptional and rare, but expected - it would actually be weird if there weren't any. In order to expect no days with 0 birthdays, you would have to have over 2,150 friends - expecting an average of almost 6 birthdays per day.
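Here is a sketch of the numbers behind that. It computes how many days of the year you'd expect to see each birthday count with 546 friends, and then the number of friends needed before the expected number of empty days drops below one. I'm reading "expect no days with 0 birthdays" as 365 × e^(-λ) < 1, which is an interpretation on my part.

```python
from math import exp, factorial, log

DAYS = 365
friends = 546
lam = friends / DAYS  # average birthdays per day

def poisson_pmf(k, lam):
    """Probability of exactly k birthdays on a day when lam are expected."""
    return exp(-lam) * lam**k / factorial(k)

# Expected number of days in the year with exactly k birthdays.
expected_days = {k: DAYS * poisson_pmf(k, lam) for k in range(8)}
for k, days in expected_days.items():
    print(f"{k} birthdays: about {days:.1f} days")

# Empty days become unexpected once DAYS * exp(-lam) < 1, i.e. lam > ln(DAYS).
friends_needed = DAYS * log(DAYS)
print(f"friends needed: about {friends_needed:.0f} ({log(DAYS):.2f} birthdays per day)")
```

With these inputs the model expects more than one day a year with 6 birthdays, which is why a 6 or 7 showing up isn't a surprise.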
So that's your fun statistics thought of the day - whenever you have a large enough random sample, what normally might seem like an outlier actually tends to confirm just how random it really is.
Labels:
Science,
Statistics
Thursday, May 16, 2013
Homeopathy: Worse than "Just Water"
In 1796, German physician Samuel Hahnemann first proposed the idea of homeopathy. Based on the concept that "like cures like", homeopathic remedies attempt to cure symptoms suffered by patients by using highly diluted concentrations of a substance that would normally cause the same symptoms. However, these remedies are ineffective, wasteful, and can be damaging if they stop people from seeking real medical attention.
At first glance, the principle behind homeopathy may not seem that far-fetched. For instance, some vaccines are basically just preparations made from a non-life-threatening version of a disease in order to prepare your body to fight off that disease, and vaccines are great. Homeopathy takes this concept - fighting an effect with its own cause - way too far, though, well past the point of ridicule.
An actual example from the Canadian Society of Homeopaths is the use of onion juice as a remedy for hay fever. Onions cause runny noses and itchy eyes, and hay fever causes runny noses and itchy eyes. Homeopathy claims that since both of these cause the same symptoms, onion juice can also cure the symptoms of hay fever.
Let me reiterate: homeopathy claims that using more of something that causes your problems will end up curing your problems. They literally claim that two wrongs make a right. Similarly, they claim that oysters can cure indigestion, arsenic stops diarrhea, and mercury can cure chronic pain.
If homeopathy doesn't already seem ridiculous, it's about to. It's quite obviously not a good idea to ingest arsenic and mercury, and homeopathic remedies certainly wouldn't sell if they immediately killed the people who bought them. This is where the second major claim of homeopathy comes into play: the more a remedy is diluted, the more potent its healing powers will be.
While having the obvious advantage of avoiding killing people by directly poisoning them, the dilutions used in most homeopathic remedies don't help the plausibility of homeopathy as a practice. Homeopathic remedies are commonly prepared by performing a series of dilutions by a factor of 100, where the number of dilutions is referred to as the C number of the remedy. For example, they could take one millilitre of an ingredient, add it to 99 millilitres of water and shake it, then take one millilitre of that and add it to a new 99 millilitres of water, and get a 2C solution. The original number of dilutions proposed by Hahnemann was 30C - a series of thirty 1% dilutions. This is an equivalent ratio of one part active ingredient in 10^60 parts water - that's a 1 followed by 60 zeros, or 1 novemdecillion.
For reference, a 6C homeopathic solution of salt water is supposed to help with irritability, but that's 40 billion times less concentrated than the salt in the ocean. The amount of liquid that's passed through an average 80 year old by the time they die, dissolved in the combined water of all the world's lakes, oceans, and rivers, is around 8C. What's intriguing is that if you started with a solution containing one mole of original material, at a dilution of 12C you'd expect only about 0.6 molecules of the active ingredient to be left (so less than an even chance of finding a single one), and at a dilution of 13C, you're looking at pure water. A common over-the-counter homeopathic remedy at 30C isn't just essentially water - it is water.
The most popular homeopathic remedy sold is Oscillococcinum, a 200C dilution of duck liver that is supposed to help with the flu. There are approximately 10^80 atoms in the universe, so in order to have enough water to get one molecule of duck liver extract in a solution at this dilution, one would need 10^320 universes worth of water. Somehow this is still considered a very potent concentration. Oscillococcinum is labelled as consisting of 0.85 g sucrose and 0.15 g lactose - these are 100% sugar pills.
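The dilution arithmetic is easy to check. Each C step divides the concentration by 100, so starting from one mole (about 6.022 × 10^23 molecules) you can track how many molecules you'd expect to be left. This sketch works in powers of ten so that 200C doesn't overflow. The "chance of at least one molecule" column assumes the leftover molecules are Poisson-distributed, which is my simplification.

```python
from math import expm1, log10

AVOGADRO = 6.02214076e23  # molecules per mole

def log10_expected_molecules(c_number, moles=1.0):
    """log10 of the molecules expected to remain after c_number 1:100 dilutions."""
    return log10(moles * AVOGADRO) - 2 * c_number

for c in (6, 12, 13, 30):
    expected = 10 ** log10_expected_molecules(c)
    # Poisson assumption: P(at least one) = 1 - exp(-expected).
    p_at_least_one = -expm1(-expected)
    print(f"{c}C: {expected:.3g} molecules expected, "
          f"{p_at_least_one:.3g} chance of at least one")

# 200C is a ratio of 1 in 10^400; with roughly 10^80 atoms per universe,
# that leaves 10^(400 - 80) universes' worth of water per molecule.
universe_exponent = 2 * 200 - 80
print(f"200C needs about 10^{universe_exponent} universes of water")
```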
Homeopathic practitioners can't easily argue with math, and most will readily admit that there is no active ingredient present in the remedies that they sell to patients. Instead, they claim that water has a "memory" and that the dilution and shaking process involved in preparing a homeopathic solution leaves an imprint on the water, thus changing its properties.
There is absolutely no evidence for this. Logically, it also doesn't hold up. As has been pointed out by countless comedians and scientists, it doesn't make any sense that water could remember the tiny amounts of poison once dissolved in it, but forget all the sewage it's come in contact with. Both the ability for water to retain an imprint of a chemical, and its ability to selectively forget other chemicals, would violate basic fundamental laws of physics.
So homeopathy is based on a principle that doesn't make sense, at concentrations that are nonexistent. Yet it is still around. In the United Kingdom, the Royal Homeopathic Hospital has the Queen as its official patron, and in France homeopathic remedies like Oscillococcinum are among the most common treatments for the flu. Why?
One leading theory is that any perceived benefits of homeopathic water are likely due to a mix of confirmation bias and the placebo effect. Studies have shown that just the act of going to a clean hospital-like environment, listening to a compassionate doctor (or someone perceived to be a doctor), and taking something that you believe to be a medicine can often help with certain conditions. Placebos can have a measurable effect on improving health in certain circumstances, and are well-enough studied that we know a fake needle is more effective than a fake capsule, which is in turn more effective than a fake pill.
Confirmation bias likely also plays an important role in people's opinions on homeopathy. As an example, homeopaths describe a process known as 'homeopathic aggravation' - a temporary worsening of symptoms following a dose of a homeopathic remedy. When a patient takes the dose and then starts to feel worse, the homeopath can claim that's all part of the plan, reassuring the patient and convincing them that the remedy is working. Of course, an alternative explanation for homeopathic aggravation is self-evident - a patient notices they have a runny nose, takes some non-medicine, and then the rest of the flu hits and they claim that's the 'aggravation'. Well, no - that's just how diseases work when they aren't subject to actual medicine.
After hearing these arguments, a homeopath may ask to agree to disagree. The claim is that since the remedies are so dilute they can't possibly have side effects, and since they bring comfort and maybe benefits to the people who buy them, surely there's no harm in offering them as an alternative medicine.
The harm comes when people spend money on homeopathic remedies, mistaking them for real medicine, instead of going to real doctors. The placebo effect can have noticeable effects on health, but isn't generally capable of curing cancer. Patients with serious medical conditions who forgo proven treatments from real doctors put themselves at severe risks they otherwise wouldn't need to experience, much like those who choose to avoid vaccines or sunscreen.
[Image: Homeopathy sold alongside real medicine, courtesy of your local Safeway.]
Homeopathic remedies do not need to pass the same testing requirements as real medicine to be sold in Canada, but they can still be sold deceptively on a shelf in a grocery store beside real medicine as though they are equally valid options. Someone desperate for symptom relief who doesn't know any better could easily buy sugar pills and pure water by mistake, costing them money and delaying real treatment. This is where the real harm of homeopathy comes from.
Drug regulations require strict testing for a reason, and if homeopathy cannot provide evidence that it works or at least a plausible working mechanism then it should not be portrayed and sold commercially as an equally viable form of treatment.
Labels:
Science,
The Wanderer
Monday, May 13, 2013
NHL Playoffs: Two Weeks In
Hey there!
The playoffs have been going on for two weeks now, and I am pleasantly surprised to say that they've been going pretty well in terms of what my model has output. In fact, of the six series that have wrapped up so far, the team that won each one of them was given the highest probability by my model. For instance, my model originally gave the following:
Blackhawks (77.0%) to beat Wild (23.0%)
Red Wings (59.1%) to beat Ducks (40.9%)
Sharks (62.2%) to beat Canucks (37.8%)
Kings (56.9%) to beat Blues (43.1%)
Penguins (64.2%) to beat Islanders (35.8%)
Senators (84.1%) to beat Canadiens (15.9%)
It also predicted the following at the outset:
Rangers (62.1%) to beat Capitals (37.9%)
Bruins (76.7%) to beat Maple Leafs (23.3%)
These last two series will be wrapped up tonight, and hopefully I can keep my success streak up. Currently, though, with a 6-0 record I am very pleased with the model so far. Wish me luck!
Today's post is going to look a little bit at some of the behind-the-scenes math that goes into this model.
What's really important is to be able to take the odds of winning an individual game and convert those into the odds of winning the series as a whole. Fortunately this can be done pretty easily using a binomial distribution.
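As a rough sketch of that conversion (not the actual model, which isn't reproduced here): if a team wins each game independently with the same probability, its chance of taking the series is the binomial probability of winning at least the games it still needs out of the most games that could still be played. The function name and the `wins_for` / `wins_against` parameters are my own.

```python
from math import comb

def series_win_probability(p_game, wins_for=0, wins_against=0, wins_needed=4):
    """Chance of winning a series given a fixed per-game win probability.

    Assumes every game is independent with the same p_game (a simplification).
    wins_for / wins_against are games already won by each side.
    """
    need_for = wins_needed - wins_for
    need_against = wins_needed - wins_against
    if need_for <= 0:
        return 1.0
    if need_against <= 0:
        return 0.0
    # Pretend all remaining games get played; the team wins the series
    # exactly when it takes at least need_for of them.
    n = need_for + need_against - 1
    return sum(
        comb(n, k) * p_game**k * (1 - p_game) ** (n - k)
        for k in range(need_for, n + 1)
    )

print(f"60% per game, series tied 0-0: {series_win_probability(0.6):.1%}")
print(f"50% per game, up 1-0:          {series_win_probability(0.5, 1, 0):.1%}")
```

Passing in the games already won is what lets a calculation like this update after each night's results.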
It turns out that there are a grand total of 70 ways for a best 4 out of 7 series to work out. They break down as follows:

The way that I've set up my model allows for the number of games previously won to factor into the probability for the series, which is convenient for allowing the model to update every day following the results from the previous nights' games. The effect of having a game in hand looks something like this:
One other factor that could have an effect is home team advantage. The series get close to balancing out the number of home games between the two teams, but whenever a series ends on an odd number of games the team who had the first home game ought to have an advantage since they've had more home games, right?
Looking at the last 3 seasons of the NHL, 54.55% of games are won by the home team and 45.45% of games are won by the away team. If we factor this into the model, we get something like this:
Well that's not much of an advantage at all, is it? Probably a good thing.
So there you go. See you again next week!
The playoffs have been going on for two weeks now, and I am pleasantly surprised to say that they've been going pretty well in terms of what my model has output. In fact, of the six series that have wrapped up so far, the team that won each one of them was given the highest probability by my model. For instance, my model originally gave the following:
Blackhawks (77.0%) to beat Wild (23.0%)
Red Wings (59.1%) to beat Ducks (40.9%)
Sharks (62.2%) to beat Canucks (37.8%)
Kings (56.9%) to beat Blues (43.1%)
Penguins (64.2%) to beat Islanders (35.8%)
Senators (84.1%) to beat Canadiens (15.9%)
It also predicted the following at the outset:
Rangers (62.1%) to beat Capitals (37.9%)
Bruins (76.7%) to beat Maple Leafs (23.3%)
These last two series will be wrapped up tonight, and hopefully I can keep my success streak up. Currently, though, with a 6-0 record I am very pleased with the model so far. Wish me luck!
Today's post is gonna take a little look at some of the behind-the-scenes math that goes into this model.
What's really important is to be able to take the odds of winning an individual game and convert those into the odds of winning the series as a whole. Fortunately this can be done pretty easily using a binomial distribution.
It turns out that there are a grand total of 70 ways for a best 4 out of 7 series to work out. They break down as follows:
- 2 ways for a 4-0 (or 0-4) shut down (12.5% chance if teams are even)
- 8 ways for a 4-1 or 1-4 finish (25.0%)
- 20 ways for 4-2 or 2-4 (31.25%)
- 40 ways for 4-3 or 3-4 (31.25%)
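If you'd rather check that count than take my word for it, here's a quick Python sketch. It is not the Excel model itself, and the function names are made up for illustration. It assumes the per-game win probability `p` is the same for every game. The series winner has to take the final game, so only the games before it can be shuffled around.

```python
from math import comb


def series_breakdown(p, wins_needed=4):
    """Map 'losses by the series winner' -> (orderings, probability).

    Counts series won by either team; p is team A's chance in any one game.
    """
    q = 1.0 - p
    breakdown = {}
    for losses in range(wins_needed):
        # The clinching game is fixed, so choose where the losses fall
        # among the games played before it.
        ways_one_team = comb(wins_needed - 1 + losses, losses)
        prob = ways_one_team * (
            p**wins_needed * q**losses + q**wins_needed * p**losses
        )
        breakdown[losses] = (2 * ways_one_team, prob)
    return breakdown


def series_win_probability(p, wins_needed=4):
    """Chance that team A wins the series, starting from 0-0."""
    q = 1.0 - p
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    for losses, (ways, prob) in series_breakdown(0.5).items():
        print(f"4-{losses}: {ways} ways, {prob:.2%}")
    print(f"Series win chance at p=0.6: {series_win_probability(0.6):.1%}")
```

With evenly matched teams the orderings add up to 70 and each team comes out at exactly 50% for the series, as it should.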

The way that I've set up my model allows for the number of games previously won to factor into the probability for the series, which is convenient for allowing the model to update every day following the results from the previous night's games. The effect of having a game in hand looks something like this:
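As a rough illustration of that update (a sketch in Python with a hypothetical function name, not the spreadsheet itself), the series probability can be recomputed from the current score by asking how many more wins each team still needs. It assumes the same per-game probability `p` for every remaining game.

```python
from math import comb


def series_win_given_score(p, wins_a, wins_b, wins_needed=4):
    """Chance team A wins the series from a score of wins_a to wins_b."""
    need_a = wins_needed - wins_a
    need_b = wins_needed - wins_b
    if need_a <= 0:
        return 1.0  # team A has already clinched
    if need_b <= 0:
        return 0.0  # team B has already clinched
    q = 1.0 - p
    # Team A must collect its remaining wins before team B collects need_b;
    # k counts how many more games team A loses along the way.
    return sum(
        comb(need_a - 1 + k, k) * p**need_a * q**k for k in range(need_b)
    )


if __name__ == "__main__":
    for score in [(0, 0), (1, 0), (2, 0), (3, 0)]:
        print(score, f"{series_win_given_score(0.5, *score):.1%}")
```

For two evenly matched teams, going up 1-0 moves the series odds from 50% to about 65.6%, and a 3-0 lead is worth about 93.8%.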
One other factor that could have an effect is home team advantage. The series get close to balancing out the number of home games between the two teams, but whenever a series ends on an odd number of games the team that had the first home game ought to have an advantage since they've had more home games, right?
Looking at the last 3 seasons of the NHL, 54.55% of games were won by the home team and 45.45% of games were won by the away team. If we factor this into the model, we get something like this:
Well that's not much of an advantage at all, is it? Probably a good thing.
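To put a rough number on that, here's a small Python sketch. It makes two assumptions of mine: the teams are otherwise evenly matched, and the team with home ice hosts games 1, 2, 5, and 7 (the 2-2-1-1-1 format). The function name and the schedule tuple are just for illustration.

```python
from functools import lru_cache

# True where the team with home-ice advantage is at home (games 1, 2, 5, 7).
HOME_SCHEDULE = (True, True, False, False, True, False, True)


def series_win_with_home_ice(home_win=0.5455, schedule=HOME_SCHEDULE,
                             wins_needed=4):
    """Chance the team with home ice wins, if only the venue matters."""

    @lru_cache(maxsize=None)
    def win_from(wins_a, wins_b):
        if wins_a == wins_needed:
            return 1.0
        if wins_b == wins_needed:
            return 0.0
        # The next game's venue depends only on how many have been played.
        at_home = schedule[wins_a + wins_b]
        p = home_win if at_home else 1.0 - home_win
        return (p * win_from(wins_a + 1, wins_b)
                + (1.0 - p) * win_from(wins_a, wins_b + 1))

    return win_from(0, 0)


if __name__ == "__main__":
    print(f"{series_win_with_home_ice():.1%}")
```

Under those assumptions the team with home ice wins the series a little over 51% of the time, so the extra home game really is a small edge.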
So there you go. See you again next week!
Labels:
Hockey,
Statistics
Monday, May 6, 2013
NHL Playoff Predictions
The NHL playoffs are upon us, and for the third time I'm dusting off my Excel playoff model to see if I can predict who's going to win.
As it stands (as of May 6th, 2013), my model predicts that the most likely final will be between the Ottawa Senators and the Chicago Blackhawks. Altogether, though, the top four teams are the Senators, Blackhawks, Bruins, and Sharks (collectively these account for a 77% chance of winning the whole thing).
One of the ways that I've been presenting the daily updates from the model is as follows:
As time progresses (along the bottom), the height of each colored segment represents the relative probability of that team winning. For instance, when the Senators lost on May 3rd, their bar shrank noticeably, and grew again after they won on the 5th. Again, the Bruins, Senators, Blackhawks, and Sharks account for a massive amount of the graph (and hopefully don't lose in the first round... that would be awkward).
So what makes me think I'm anywhere near accurate? If you asked me in person, I'd scratch my head and shrug a little. Particularly concerning are the long odds that sites like SportsClubStats and Bet365 offer on some of the teams I predict to have a good chance of winning.
There are a couple of suggestions that I'm not totally inaccurate, though. Here are some of the results from previous years:
2010: Only correctly predicted the Blackhawks about halfway through, after they started leading in the semi-finals. Maybe not the best prediction...
2012: Predicted the Kings six weeks before they won, once the Blues started to slide a little bit. More surprised about the Eastern conference, though, where the Devils admittedly were not predicted to do all that well.
Of course, the toughest part of checking a model's accuracy is actually coming up with an objective way of measuring it. Sure, the Kings won last year, but they only had a 13.5% chance starting out. 13.5% is high relative to other teams, but not really all that great overall. Can I really call it a win that a team with a 13.5% chance to win at the outset beat a bunch of teams at 5-10%?
One way to evaluate accuracy is to use a Brier score for each team, and take an average of all of them over time. A slightly modified Brier score would give a score of 1.0 to a 100% prediction that comes true, and 0.0 if it fails, with various decimal values in between based on what the given prediction was beforehand. If we compare the results from last year's model to what we would expect from pure chance, we get this:
So that's cool. Almost the whole way throughout the playoffs last year, my model gave more precise estimates of who's going to win than chance (assuming every game has a 50-50 chance of going either way). Part of the reason the score is so high near the end is that some teams have already been eliminated, and therefore would have a "perfect" prediction score (even though that's a bit silly). If we remove these teams, we get something more like this:
There are three distinct dips in the graph that represent the end of each round of playoffs. The scores dip because the predictions would get more general (open-ended playoff series making things less predictable, etc.). Even accounting for all this, my model last year was still significantly and consistently above chance. Fancy!
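For anyone curious how a score like that might be computed, here's a minimal Python sketch. I'm assuming the "slightly modified" Brier score is one minus the squared error between the prediction and what actually happened, which gives 1.0 for a confident hit and 0.0 for a confident miss. The four-team numbers are invented purely for illustration and are not output from the model.

```python
def modified_brier(prob, happened):
    """1 minus squared error: 1.0 for a sure hit, 0.0 for a sure miss."""
    outcome = 1.0 if happened else 0.0
    return 1.0 - (prob - outcome) ** 2


def average_score(predictions, champion):
    """Average the per-team scores for one day's set of predictions."""
    scores = [
        modified_brier(prob, team == champion)
        for team, prob in predictions.items()
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up championship odds for a four-team bracket.
    model = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
    chance = {team: 0.25 for team in model}
    print(average_score(model, champion="A"))
    print(average_score(chance, champion="A"))
```

In this toy example the model scores 0.875 against 0.8125 for pure chance, because it put more weight on the eventual winner.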
So who knows if the Senators will actually win. It'd be pretty cool if they did, though...
Labels:
Hockey,
Statistics
Friday, March 22, 2013
The Economics of 50/50 Draws
"Boy," you might be saying, "this blog has sure posted a lot this month!" You're right! In fact, March so far has had at least twice as many page views as any other month in the existence of this blog. Figuring that this is about as successful as this blog is ever going to get, I'll just keep posting while the going is good.
Recently I attended a fair number of arena curling games. People who attend arena curling games often enjoy things like expensive beer, ridiculously addictive popcorn, curling (sometimes), and the 50/50 draw.
A fun way to think about casino games and lotteries (if you're me) is based on their expected return (ok, it's actually not fun at all).
Take Roulette, for instance. If you pick a solid colour in Roulette you have an 18/38 chance of winning, where winning would double your money. Doubling your money isn't quite good enough to break even, though, because 18/38 is slightly less than half. As a result, for every dollar you spend on Roulette, you'd expect to lose 5.26 cents. This is the house edge for Roulette, and what ensures that the casino always comes out on top.
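The arithmetic is short enough to sketch in Python. The `house_edge` helper is a made-up name, and it treats the payout as the total amount handed back per dollar staked on a win (2 for doubling your money).

```python
from fractions import Fraction


def house_edge(win_prob, payout_multiple):
    """Expected fraction of each dollar staked that the house keeps."""
    return 1 - win_prob * payout_multiple


if __name__ == "__main__":
    edge = house_edge(Fraction(18, 38), 2)
    print(edge, f"= {float(edge):.2%}")
```

The exact answer is 1/19, which is where the 5.26 cents per dollar comes from.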
Other common casino games have house edges like 1.41% (pass line in craps), 1.06% (banker bet in Baccarat), and 0.43% (perfect play in Blackjack without counting cards). Slot machines will often run house edges ranging from 7%-15%.
Lotteries are a little bit different. Lotto 6/49, for instance, runs a house edge of about 30%, Keno gambling is approximately 30%-40% depending on the rules, and I suspect that sports betting using SportSelect can get as high as 30%-50%.
Now that we have a reference point, we can compare 50/50 lotteries to these other games. In a basic 50/50 game, everyone could buy $5 tickets, and one person would win 50% of all the money paid into the lottery. Buying one ticket would give you a 1/n chance of winning, and you would win an amount of money equivalent to n/2 times the price per ticket, at a cost of entry of that same price per ticket. As a result, your expected loss per ticket purchased is 50%. This is way worse than pretty much anything else.
Things get a little bit more complicated though. Many arenas nowadays offer the option of paying $5 for 1 ticket, $10 for 3 tickets, or $20 for 10 tickets. Apart from the obvious differences in price per ticket (making $20 for 10 seem a substantially better deal already), how do these different options translate into house edges?
Fortunately I made another set of obscure and hard-to-read ternary plots for ya! Check it out:
Due to me not thinking before colouring, the axes are a bit backwards. Along the bottom is the fraction of sales for the $5/1 combo, along the right hand side is the fraction of sales for the $10/3 combo, and along the left hand side is the fraction of sales for the $20/10 combo. If you've never read one of these, check out this cool website on how to do so.
In general, the house edge depends on what everyone else buys, which makes sense. It's interesting just how large this effect is, though. Buying a single ticket for $5 ranges from a house edge of 50% (if everyone else does too) to 80% (if everyone else buys the best combo).
Buying three tickets for $10 ranges from a house edge of 25%-70%, and buying ten tickets for $20 ranges from a house edge of 50% to a player edge of 25%. The player edge, of course, only occurs if you are one of a very tiny number of people buying the $20/10 combo.
Though I can't find any sources for the actual distribution at sporting events, if we had an even split in sales then we'd be looking at a house edge ranging from 40%-75%, depending on which package is purchased. This is by far the worst set of house odds out of any of the games previously mentioned.
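Here's a minimal Python sketch of how those edges can be worked out. It assumes the crowd is big enough that your own purchase barely changes the pot, and it describes the crowd by how many times each package was bought. The names `PACKAGES` and `house_edges` are mine, for illustration only.

```python
# Package name -> (price in dollars, number of tickets).
PACKAGES = {"$5/1": (5, 1), "$10/3": (10, 3), "$20/10": (20, 10)}


def house_edges(purchases, prize_share=0.5):
    """House edge of each package, given what everyone else bought.

    purchases maps a package name to the number of times it was bought.
    A negative result means the buyer has the edge.
    """
    revenue = sum(PACKAGES[name][0] * n for name, n in purchases.items())
    tickets = sum(PACKAGES[name][1] * n for name, n in purchases.items())
    # Every ticket has the same chance, so each is worth an equal
    # slice of the prize.
    value_per_ticket = prize_share * revenue / tickets
    return {
        name: 1 - count * value_per_ticket / price
        for name, (price, count) in PACKAGES.items()
    }


if __name__ == "__main__":
    print(house_edges({"$5/1": 1000}))
    print(house_edges({"$20/10": 1000}))
    print(house_edges({"$5/1": 1000, "$10/3": 1000, "$20/10": 1000}))
```

The first two calls reproduce the extremes: 50%, 25%, and a 25% player edge when everyone else buys singles, and 80%, 70%, and 50% when everyone else buys the $20/10 combo. If "even split" is read as equal numbers of each package being bought, the edges come out to 75%, 62.5%, and 37.5%.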
50/50 draws are maybe justified in the sense that the money primarily goes to charities, but as an investment (or even just a source of gambling for fun) they're really probably one of the worst things you can do.
See ya!
Labels:
Statistics