Wednesday, May 22, 2013

Les Poissons, Les Poissons

(hee hee hee, haw haw haw)

In WWII, the British were getting regularly bombed by V-1 and V-2 rockets. After a while, it turned out that certain neighborhoods of London were getting pelted a ton, while others weren't. Some people started to suspect that the Germans had exceptionally accurate bombing capabilities, and were avoiding neighborhoods where their spies lived.

In order to figure out what was going on, British statistician R. D. Clarke broke the city into a grid and counted the number of hits per area. His results almost perfectly matched what you'd expect from random chance, following what is known as the Poisson Distribution.

The Poisson Distribution is really good for figuring out the likelihood of a given number of events occurring over a fixed time or space, based only on the expected number of those events. Clarke took the number of rocket hits and divided it by the number of squares in his grid, which gave an expected average of approximately 0.933 rockets per grid square. The Poisson Distribution for this suggests about 40% of London wouldn't ever actually be bombed, while almost 25% of London would get bombed two or more times, and this is exactly what happened. It turned out that the Nazis weren't accurate at all, and it was just random chance that determined who would get hit on any particular day.
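If you want to check those percentages yourself, here is a minimal sketch in Python. It uses the textbook Poisson formula, P(k) = e^(-λ) · λ^k / k!, with the 0.933 average quoted above; the function and variable names are just mine for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 0.933  # average hits per grid square, as quoted in the post

p_zero = poisson_pmf(0, lam)         # squares never hit
p_one = poisson_pmf(1, lam)          # squares hit exactly once
p_two_plus = 1 - p_zero - p_one      # squares hit two or more times

print(f"never hit:         {p_zero:.1%}")
print(f"hit exactly once:  {p_one:.1%}")
print(f"hit twice or more: {p_two_plus:.1%}")
```

The first and last lines should land close to the "about 40%" and "almost 25%" figures above.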

Now a happier topic: happy birthday!

On the off-chance that it actually is your birthday and you're reading this, then this is quite the fortuitous event. Lucky me! Go have some cake instead of reading my blog, sillypants.

Birthdays are spread out over a long period of time, which makes them cool for demonstrating several fun statistics concepts. One of them is the always-fun birthday paradox, which says that it only takes a random group of 23 people before there's at least a 50% chance that two of them share a birthday. This number probably seems absurdly low, but it's completely true. In fact, this is one of the many fun party games you can try that involve math!
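The 23-person claim is easy to verify. Here's a rough sketch that assumes every birthday is equally likely and ignores February 29th (both simplifications of mine, not facts about real birthdays). It multiplies out the chance that everyone's birthday is different, then subtracts that from 1.

```python
def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

for n in (10, 22, 23, 30, 50):
    print(f"{n} people: {shared_birthday_probability(n):.1%}")
```

The probability crosses the 50% mark between 22 and 23 people.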

If you're a little bit like me, sometimes you like to wish your friends happy birthday, and Facebook is a handy tool for reminding you to do that. If you're a lot like me, you might notice that there are extreme variations in the number of friends who have a birthday on any given day.

Obviously if you have fewer than 365 Facebook friends you're guaranteed to have at least one day where nobody has a birthday (and in practice a lot more than one). Once you get over 365 friends, though, you might expect that your friends' birthdays would even out, and that eventually you might run out of days with no birthdays.

But as you may have noticed, some days tend to have way more birthdays than others. For instance, 7 of my friends have birthdays on January 28th, and 6 of my friends have birthdays on January 5th. Does this mean that a bunch of my friends' parents got frisky in April?

Turns out... probably not at all. I have 546 Facebook friends who have their birthdays listed, meaning that on average I should expect 1.496 birthdays per day. Intuition would suggest I should get mostly 1-2 birthdays per day, with the occasional day with 0 and sometimes 3. Can the distribution of number of birthdays be predicted, though?

Yes! This sort of problem appears absolutely perfect for the Poisson Distribution. In fact, when I went through and jotted down the number of birthdays on each day of the year, this is what I got:


It's impressive how close the actual counts are to the Poisson prediction. Also, maybe counter to what you'd expect, a full 20% of the year has no birthdays. My Facebook friends fit the Poisson Distribution with a Coefficient of Determination of 0.984.
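For anyone who wants to try this with their own friends list, here's a sketch of the prediction side only. It takes my 546 friends and 365 days as inputs and prints how many days the Poisson Distribution expects to have 0, 1, 2, ... birthdays. It doesn't include my actual tallies; the names in the code are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam events are expected on average."""
    return math.exp(-lam) * lam**k / math.factorial(k)

friends = 546          # friends with a listed birthday
days = 365             # ignoring February 29th for simplicity
lam = friends / days   # average birthdays per day, about 1.496

# Expected number of days in the year with exactly k birthdays.
expected_days = {k: days * poisson_pmf(k, lam) for k in range(8)}

for k, d in expected_days.items():
    print(f"{k} birthdays: about {d:.1f} days")
```

To compare against your own data, tally how many days of the year have each birthday count and line those tallies up against this output.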

In reality, the fact that there are days where 6 or 7 of my friends have birthdays isn't exceptional and rare, but expected - it would actually be weird if there weren't any. In order to expect no days with 0 birthdays, you would have to have over 2,150 friends - expecting an average of almost 6 birthdays per day.
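Here's roughly where a number like 2,150 comes from, as a sketch under the Poisson approximation. With n friends, each day is empty with probability e^(-n/365), so the expected number of empty days is 365 · e^(-n/365). That drops below 1 once n is bigger than 365 · ln(365). The function name is my own invention.

```python
import math

days = 365

def expected_empty_days(n, days=365):
    """Expected number of days with zero birthdays among n friends,
    using the Poisson approximation with an average of n/days per day."""
    return days * math.exp(-n / days)

# Solve days * exp(-n / days) = 1 for n.
threshold = days * math.log(days)

print(f"friends needed: about {threshold:.0f}")
print(f"average birthdays per day at that point: {threshold / days:.2f}")
print(f"expected empty days with 546 friends: {expected_empty_days(546):.1f}")
```

The threshold lands in the low 2,150s, which works out to just under 6 birthdays per day.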

So that's your fun statistics thought of the day - whenever you have a large enough random sample, what normally might seem like an outlier actually tends to confirm just how random it really is.
