(hee hee hee, haw haw haw)

In WWII, London was getting regularly bombed by V-1 flying bombs
and V-2 rockets. After a while, it turned out that certain neighborhoods
of London were getting pelted a ton, while others weren't. Some people
started to suspect that the Germans had exceptionally accurate bombing
capabilities, and were avoiding neighborhoods where their spies lived.

In
order to figure out what was going on, British statistician R. D.
Clarke broke the city into a grid and counted the number of hits per
area. His results almost perfectly matched what you'd expect from random chance, following what is known as the Poisson Distribution.

The Poisson Distribution is really good for figuring out the likelihood of a
given number of events occurring over a fixed time or space, based only
on the expected number of those events. Clarke took the number of
hits and divided it by the number of grid squares, which
gave an expected average of approximately 0.933 hits per grid
square. The Poisson Distribution for this suggests about 40% of the grid
squares wouldn't be hit at all, while almost 25% would get
hit two or more times, and this is very nearly what happened. It turned
out that the Nazis weren't accurate at all, and it was just random
chance that determined who would get hit on any particular day.
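If you want to check those percentages yourself, here is a minimal Python sketch. It assumes only the 0.933 average quoted above and the standard Poisson formula; the function name `poisson_pmf` is my own, not anything from Clarke's paper.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 0.933  # expected hits per grid square, the figure quoted above

p_zero = poisson_pmf(0, lam)
p_one = poisson_pmf(1, lam)
p_two_or_more = 1.0 - p_zero - p_one  # everything that is not 0 or 1 hits

print(f"P(no hits)     = {p_zero:.3f}")
print(f"P(exactly one) = {p_one:.3f}")
print(f"P(two or more) = {p_two_or_more:.3f}")
```

This comes out to roughly 39% of squares with no hits and roughly 24% with two or more, which lines up with the "about 40%" and "almost 25%" figures.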

Now a happier topic: happy birthday!

On
the off-chance that it actually is your birthday and you're reading
this, then this is quite the fortuitous event. Lucky me! Go have some
cake instead of reading my blog, sillypants.

Birthdays are spread out over a long period of time, which makes them cool for
demonstrating several fun statistics concepts. One of them is the
always-fun birthday paradox,
which says that it only takes a random group of 23 people before you'd
expect at least a 50% chance of a shared birthday between some two
of them. This number probably seems absurdly low, but it's completely
true. In fact, this is one of the many fun party games you can try that
involve math!
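For the skeptics, the 23-person claim is easy to check. This is a small sketch that assumes 365 equally likely birthdays (no leap day, no seasonal bumps); the function name `shared_birthday_probability` is just something I made up for illustration.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday,
    assuming birthdays are independent and uniform over `days`."""
    if people > days:
        return 1.0  # pigeonhole: more people than days forces a match
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

for n in (10, 22, 23, 30, 50):
    print(n, round(shared_birthday_probability(n), 3))
```

The trick is to compute the chance that everyone's birthday is different and subtract it from 1. The probability crosses 50% between 22 and 23 people.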

If you're a little bit like me,
sometimes you like to wish your friends happy birthday, and Facebook is a
handy tool for reminding you to do that. If you're a lot like me, you
might notice that there are extreme variations in the number of friends
who have a birthday on any given day.

Obviously if you have fewer than 365 Facebook friends you're guaranteed to have at least one day where nobody has a birthday.
Once you get over 365 friends, though, you might expect that your
friends' birthdays would even out, and that eventually you might run out
of days with no birthdays.

But as you may have
noticed, some days tend to have way more birthdays than others. For
instance, 7 of my friends have birthdays on January 28th, and 6 of my
friends have birthdays on January 5th. Does this mean that a bunch of my
friends' parents got frisky in April?

Turns out...
probably not at all. I have 546 Facebook friends who have their
birthdays listed, meaning that on average I should expect 1.496
birthdays per day. Intuition would suggest I should get mostly 1-2
birthdays per day, with the occasional day with 0 and sometimes 3. Can
the distribution of number of birthdays be predicted, though?

Yes! This sort of problem appears absolutely perfect for the Poisson Distribution. In fact, when I went through and jotted down the number of birthdays on each day of the year, this is what I got:

It's
impressive how close the two are. Also, maybe counter to what you'd
expect, a full 20% of the year has no birthdays. My Facebook friends fit
the Poisson Distribution with a Coefficient of Determination of 0.984.
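If you'd like to try this at home without counting your own friends, here is a rough Python sketch. To be clear, it does not use my real Facebook data: it simulates 546 friends with birthdays picked uniformly at random over 365 days (an assumption, with a fixed seed I chose arbitrarily) and puts the tally next to the Poisson prediction.

```python
import math
import random
from collections import Counter

FRIENDS = 546
DAYS = 365
lam = FRIENDS / DAYS  # about 1.496 birthdays per day

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when lam events are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Simulated friends list: every day equally likely, no leap day.
# This is NOT the real data from the post, just a stand-in.
rng = random.Random(0)
per_day = Counter(rng.randrange(DAYS) for _ in range(FRIENDS))

# How many days of the year have exactly k birthdays?
days_with_k = Counter(per_day[d] for d in range(DAYS))

print(" k  simulated  poisson")
for k in range(8):
    expected_days = DAYS * poisson_pmf(k, lam)
    print(f"{k:2d}  {days_with_k[k]:9d}  {expected_days:7.1f}")
```

The Poisson column predicts roughly 82 empty days and about one day with six birthdays. The simulated column will wobble around those values depending on the seed.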

In reality, the fact that there are days where 6 or 7 of my friends have
birthdays isn't exceptional and rare, but expected. It would actually
be a little unusual if there weren't any. For the expected number of
days with 0 birthdays to drop below one, you would have to have over
2,150 friends, for an average of almost 6 birthdays per day.
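Here is where that friend count comes from, as a hedged sketch. It uses the Poisson approximation and reads "expect no empty days" as "the expected number of empty days is below one", which is my interpretation; the helper name `expected_empty_days` is made up for this example.

```python
import math

DAYS = 365

def expected_empty_days(friends: int, days: int = DAYS) -> float:
    """Poisson estimate of how many days of the year have no birthdays."""
    return days * math.exp(-friends / days)

# Expected empty days falls below 1 when exp(-n / 365) < 1 / 365,
# which rearranges to n > 365 * ln(365).
threshold = DAYS * math.log(DAYS)

print(f"friends needed: about {threshold:.0f}")
print(f"average birthdays per day: {threshold / DAYS:.2f}")
print(f"expected empty days with 546 friends: {expected_empty_days(546):.1f}")
```

The threshold lands a little above 2,150 friends, or about 5.9 birthdays per day on average.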

So that's
your fun statistics thought of the day - whenever you have a large
enough random sample, what normally might seem like an outlier actually
tends to confirm just how random it really is.
