Friday, August 7, 2015

Fortune Favours the Old

I had my birthday recently and, much like last year, got a joke present from a friend. This year though, it came with an explicit challenge to do something statistical with it. So for this blog post, my subject matter will be based on this box of fortune cookies:

My willing victim

So what stats can be pulled out of a box of fortune cookies? First of all, the box says there are approximately 25 cookies, but in reality it came with 38 fortunes. Ridiculous quality control, let me tell you!

Of course, most fortunes just floated around without cookies.

Fortunately, each fortune has a set of numbers on the back. Numbers are good, so let's do stats with those and leave the yummy cookie bits for later.

Fortunes not necessarily to scale.

Each fortune has a series of 6 ascending, non-repeating integers on the back. Presumably these are lucky numbers for your next lottery, but given just this set of numbers we can't necessarily tell which lottery they might be meant for. But can we make an educated guess?

Quick history lesson: in World War II, the Allies were at least somewhat concerned with estimating how many tanks Germany was building in any given month. One way they had was conventional espionage, which suggested that the Germans were building approximately 1,400 tanks every month between June 1940 and September 1942 (a lot of tanks). Of course, spies sometimes lie (it's their job, after all), so the second way the Allies had to estimate tank production was using statistics on captured tanks.

Every tank had a whole bunch of parts, and every part had a serial number stamped into it during production. These serial numbers were unique for every tank, and in the case of the gearboxes in particular, fell in unbroken sequences. Based on the distribution of serial numbers, a relatively simple formula could give an estimate of the total number of tanks produced. For instance, if the Allies saw that the tanks they destroyed in a given month were tanks produced #25, #94, #141, and #198 of that month (and were confident they were destroying them randomly), they'd be much less worried than if they destroyed tanks #52, #306, #519, and #1058.
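For the curious, here is a minimal sketch of that kind of formula, applied to the two made-up months above. It assumes the standard textbook estimator for this problem (largest serial number seen, plus the average gap between serials); the function name `estimate_total` is just something I picked for illustration, not anything the Allies used.

```python
def estimate_total(serials):
    """Estimate how many items exist, given serial numbers sampled at random.

    Assumes serials run 1..N with no gaps and uses the textbook estimator
    N_hat = m + m/k - 1, where m is the largest serial seen and k is the
    number of serials in the sample.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one serial number")
    m = max(serials)
    return m + m / k - 1


# The two illustrative months from the paragraph above.
quiet_month = [25, 94, 141, 198]
busy_month = [52, 306, 519, 1058]

print(estimate_total(quiet_month))  # 246.5
print(estimate_total(busy_month))   # 1321.5
```

Only the largest serial and the sample size matter here, which is why the second month looks so much scarier than the first.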

It actually turned out way more accurate than anyone hoped - statistical estimates for tank production between June 1940 and September 1942 were 246 tanks per month, and in reality the Germans produced 245. Yay stats!

So like the famous German tank problem, looking at a fortune cookie's string of numbers can give us an estimate of the total number of 'lucky numbers' that the fortune cookies might offer. In the above example, there are 6 numbers decently evenly spaced between 2 and 47. A frequentist statistical approach takes the largest number seen and adds the average gap between numbers (47 + 47/6 - 1), which suggests that the total number of possible numbers that could be on the backs of these fortune cookies is 53.83, with a 95% confidence interval of 47-77. Not terribly precise when looking at a single fortune. Another fortune might have a series of numbers with a likely maximum of 48, for instance. If we average the estimates from all 38 fortunes in the box, the average 'expected number' ends up being 49.4. And in fact, across all 38 fortunes, every number on the back was between 1 and 49.
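Here is a rough sketch of how the single-fortune figures can be reproduced from just the largest number (47) and the count (6). The interval is an assumption on my part: a one-sided interval from the usual continuous approximation, running from the observed maximum up to m / 0.05^(1/k), which happens to land on the same 47-77 range. The helper names `point_estimate` and `upper_bound` are made up for this example.

```python
def point_estimate(m, k):
    """Frequentist estimate of the pool size: largest value plus average gap."""
    return m + m / k - 1


def upper_bound(m, k, confidence=0.95):
    """Upper end of a one-sided interval for the pool size.

    Assumption: continuous approximation, where the chance that all k draws
    fall at or below a fraction x of the pool is about x**k.
    """
    alpha = 1 - confidence
    return m / alpha ** (1 / k)


m, k = 47, 6  # largest lucky number on the fortune, and how many numbers it has

print(round(point_estimate(m, k), 2))  # 53.83
print(m, "to", round(upper_bound(m, k), 1))
```

The lower end of the interval is simply the largest number seen, since the pool can't be smaller than that.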

So we have six numbers, chosen between 1-49. Sounds like we're playing Lotto 6/49!

Here's what the distribution of all lucky numbers ended up being:

It kinda looks like number 37 comes up way more often than the rest, and numbers 9 and 13 are super under-represented. Is this a conspiracy, or random chance?

With 49 numbers to choose from, 6 different numbers on each fortune, and 38 fortunes in the box, we'd expect each number to show up an average of 4.65 times (38 × 6 / 49). With an expected 4.65 of each number, we can create a Poisson distribution to see how often we'd expect any given number to turn up, and see if ours is indeed random. That'd give us something like this:
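As a minimal sketch of that expected curve, the snippet below works out the mean and then, for each possible count, how many of the 49 numbers a Poisson distribution says should appear that many times. It only produces the expected side of the comparison, not the observed counts from the box, and the variable names are my own.

```python
import math

FORTUNES = 38
NUMBERS_PER_FORTUNE = 6
POOL_SIZE = 49

# Average number of times each lucky number should appear in the box.
lam = FORTUNES * NUMBERS_PER_FORTUNE / POOL_SIZE


def poisson_pmf(count, mean):
    """Probability of seeing exactly `count` events when `mean` are expected."""
    return math.exp(-mean) * mean ** count / math.factorial(count)


# Expected number of lucky numbers (out of 49) that appear exactly c times.
expected = {c: POOL_SIZE * poisson_pmf(c, lam) for c in range(13)}

print(f"mean appearances per number: {lam:.2f}")
for c, n in expected.items():
    print(f"{c:2d} appearances: {n:5.2f} numbers expected")
```

Strictly speaking, the counts are not exactly Poisson (each number can appear at most once per fortune), so treat this as the same approximation used in the post rather than an exact model.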

This suggests that the distribution of lucky numbers isn't actually all that lucky, and may be pretty much what you'd expect (R² value of 0.82, which ain't shabby). It matches particularly closely at the tails, so having a few numbers occur 10 times each isn't all that surprising really.

One last analysis for the fortune cookies. Fortunes tended to come in one of three categories: advice ("Counting time is not as important as making time count"), analysis ("You are deeply attached to your family and home"), and, most popularly, predictions ("You will soon find something lost long ago"). Is there any relation between the type of fortune on the front and the sum of the numbers on the back?

Nope, nothing statistically significant anyway. The Analysis fortunes seem to generally have higher numbers on the back, but there are too few of them and they are too varied to be conclusive.

So there you go! Fortune cookies tend to have Lotto 6/49 numbers on the back that are fairly randomly distributed. Not sure if that left any of you particularly surprised, but it's fun to know nonetheless!

1 comment:

Sydney Hermanson said...

This is a laborious post, you've done a lot of work. Using cookie predictions for the lottery numbers is something new I have not come across yet. Hope this will work and the numbers will appear in Thunderball results UK. Thanks.