## Wednesday, January 2, 2013

### When Life Gives you Weather Stats...

So here's my dilemma. I have this 'weblog', and it's really cool when people read it. (Not quite as cool as before, because apparently somebody spam-clicked my ads and now I no longer get money. On the other hand - no ads!) By far my most popular posts are the ones about weather stuff or SU stuff, and since there are no SU elections to write about right now and I've been doing weather posts at the end of each season, I may have nothing good to offer this week.

Instead, I'm going to make statsonade from the stats that life tossed me. Yum!

In my last post, I presented the weather forecast comparison I had for six weather stations in Edmonton during autumn. It was pretty fun, and only one of the six forecasters outright told me I was wrong.

One of the accuracy measures I used in that analysis was the percentage of time that a forecaster was within three degrees of the true high temperature. An alternative way of presenting these results would be to just plot the predicted vs. actual high temperatures for each station outright. Maybe it would look something like this:

There are some wicked fun facts from these graphs. In all of them, the red line is perfection, where what you predict is exactly what you end up with. The data covers autumn only, and the stations are all quite close to perfect, as well as close to each other: the R-squared values range from 0.900 to 0.926. In general, most of the stations appear to over-predict the temperature at the higher end of the range.
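For anyone who wants to reproduce these kinds of numbers, here is a minimal Python sketch of the two measures: the within-three-degrees hit rate from the last post and an R-squared value. The post doesn't say exactly how R-squared was computed; this sketch assumes it is the squared Pearson correlation between predicted and actual highs (what a spreadsheet trendline reports), which measures how well the points fit *some* straight line rather than the red perfection line specifically. The function names and sample numbers are made up for illustration and are not the autumn data.

```python
from statistics import mean

# Sketch only. ASSUMPTION: "R-squared" means the squared Pearson correlation
# between predicted and actual highs, not a fit against the y = x line.


def fraction_within(predicted, actual, tolerance=3.0):
    """Share of days where the forecast high was within `tolerance` degrees."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)


def r_squared(predicted, actual):
    """Squared Pearson correlation between predicted and actual values."""
    mp, ma = mean(predicted), mean(actual)
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)


# Hypothetical forecasts and observed highs, for illustration only.
predicted = [12.0, 8.0, 3.0, -2.0, 5.0]
actual = [10.0, 9.0, 3.0, -6.0, 4.0]
print(fraction_within(predicted, actual))  # 0.8 (only the 4-degree miss fails)
print(round(r_squared(predicted, actual), 3))
```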

That's all well and good, but what if we wanted to take this a step further? Is there some combination of stations that does better than any individual station? That would be like a weather model or something.

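It turns out that you can actually get a marginally better prediction by using a weighted average of the stations' forecasts. Consider the following, where T is the combined prediction and each abbreviation stands for that station's predicted high: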

$$T = 0.085\,\mathrm{TAD} + 0.301\,\mathrm{EC} + 0.148\,\mathrm{GB} + 0.155\,\mathrm{WN} + 0.483\,\mathrm{WC} - 0.172\,\mathrm{CTV}$$
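Written out as code, the formula is just each station's forecast multiplied by its weight and summed. A handy sanity check: the six weights add up to exactly 1.000 (0.085 + 0.301 + 0.148 + 0.155 + 0.483 - 0.172), so this really is a weighted average, just one where CTV gets a negative weight. One consequence is that if every station predicts the same high, the blend returns that same number. The dictionary keys mirror the abbreviations in the formula; the function name `blended_forecast` is a hypothetical helper, not something from the original analysis.

```python
# Weights copied from the regression formula above; keys are the post's
# station abbreviations. `blended_forecast` is a hypothetical helper.
WEIGHTS = {
    "TAD": 0.085,
    "EC": 0.301,
    "GB": 0.148,
    "WN": 0.155,
    "WC": 0.483,
    "CTV": -0.172,
}


def blended_forecast(forecasts, weights=WEIGHTS):
    """Combine each station's predicted high into one weighted prediction."""
    missing = set(weights) - set(forecasts)
    if missing:
        raise ValueError(f"missing forecasts for: {sorted(missing)}")
    return sum(w * forecasts[name] for name, w in weights.items())


# The weights, including CTV's negative one, add up to 1.
print(round(sum(WEIGHTS.values()), 3))  # 1.0

# So if every station agrees on a high of 10, the blend says 10 too.
print(round(blended_forecast({name: 10.0 for name in WEIGHTS}), 3))  # 10.0
```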

After all that work, our R-squared value is a whopping 0.944, compared with 0.926 for the best single station. Though this method of aggregating weather forecasts does give a minor improvement, it's likely not worth the effort for everyday weather prediction.
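The post doesn't spell out how the regression was set up. Since the formula has no constant term, one plausible setup is ordinary least squares with no intercept: solve the normal equations (X^T X) w = X^T y, where each row of X holds every station's forecast for one day and y holds the actual highs. This pure-Python sketch is an assumption about the method, not necessarily what produced the weights above, and it does not force the weights to sum to 1. The helper names `solve` and `fit_weights` are hypothetical, and the example data are invented.

```python
# Sketch: fit per-station weights by ordinary least squares, no intercept.
# ASSUMPTION: the original regression setup is not stated in the post.


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weights(rows, actual):
    """Weights w minimizing sum((actual - rows @ w) ** 2).

    rows[i] holds every station's forecast for day i; no intercept is fitted.
    """
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(k)]
    return solve(xtx, xty)


# Made-up example: two "stations" whose blend 0.3*A + 0.7*B is exactly right.
rows = [[10.0, 12.0], [5.0, 4.0], [-3.0, -1.0], [8.0, 9.0]]
actual = [0.3 * a + 0.7 * b for a, b in rows]
print([round(w, 3) for w in fit_weights(rows, actual)])  # [0.3, 0.7]
```

With real data you'd pass one row per day containing all six stations' forecasts. If NumPy is available, `numpy.linalg.lstsq` performs the same kind of fit and is more robust when stations' forecasts are highly correlated.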

One fun result of the regression is that it suggests it would be easier to just take a weighted average of Environment Canada's and the Weather Channel's predictions, since they make up the majority of the formula. What's really strange, though, is that the CTV predictions get factored in with a negative weight. CTV on its own has a completely respectable correlation between predicted and real temperatures, but for some reason subtracting a weighted version of its numbers improves the overall prediction (whenever CTV is used together with at least two other stations). Mysterious...
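If you only wanted the simpler Environment Canada plus Weather Channel blend, there's a closed form for the best single weight. Writing the blend as w·EC + (1 - w)·WC, the error (actual - blend) becomes (actual - WC) - w·(EC - WC), which is a one-variable least-squares fit through the origin. This is a sketch that assumes the two weights should sum to 1; `two_station_weight` is a hypothetical helper and the example values are invented.

```python
# Sketch: best weight for a two-station blend w*EC + (1 - w)*WC.
# ASSUMPTION: the two weights are constrained to sum to 1.


def two_station_weight(ec, wc, actual):
    """Least-squares weight w for the blend w * ec + (1 - w) * wc.

    Rewriting actual - blend as (actual - wc) - w * (ec - wc) turns this into
    a one-variable regression through the origin, which has a closed form.
    """
    diff = [e - c for e, c in zip(ec, wc)]
    resid = [y - c for y, c in zip(actual, wc)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two stations' forecasts are identical")
    return sum(r * d for r, d in zip(resid, diff)) / denom


# Invented data where 0.4*EC + 0.6*WC matches the actual highs exactly.
ec = [10.0, 4.0, -2.0]
wc = [12.0, 3.0, 0.0]
actual = [0.4 * e + 0.6 * c for e, c in zip(ec, wc)]
print(round(two_station_weight(ec, wc, actual), 3))  # 0.4
```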