Monday, September 24, 2012

Summer Weather

Weather forecasting is insane.

As a career I couldn't even imagine how un-rewarding it is - you could pour hours and hours into developing new algorithms that only get tiny increases in accuracy due simply to the massive complexity of the system you're trying to model. When you're right people take you for granted, and when you're wrong you take a lot of blame.

That being said, a while ago I noticed that different weather forecasters will sometimes predict radically different weather for the same day, given the same data. I also noticed that the forecast for the weekend made on Monday could be substantially different from the one made on Friday. These are all fair differences: tweaks to models could cause differences of opinion between meteorologists, and the closer a prediction is made to the day it covers, the more accurate we'd hope it would be.

I was curious as to how much of a change there would be, though, which is why I decided to keep track of it. Since the beginning of June I've kept track of the six-day forecasts for High temperature, Low temperature, and Probability of Precipitation (POP) for five different forecasting stations: TimeandDate, Environment Canada, Global Weather, the Weather Network, and the Weather Channel. Environment Canada, Global, and the Weather Network were chosen based on the sites visited most frequently by myself and my friends. The Weather Channel was chosen as it is the basis of Yahoo! weather, and subsequently the commonly-used Apple weather app. TimeandDate was chosen because it's a large multinational site. All stations were set to the Edmonton downtown location, not the international airport, and data for predictions was collected between 11 and 12 am for consistency in comparison.

Now that summer's over, I have some preliminary results. And the winner (by a hair) is the Weather Network!

Score (out of 100):
  • Weather Network: 66.92
  • Global Weather: 66.02
  • Weather Channel: 63.99
  • Environment Canada: 55.00
  • TimeandDate: 54.25
The score is based on a weighted average that was more or less arbitrarily decided by me: each subsequent day in the future was weighted less (so that a prediction for tomorrow's weather is worth more than a prediction for next week's), and POP was worth more than the High prediction, which was in turn weighted more than the Low prediction.
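
For illustration, a weighted average of this kind could be sketched in Python as follows. The weights here are made up, since the actual values aren't listed; only their ordering (nearer days count more, POP over High over Low) follows the description above. The function name and input layout are also just assumptions for the sketch.

```python
# Hypothetical weights: the real values used for the scores are not given.
DAY_WEIGHTS = {1: 6, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1}  # nearer days count more
METRIC_WEIGHTS = {"pop": 3, "high": 2, "low": 1}    # POP > High > Low


def overall_score(scores):
    """Weighted average of per-metric, per-day scores (each out of 100).

    `scores` maps (metric, days_ahead) -> score.
    """
    total = 0.0
    weight_sum = 0.0
    for (metric, day), value in scores.items():
        weight = METRIC_WEIGHTS[metric] * DAY_WEIGHTS[day]
        total += weight * value
        weight_sum += weight
    return total / weight_sum


# Made-up example scores for one station (1-day and 6-day forecasts only).
example = {
    ("pop", 1): 80.0, ("high", 1): 95.0, ("low", 1): 90.0,
    ("pop", 6): 40.0, ("high", 6): 70.0, ("low", 6): 75.0,
}
print(round(overall_score(example), 2))
```

Because the result is divided by the sum of the weights, a station that scored 100 on everything would still get exactly 100 overall, whatever weights are picked.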

Some fun facts!

Best High temperature prediction: Weather Channel 1-day prediction (96.79% within 3 degrees)
Best Low temperature prediction: Environment Canada 2-day prediction (96.07% within 3 degrees)
Best POP: Global 4-day prediction (p-value 0.346)

Worst High temperature prediction: TimeandDate 6-day prediction (55.20% within 3 degrees)
Worst Low temperature prediction: Global 6-day prediction (68.57% within 3 degrees)
Worst POP: TimeandDate 3-day prediction (p-value 0.038)

Some graphs!

Temperature score was based on the percentage of predictions that were within 3 degrees of the actual temperature. In general there was a very strong downward trend for the high temperature predictions: almost all stations had better than 95% accuracy at predicting tomorrow's weather, and they were all about 70% accurate at the weather a week from now. There was less of a trend for the low predictions, though those are typically less useful apart from determining the likelihood of frost.
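
As a rough sketch of the "within 3 degrees" calculation, something like the following would do it. The function name is hypothetical, and treating a miss of exactly 3 degrees as a hit is an assumption.

```python
def within_tolerance_pct(predicted, actual, tolerance=3.0):
    """Percentage of predictions within `tolerance` degrees of the actual value.

    Assumption: a difference of exactly `tolerance` counts as a hit.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return 100.0 * hits / len(predicted)


# Made-up highs for four days: the misses are 2, 4, 0 and 3 degrees.
forecast_highs = [20, 25, 30, 18]
observed_highs = [22, 29, 30, 21]
print(within_tolerance_pct(forecast_highs, observed_highs))
```

Running this once per station and per lead time (1-day through 6-day) would give the kind of numbers quoted in the fun facts.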

The score for POP is based on the p-value for each category of prediction. In essence, for each POP value (say, 10%), I took the days on which a given station predicted that value and compared the predicted probability to the fraction of those days on which it actually did rain. This doesn't translate directly into an accuracy percentage, which is why I call them 'scores' instead (though if every category had precisely the incidence of rain as predicted, it would end up with a score of 100).
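
One way to picture the per-category comparison is the sketch below. It groups days by the POP that was predicted, works out how often it actually rained, and attaches an exact two-sided binomial p-value. The choice of a binomial test is an assumption (the post doesn't say which test produced the p-values), and the function names and input layout are hypothetical. How the p-values are turned into a score out of 100 isn't described, so that step is left out.

```python
from collections import defaultdict
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k rainy days out of n if each has rain chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed one (assumed test, not from the post)."""
    observed = binom_pmf(k, n, p)
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerate rounding
    )
    return min(1.0, total)


def calibration_table(forecasts):
    """`forecasts` is an iterable of (pop_percent, rained) pairs."""
    counts = defaultdict(lambda: [0, 0])  # pop -> [days, rainy days]
    for pop, rained in forecasts:
        counts[pop][0] += 1
        counts[pop][1] += int(rained)
    table = {}
    for pop, (days, rainy) in sorted(counts.items()):
        table[pop] = {
            "days": days,
            "rainy": rainy,
            "observed": rainy / days,
            "p_value": binom_two_sided_p(rainy, days, pop / 100),
        }
    return table


# Made-up data: ten days forecast at 10% (one was rainy),
# and ten days forecast at 50% (all ten were rainy).
sample = [(10, i == 0) for i in range(10)] + [(50, True)] * 10
for pop, row in calibration_table(sample).items():
    print(pop, row)
```

A high p-value means the observed rain frequency is consistent with the stated POP; a low one means the category looks miscalibrated.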

So there you go! Hopefully this helps you the next time you're planning a picnic (or whatever people check the weather for...).


Josh Classen said...

Why leave out my forecast for CTV, or Stephanie Barsby's forecast for CBC, or Michelle McDougall's forecast for City-TV? HUMAN-produced forecasts are almost always more accurate than computer-based forecasts, and beyond 24 hours the EC & WxNetwork forecasts are model-based. I bet you'd get a different outcome if you included all the forecasts.

Josh Schmaltz said...

Mike has used POP percentage estimates as part of the evaluation metric for forecast accuracy. CTV/CBC/City forecasts typically don't provide those beyond 24 hours in the future, so he couldn't include them as part of the study. Yes, POP estimates beyond 24 hours in the future often don't really have any useful meaning weather-wise, but tracking the trend of how the estimates improve/worsen as the forecast lead-time shortens will show how well the weather model and/or forecaster is making judgement calls.

AlysaS said...

First off... I am happy you attempted to do a verification. This is a question on everyone's mind, especially in the summer.

The forecasts for highs, lows and precipitation will all differ depending on what computer model is used ... of course meteorologists will always tweak them if they think they are off by more than say 2 degrees.

The bigger issue is with the POP. These comments are generally just things to consider, and maybe use in the future if you decide to do another 'forecast verification' project.

Josh C's comment about 'human-produced' forecasts being almost always more accurate than computer-based forecasts is a valid point; however, ALL forecasts are almost completely model based beyond 48 hours. That includes CTV, CBC, TWN, EC, City-TV and any other weather forecast organization. Each organization might have different algorithms to incorporate climate data into the forecast output, but really, no meteorologist can add much value beyond 48 hours. They generally only have the ability to do hand-wavy forecasts beyond 48 hours (tell you vaguely that it's supposed to be windy/maybe some rain/sunny/snowy etc.). An example of this is the American convective outlooks from the Storm Prediction Center. They describe in meteorological detail the possibilities for the next 8 days, but their descriptions drop in detail substantially beyond day 2.

The bigger problem though is with the process of VERIFYING whether a POP forecast was successful or not. EC won't include a POP in a forecast unless it is greater than 30. So using 10 as a forecast value to indicate a positive precipitation forecast doesn't really qualify as an accurate positive precipitation forecast. Here is a link to the EC requirements to use POPs: Go to Chance of Precipitation, it is the same as Probability of Precipitation.

Also, what is used to verify that the precipitation actually occurred? Human observations (like at the international airport CYEG) are MUCH better than auto stations; they are TRAINED observers. They will indicate if it rains just to the west, north, east, south or anywhere in between...Auto stations ONLY report if it rains right on the sensor. Also, some sensors (and Edmonton downtown is known to do this a lot) give fake observations of precipitation. It will report 0.5 mm of precipitation even on days that it is sky clear, hot and sunny...therefore these observations are much more unreliable.

One last thing, what time period are you referring to when you say 'it rained' or 'it didn't rain'? For EC, the forecasts of 'today', 'tonight', 'tomorrow' are specific time frames. Today is from 6am until 6pm, tonight is from 6pm to 6am and tomorrow is from 6am to 6pm. For forecasts beyond 48 hours the forecasts only describe the time between 6am and 6pm on that day (regardless if there is precipitation expected overnight between days).

Those are just my thoughts. It is cool to see someone else looking into the observations vs. the forecasts.

May I also make a shout out that it is ALWAYS important to listen to watches and warnings ... regardless of what you heard in any forecast earlier in the day. Environment Canada issues the watches and warnings and is the best source for frequently updated weather information.

You think you'll ever do a more elaborate verification? A winter one with snowfall amounts would be cool :)

Anonymous said...

Ralph Wright
I run Alberta's largest near real time weather network, currently numbering 162 stations. Go to and check out raw station data from over 350 hourly reporting stations across Alberta