Meteorologists define spring in the UK as the period from March to May, so spring is now over and we are officially in summer. I have decided to create a series of posts, published at the end of each season, showing how that season compares to the weather record. Spring 2017 turned out to be the 2nd warmest on record.
Analysis of trends is a key skill that all statisticians and analysts need to have. Indeed I run a training course on Identifying Trends & Making Forecasts as I have learned over my 25 years as a professional statistician that skills in this area are nowhere near where they should be. By publishing this series of posts about trends in weather and other fields, I hope you will learn something you will be able to apply elsewhere.
As my weather tracker series shows, weather cannot be thought of as a univariate data set. We define weather as a combination of temperature, sunshine and rainfall which makes weather a multivariate data set. There are many ways of displaying multivariate data but one of the first questions that has to be asked is how do we compare 3 variables that use 3 different numerical scales i.e. degrees Celsius, hours and millimetres? The answer is to convert the scales of all 3 variables into a new common scale using a procedure known as STANDARDISATION.
Standardisation works as follows. For each year within each variable, we first subtract the average for that variable to get the difference from the average. Then we divide that difference by the standard deviation of that variable. For example, if the average spring temperature over the last 50 years is 7.6 degrees Celsius and the standard deviation is 0.8 degrees Celsius, then the STANDARDISED spring temperature for 2017 will be +1.9, since the actual average 24-hour UK spring temperature was 9.1 degrees Celsius, i.e. (9.1 – 7.6)/0.8 = 1.875, which rounds to +1.9.
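The calculation above can be sketched in a few lines of Python. This is only an illustration using the three figures quoted in the text; the function name `standardise` is my own choice for the example and not part of any library.

```python
def standardise(value, mean, std_dev):
    """Return how many standard deviations `value` lies above (+) or below (-) `mean`."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


# Figures quoted in the text: 2017 spring temperature, 50-year average, 50-year standard deviation
spring_2017_z = standardise(9.1, mean=7.6, std_dev=0.8)
print(f"Standardised spring temperature for 2017: {spring_2017_z:+.1f}")
```

The same function works unchanged for sunshine hours or millimetres of rainfall, because the units cancel when we divide by the standard deviation.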
Notice that by dividing by the standard deviation, we convert the scale of any variable to a scale which measures the variable as Number of Standard Deviations Above/Below the Mean value. So spring 2017 was almost 2 standard deviations above the average. If you are familiar with the properties of the Normal Distribution, you will know that if your variable follows a normal distribution, then approximately 95% of the data points will lie within +/- 2 standard deviations of the mean and approximately 5% will lie outside +/- 2 standard deviations of the mean. So our +1.9 for the STANDARDISED spring temperature for 2017 does suggest that 2017 was unusual (but not exceptional), and indeed spring 2017 was the 2nd warmest on record, beaten only by 2011, as shown in the chart below.
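If you want to check the 95% rule of thumb for yourself, the short sketch below computes the share of a normal distribution lying within a given number of standard deviations of the mean. It assumes the variable is exactly normal, which real weather data will only approximate, and the function name `fraction_within` is simply a label chosen for this example.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within +/- {k} standard deviations: {fraction_within(k):.1%}")

# Share of years we would expect to be at least as warm as z = +1.9,
# again assuming an exactly normal distribution (one tail only).
upper_tail = (1 - fraction_within(1.9)) / 2
print(f"Expected share of years with z >= +1.9: {upper_tail:.1%}")
```

The exact figure for +/- 2 standard deviations is a little over 95%, which is why the rule is usually quoted as "approximately 95%".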
Standardised variables (also known as Z-SCORES) aid interpretation of data in many ways. If the standardised value is positive, it means that the value is above your average or expected value. If it is negative, then the value is below your expected value. The chart above suggests that our spring temperatures have been following a rough 25 year cycle e.g.
- From 1910 to 1935 (roughly), UK springs were colder than average.
- From 1936 to 1961, UK springs were average.
- From 1962 to 1987, UK springs were colder than average.
- From 1998 to 2017, UK springs were warmer than average.
If the original variable is approximately normal in its distribution, then the vertical scale gives us an idea of how typical or atypical each year is. Z-scores in the range -1 to +1 are considered typical values and completely unremarkable. Z-scores in the ranges -2 to -1 and +1 to +2 are considered to be uncommon values but still entirely plausible, and such values should not cause us concern. When Z-scores get into the ranges -3 to -2 and +2 to +3, we should start paying closer attention and asking ourselves if something has changed, especially if we get a sequence of successive points in these ranges. Finally, if the Z-scores are less than -3 or greater than +3, that is normally regarded as a clear call to action. There are in fact many ways of interpreting Z-scores, and what I have said so far merely gives an overview of the most basic interpretations. A whole field of study known as Statistical Process Control (SPC) is dedicated to building and interpreting such charts (known as CONTROL CHARTS).
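The bands described above translate naturally into a small lookup. The sketch below is one possible reading: the text does not say which band a value of exactly 1, 2 or 3 belongs to, so placing the boundaries in the lower band is my assumption, as are the function name `classify_z` and the short labels. Apart from +1.9, the example values are purely illustrative.

```python
def classify_z(z):
    """Map a z-score to one of the four basic interpretation bands."""
    size = abs(z)  # the bands are symmetric, so only the magnitude matters
    if size <= 1:
        return "typical"
    if size <= 2:
        return "uncommon but plausible"
    if size <= 3:
        return "pay closer attention"
    return "call to action"


# +1.9 is the spring 2017 temperature z-score; the other values are made up for illustration
for z in (1.9, -0.5, 2.4, -3.2):
    print(f"z = {z:+.1f}: {classify_z(z)}")
```

A lookup like this only covers single points. It does not capture the extra signal from a sequence of successive points in the same band, which is where control chart rules come in.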
One point to clarify when calculating the z-scores for the weather variables is the timeframe on which the average and standard deviation should be based. I have decided to go with a rolling 50-year average and standard deviation, so since we are in 2017, these values are calculated over the 1967 to 2016 timeframe. My reason for using this timeframe is that it seems like a good fit for the concept of “living memory”, i.e. we evaluate the most recent weather in terms of our own experience and memory, and being a child of the 70’s, I can still remember some of the weather of the late 70’s.
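A rolling baseline of this kind might be sketched as follows. Several things here are assumptions on my part: the function names, the use of the sample standard deviation (`statistics.stdev`) rather than the population version, and above all the temperature series, which is invented purely so that the example runs and is not real UK data.

```python
import statistics


def baseline_years(year, window=50):
    """Years used as the baseline for `year`: the `window` years immediately before it."""
    return range(year - window, year)


def rolling_z_score(series, year, window=50):
    """z-score of `series[year]` against the mean and standard deviation of the preceding window."""
    baseline = [series[y] for y in baseline_years(year, window)]
    mean = statistics.mean(baseline)
    std_dev = statistics.stdev(baseline)  # sample standard deviation (an assumption)
    return (series[year] - mean) / std_dev


# Made-up series with a gentle upward drift, for illustration only
fake_temps = {y: 7.0 + 0.01 * (y - 1960) for y in range(1960, 2018)}

years = baseline_years(2017)
print(f"Baseline for 2017 runs from {years[0]} to {years[-1]}")
print(f"z-score for 2017 on the made-up series: {rolling_z_score(fake_temps, 2017):+.2f}")
```

Because the window moves forward one year at a time, each year is judged against the 50 years before it rather than against a fixed reference period.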
I said earlier that if we have multivariate data sets then standardising variables allows us to compare multiple variables of differing scales. The next two charts show the z-scores for spring sunshine and rainfall in the UK.
The z-score for spring sunshine in 2017 was +1.3, so this is well above average but completely within the normal range of expectations. The z-score for rainfall was -0.8, so spring 2017 was drier than normal but completely unremarkable in terms of rainfall. When we combine these two values with +1.9 for temperature, I think it is fair to conclude that spring 2017 was definitely a pleasant one!
What I have covered here in this post is simply some of the basics of analysing time series data. When I analyse the summer of 2017 at the beginning of September, I will add more features to these charts which will allow us to gain a greater insight into our weather.