Meteorologists define spring in the UK as the period from March to May, so spring is now over and we are officially in summer. The 2018 spring was very sunny, but this hides the fact that our spring climate has changed quite notably over the last 25 years.

I analyse the long term trends in the UK weather using a statistical tool known as **Standardisation**. This means that the 3 key variables of Temperature, Sunshine and Rainfall are recalculated so that they all have the same units, namely the number of standard deviations from the mean. Such variables are known as **Z-Scores**, which by definition have a mean value of 0 and a standard deviation of 1. For more information on how I have done this, please read my post on trends in the UK summer of 2017.
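As a rough illustration of standardisation, the sketch below converts a list of raw values into z-scores. The function name `z_scores`, the optional `baseline` argument, the use of the sample standard deviation and the temperature figures are all assumptions made for this example, not the calculation behind the charts.

```python
from statistics import mean, stdev


def z_scores(values, baseline=None):
    """Express each value as a number of standard deviations from the mean.

    If `baseline` is given, its mean and standard deviation are used;
    otherwise the values are standardised against themselves.
    """
    ref = list(values) if baseline is None else list(baseline)
    mu = mean(ref)
    sigma = stdev(ref)  # sample standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Made-up spring mean temperatures in deg C, for illustration only.
temps = [7.1, 7.9, 8.4, 6.8, 9.0, 8.1]
z = z_scores(temps)
```

When the values are standardised against themselves, the resulting z-scores have a mean of 0 and a standard deviation of 1. With a separate baseline period, that only holds over the baseline years.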

**Latest Z-Scores**

The Z-Scores for Temperature, Sunshine and Rainfall are shown in the 3 charts below. Each chart also contains an 11-year centred moving average which gives an idea of the underlying trend.
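An 11-year centred moving average can be sketched as follows. The function name `centred_moving_average` is hypothetical, and leaving the first and last 5 years blank is an assumption about how the ends of the series are handled.

```python
def centred_moving_average(values, window=11):
    """Average each point with the (window - 1) / 2 points on either side.

    Points too close to either end to have a full window are returned
    as None (an assumption; other end treatments are possible).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it can be centred")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)
        else:
            chunk = values[i - half : i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# A made-up series of 15 yearly values, for illustration only.
series = [float(year_index) for year_index in range(15)]
trend = centred_moving_average(series)
```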

Standardised variables aid interpretation of data in many ways. If the standardised value is positive, the value is above your average or expected value. If it is negative, the value is below your expected value. If the original variable is approximately normal in its distribution, then the vertical scale gives us an idea of how typical or atypical each year is. Z-Scores in the range -1 to +1 are considered typical values and completely unremarkable. Z-Scores in the ranges -2 to -1 and +1 to +2 are considered uncommon but still entirely plausible, and such values should not cause us concern. When Z-Scores get into the ranges -3 to -2 and +2 to +3, we should start paying closer attention and asking ourselves if something has changed, especially if we get a sequence of successive points in these ranges. Finally, if the Z-Scores are less than -3 or greater than +3, that is normally regarded as a clear call to action. There are in fact many ways of interpreting Z-Scores, and what I have said so far merely gives an overview of the most basic interpretations. A whole field of study known as Statistical Process Control (SPC) is dedicated to building and interpreting such charts (known as Control Charts).
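The bands described above can be written down as a small lookup. The band labels, the function names `z_band` and `share_within`, and the treatment of values that fall exactly on a boundary are assumptions for this sketch. The second function shows how much of a normal distribution lies within a given number of standard deviations of the mean.

```python
from statistics import NormalDist


def z_band(z):
    """Place a z-score into one of the four interpretation bands."""
    size = abs(z)
    if size <= 1:
        return "typical"
    if size <= 2:
        return "uncommon"
    if size <= 3:
        return "pay attention"
    return "call to action"


def share_within(k):
    """Share of a normal distribution lying within k standard deviations."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


# The spring 2018 z-scores quoted in this post.
spring_2018 = {"temperature": 0.6, "sunshine": 0.6, "rainfall": 0.4}
bands = {name: z_band(value) for name, value in spring_2018.items()}
```

If the variable really is close to normal, roughly 68% of years fall within 1 standard deviation of the mean, about 95% within 2 and about 99.7% within 3, which is why values beyond 3 are treated as exceptional.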

For the spring of 2018, the z-scores for temperature, sunshine and rainfall were respectively +0.6, +0.6 and +0.4. So across all three dimensions, the spring was half a standard deviation above average i.e. pretty decent. If you have been following my monthly weather tracker then you will know that this was not the case in all 3 months. March was cold whilst May was warm so this is a good illustration that a seasonal average can mask considerable variation from month to month.

**Long Term Climate Trends**

Since the 3 moving averages in the above 3 charts all use the same units, they can be plotted onto the same chart as below.

This clearly shows a shift in our spring climate over the last 100 years of roughly 1 standard deviation. Recall that the baseline for the z-score calculation is based on the idea of “living memory” which I have defined to be the last 50 years of 1968 to 2017. We can characterise our springs broadly as follows:

- 1915-1940 – we had cool, dry and dark(?) springs.
- 1940-1960 – we had sunny, dry and normal temperature springs.
- 1960-1990 – we had cool, dark and normal rainfall springs.
- 1990-today – a clear shift in our climate occurred to warm, sunny and normal rainfall springs.

So clearly 2018 is largely consistent with the recent climate period.

**How many dimensions does Spring have?**

The long term trends chart above suggests that the z-scores for temperature, sunshine and rainfall all appear to be correlated. In fact this can be illusory as the above chart uses moving averages. If we look at the actual z-scores, we can see what the correlation is in the 3 scatter plots below.

The brown square in each chart is 2018. Scatter plots can be useful to identify unusual years that do not follow the normal relationships. Here we see that 2018 was completely consistent with historical scatters.

Looking at the 3 scatter plots in turn, we see that temperature is positively correlated with sunshine, i.e. sunny days tend to be warm; temperature is not correlated with rainfall, i.e. wet days can be either warm or cool; and rainfall is negatively correlated with sunshine, i.e. dark days tend to be wet. A statistician would look at these charts and say that what at first appears to be 3-dimensional data (temperature, sunshine and rainfall being the 3 dimensions) is in fact closer to being 2-dimensional. Each of the two new dimensions would be a weighted average of the 3 z-scores, e.g. dimension 1 might be temperature plus sunshine and dimension 2 might be rainfall minus sunshine. There are in fact many possible weighted averages that could be used, so we need a method that identifies the best ones for reducing the dimensionality of spring weather from 3 dimensions to 2. Furthermore, we would like there to be no correlation between the two dimensions, i.e. if we did a scatter plot of the two dimensions, no relationship would be apparent.
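The relationships read off the scatter plots can be quantified with the Pearson correlation coefficient. The sketch below is a plain implementation of that coefficient. The function name `pearson` and the three short series are made up for illustration and are not the UK data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up z-scores for five springs, for illustration only.
temperature = [-1.0, -0.5, 0.0, 0.5, 1.0]
sunshine = [-0.8, -0.6, 0.1, 0.4, 0.9]
rainfall = [0.9, 0.2, 0.1, -0.3, -0.9]

r_temp_sun = pearson(temperature, sunshine)
r_rain_sun = pearson(rainfall, sunshine)
```

A coefficient near +1 or -1 means one variable carries much of the same information as the other, which is the sense in which 3 variables can behave more like 2 dimensions.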

When I analysed Winter 2018, I introduced the idea of components (the word we will use instead of dimension) and I stated that there is a method known as Principal Components Analysis (PCA) that can do these calculations. If this is the first time you have seen these words, then I strongly recommend you read my Winter 2018 post to familiarise yourself with the concept of a component. You also need to familiarise yourself with the concept of the variance of a component, which is the square of its standard deviation. So if I carry out a PCA using the 3 spring z-scores, what do I find?

The first output of PCA is the scree plot shown here. We are analysing a 3-dimensional data set and the scree plot shows what proportion of the 3 dimensions is accounted for by each component. Here, the first component PC1 accounts for just over 1.5 dimensions or, more strictly, 53% of the total variance across the 3 dimensions. PC2 accounts for 30%, so together the first two principal components account for about 5/6 of the total variance (or dimensions) of the data. Had we instead chosen any two of the three z-scores, we would only have accounted for 2/3 of the total variance, so PCA has given us 2 new components that account for almost all of the variation normally seen in 3 dimensions.
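The numbers behind a scree plot are the eigenvalues of the correlation matrix. The sketch below assumes NumPy is available and uses synthetic data generated to have a broadly similar correlation pattern. The function name `explained_variance` and the data are hypothetical, so the output will not reproduce the 53% and 30% quoted above.

```python
import numpy as np


def explained_variance(z):
    """Eigenvalues of the correlation matrix, largest first, and their shares.

    `z` is an array with one row per year and one column per variable.
    """
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues, eigenvalues / eigenvalues.sum()


# Synthetic stand-ins for the 3 z-scores, for illustration only.
rng = np.random.default_rng(0)
sun = rng.normal(size=200)
temp = 0.6 * sun + 0.8 * rng.normal(size=200)
rain = -0.5 * sun + 0.87 * rng.normal(size=200)
z = np.column_stack([temp, sun, rain])

eigenvalues, share = explained_variance(z)
```

With 3 standardised variables the eigenvalues always add up to 3, so each eigenvalue can be read as the number of dimensions that component accounts for. A share of 53% corresponds to an eigenvalue of 0.53 x 3 = 1.59, which is the "just over 1.5 dimensions" mentioned above.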

What do these components look like? The best way to answer that question is to plot the Correlation Bi-Plot. Two versions are shown: an original known as the Unrotated BiPlot and an alternative version known as the Rotated BiPlot (using Varimax rotation for those who want to know such details!).

What does a bi-plot show? It shows the correlation of the original variables (3 z-scores in this instance) with the two new principal components PC1 & PC2. Each axis label shows (in brackets) how much the components account for of the total variance and since the total of these values is less than 100%, this warns us that the bi-plot is still an approximation of the entire dataset and therefore some information is missing. Despite that, it is still very informative.

Starting with the unrotated bi-plot, you can see that PC1 is highly positively correlated with Sunshine since the Sunshine label is almost at +1 on the PC1 axis. Temperature is also positively correlated with PC1 but not as strongly as Sunshine. Finally, Rainfall is negatively correlated with PC1, to about the same degree as Temperature is positively correlated. Therefore when PC1 is positive, this implies that Spring has been sunnier, warmer and drier than normal, and when PC1 is negative, this implies that Spring has been cooler, duller and wetter than normal. In effect, PC1 measures how nice our Spring is. Looking at PC2, we find that it is positively correlated with both Rainfall and Temperature but not correlated with Sunshine. So when PC2 is positive, our Spring is both warmer and wetter than normal, and when PC2 is negative, our Spring is cooler and drier than normal. In effect, PC2 measures how wet our Spring is given the temperature.

The labels are joined by lines to the origin of the plot, and doing this allows us to visualise the correlation between the 3 variables. You can see that the Rainfall and Temperature lines have an angle of about 90 degrees between them, whereas the Temperature and Sunshine lines have an angle of about 40 degrees. If you know your trigonometry, then you will know that the Cosine of 0 degrees is +1, the Cosine of 90 degrees is zero and the Cosine of 180 degrees is -1. This is the same scale as the correlation coefficient, and from the scatter plots of the 3 z-scores earlier, we know there is very little correlation between temperature and rainfall. So the angles between the lines represent (approximately) the correlations between the respective variables, which is a useful feature of the bi-plot.
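The link between angle and correlation is just the cosine function. In the sketch below the function name `implied_correlation` is hypothetical and the angles are the approximate ones read off the bi-plot. The result is only an approximation to the true correlation because the bi-plot shows 2 of the 3 components.

```python
from math import cos, radians


def implied_correlation(angle_degrees):
    """Approximate correlation implied by the angle between two bi-plot lines."""
    return cos(radians(angle_degrees))


# Approximate angles read off the unrotated bi-plot.
temperature_vs_rainfall = implied_correlation(90)
temperature_vs_sunshine = implied_correlation(40)
```

An angle of about 40 degrees implies a correlation of roughly 0.77, while 90 degrees implies a correlation of essentially zero.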

However, you are not required to use the original biplot that is created when you first do PCA. The current PC1 and PC2 require some mental gymnastics to characterise exactly what they represent. Fortunately, PCA allows you to rotate the labels around the circle by an equal angle for all labels. If I rotate the labels by about 40 degrees clockwise, I end up with the rotated biplot above. Now, the rotated PC1 and PC2 appear to have a much cleaner interpretation; PC1 measures how bright and sunny Spring is and PC2 measures how wet and dull Spring is.
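Rotating every label by the same angle can be sketched with basic trigonometry. This is a plain fixed-angle rotation, not the Varimax procedure itself, which chooses the angle automatically. The function name `rotate_loadings` and the loading values are made up to resemble the description of the unrotated bi-plot and are not read from the actual chart.

```python
from math import cos, radians, sin


def rotate_loadings(loadings, angle_degrees):
    """Rotate each (PC1, PC2) loading pair clockwise by the same angle."""
    t = radians(angle_degrees)
    return {
        name: (x * cos(t) + y * sin(t), -x * sin(t) + y * cos(t))
        for name, (x, y) in loadings.items()
    }


# Hypothetical unrotated loadings (correlations with PC1 and PC2).
unrotated = {
    "Sunshine": (0.95, 0.00),
    "Temperature": (0.65, 0.60),
    "Rainfall": (-0.60, 0.65),
}
rotated = rotate_loadings(unrotated, 40)
```

Because all labels move by the same angle, the length of each line and the angles between the lines are unchanged. The rotation only changes how the same picture is described in terms of the two axes.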

Let’s not forget that Principal Components are by definition uncorrelated with each other, which means you can analyse each component independently. Of particular interest is calculating the PC1 and PC2 values for each year separately and then looking at the trends over time. I have done this for the chart below.
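Yearly component values (often called scores) come from multiplying the standardised data by the eigenvectors of the correlation matrix. The sketch below assumes NumPy is available and uses synthetic data. The function name `pc_scores` and the data are hypothetical, and these are unrotated scores with no further rescaling.

```python
import numpy as np


def pc_scores(z):
    """Unrotated principal component scores, one row per year.

    Columns are ordered from the largest eigenvalue (PC1) to the smallest.
    """
    standardised = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)
    corr = np.corrcoef(standardised, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]  # largest eigenvalue first
    return standardised @ eigenvectors[:, order]


# Synthetic stand-ins for the 3 z-scores, for illustration only.
rng = np.random.default_rng(1)
sun = rng.normal(size=100)
temp = 0.6 * sun + 0.8 * rng.normal(size=100)
rain = -0.5 * sun + 0.87 * rng.normal(size=100)
z = np.column_stack([temp, sun, rain])

scores = pc_scores(z)
pc1_pc2_correlation = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
```

The correlation between the PC1 and PC2 columns comes out as zero to within rounding error, which is what allows each component to be tracked over time on its own.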

The scale used by PC1 and PC2 is similar to the z-score concept. 0 represents the long term average, positive numbers above average and negative numbers below average. The numbers themselves broadly correspond to the number of standard deviations above or below average.

This shows much more clearly the step change in our Spring climate as the UK moved to warmer and brighter Springs from the 1990s. The trend in PC1 has plateaued though and it is clear that 2018 was completely in line with the moving average. For PC2, 2018 was completely in line with the long term average.

Is it possible to use the principal components to make predictions? I have never tried this for Spring but I have for Summer where a quite remarkable pattern emerges. You will have to wait for my Summer Trends post published in September though to find out about that but I am predicting that the summer of 2018 will not be anything special!

If you want to read my other Weather Trends posts, please click on the link or the Weather Trends hashtag below this post. Otherwise, please click the relevant season from the list below.