As I write this, a plethora of economic forecasts is making the rounds in the UK news. In every case, the forecasters have failed to publish their track records, and these days I will not pay attention to a forecast unless it is accompanied by one. But how does one go about presenting a forecasting track record to prove that one has forecasting skill? To demonstrate, I will analyse how well opinion polls have predicted General Elections in the UK and measure their track record. I must confess I was surprised at what I found, and I would urge all opinion pollsters to take note of my results.
I first talked about the need for transparent, publicly available track records for forecasters early last year in my article “How do you identify a good forecaster”. When I published my 2017 UK General Election forecast, I thought I should practise what I preach and produced a simple election forecasting track record for myself at the start of that YouTube clip. However, it was not a traceable track record, i.e. you could not go back and verify what I had forecast. From 2017 onwards, though, all of my forecasts (via my blog) have been traceable. Should there be another election in the near future, I will ensure my track record is easily available on my blog and contains links to specific forecasts for traceability purposes.
However, publishing my track record tells you nothing about whether I actually have forecasting skill. To do that, I have to show that my forecasts are more accurate than what I call a dumb or naive forecast. As I explained in my forecasting article last year, the three simplest dumb forecasts are:-
- Last Time – The results of the next election will be the same as the last election.
- On Average – The results of the next election will be equal to the long term average.
- ARMA – The results of the next election will be equal to the average of the Last Time and the On Average forecasts.
ARMA stands for Autoregressive Moving Average, a staple of basic time series analysis. In this post, I am going to compare the predictions of the UK opinion pollsters (POLLS) with the ARMA dumb forecast for UK General Elections from 1951 to 2017.
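To make these concrete, here is a minimal Python sketch of all three dumb forecasts. The historical leads are placeholders I have made up for illustration, not real election results.

```python
# A minimal sketch of the three dumb forecasts, given a list of past
# CON-LAB leads in percentage points (oldest first). Values are made up.
past_leads = [6.0, -4.0, 1.5, 2.5]

last_time = past_leads[-1]                      # Last Time: repeat the last result
on_average = sum(past_leads) / len(past_leads)  # On Average: the long term mean
arma = (last_time + on_average) / 2             # ARMA: average of the two above

print(f"Last Time:  {last_time:+.1f}%")   # +2.5%
print(f"On Average: {on_average:+.1f}%")  # +1.5%
print(f"ARMA:       {arma:+.1f}%")        # +2.0%
```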
First, I have to decide what specific forecast is to be compared. I have chosen the Conservative lead over Labour in vote share for Great Britain, as I have stated on previous occasions that this is the key statistic when predicting the number of seats each party will win in an election under the First Past The Post system. Nearly all opinion polls cover only Great Britain and exclude Northern Ireland, so the CON-LAB lead statistic has to be for GB only.
POLLS vs ARMA
The table here shows the actual result along with the POLLS and ARMA forecasts. A positive lead is coloured blue to show that the Conservatives got more votes than Labour; a negative lead is coloured red to show that Labour got more votes than the Conservatives. The two forecasts shown are calculated as:-
- POLLS is the average of all opinion polls that took place in the week before the election according to Mark Pack’s database.
- ARMA is the average of the two columns on the right, which are the CON-LAB lead for the previous election (Last GE) and the running average of all previous elections (Long Term Avg), e.g. the Long Term Avg for 1970 is the average of 1945 to 1966, and for 2017 it is the average of 1945 to 2015.
- Forecasts start from 1951 since that is the first election where a genuine Long Term Avg could be calculated using 1945 & 1950.
Because ARMA is a dumb forecast, I can actually predict the next UK general election now, even though it might not take place until 2022! The Last GE number will be +2.5%, which is how GE17 ended up, and the Long Term Avg number is 0.7%, the average of 1945 to 2017. The ARMA forecast is therefore the average of these two numbers: (2.5% + 0.7%) / 2 = +1.6%.
For each forecast, I have used three symbols to denote its accuracy as follows:-
- Green tick where the forecast was within 2% of the actual result.
- Yellow exclamation mark where the forecast was more than 2% but within 4% of the actual result.
- Red cross where the forecast was more than 4% away from the actual result.
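For what it is worth, the banding is trivial to express in code; a sketch, using the 2% and 4% thresholds above:

```python
def accuracy_symbol(forecast: float, actual: float) -> str:
    """Band a forecast error using the 2% and 4% thresholds above."""
    error = abs(forecast - actual)
    if error <= 2.0:
        return "green tick"          # within 2% of the actual result
    if error <= 4.0:
        return "yellow exclamation"  # more than 2% but within 4%
    return "red cross"               # more than 4% away

print(accuracy_symbol(1.6, 2.5))   # green tick
print(accuracy_symbol(-1.0, 2.5))  # yellow exclamation
```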
Finally, I have undertaken a head-to-head competition and flagged which was the more accurate forecaster. For 17 of the 18 elections there is a winner, but I am ruling 1970 a draw (or no-result), since both POLLS and ARMA had Labour ahead of the Conservatives, yet the Conservatives won the vote, and both POLLS and ARMA have red crosses for large errors.
The error bands of 2% & 4% are somewhat arbitrary, but I have in mind how much of an effect an error of that size would have on the number of seats won by both parties. For example, 76 out of 632 seats in GB in 2017 had majorities of less than 4%. Assuming uniform national swing, if the CON-LAB lead of +2.5% had been -1.5% instead, I estimate the Conservatives would have won 31 fewer seats and Labour 35 more, making Labour the largest party. Conversely, if the CON-LAB lead of +2.5% had been +6.5% instead, the Conservatives would have won 23 more seats and Labour 22 fewer, giving Theresa May a working majority of 32 seats. So I think these error bands make sense, since an error on this scale has the capacity to change the outcome of the election.
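To spell out the uniform national swing assumption: every constituency's CON-LAB margin shifts by the same amount as the national lead, so a seat changes hands whenever the shift exceeds its margin. A rough sketch, with made-up constituency margins:

```python
# Sketch of uniform national swing. Margins are local CON-LAB leads in
# percentage points (positive = Conservative-held); the values are made up.
margins = [0.5, 1.8, 3.2, 5.0, -1.0, -2.7, -6.4]

def seats_flipped(margins, delta):
    """Seats changing hands if the national CON-LAB lead shifts by delta points."""
    return sum(1 for m in margins if (m > 0) != (m + delta > 0))

# A 4-point fall in the lead flips Conservative seats with margins under 4%.
print(seats_flipped(margins, -4.0))  # 3
```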
Who wins? POLLS or ARMA?
So do these figures show forecasting skill for POLLS? What struck me immediately was the head-to-head battle, where the score is a narrow 9-8 win (with 1 draw) for POLLS. Given that ARMA is a dumb forecast that can be made at any time immediately after an election, whereas POLLS in theory are supposed to be capturing what we are thinking today, such a narrow win is somewhat surprising. It is especially surprising given that ARMA cannot really forecast landslides, by the nature of how it is calculated: ARMA’s largest forecast is +7.8% in 1987 (Thatcher’s landslide) whilst POLLS’ largest forecast was -17.8% in 1997 (Blair’s landslide).
In theory, ARMA should do best when elections are close and poorest in landslides, since POLLS will always carry a margin of error (typically claimed to be +/-3%) and in a close election that margin is most likely to change the result. Looking back, though, the results don’t seem to support that. For the 9 elections from ’51 to ’79, the largest lead was -7.3% in ’66, and POLLS won that era 5-3 (or 6-3 if ’70 is regarded as a win for POLLS). For the 9 elections from ’83 to ’17, the largest lead was +15.2% in ’83, and ARMA won that era 5-4.
Of course, what should be clear is that when ARMA loses, it can lose very big indeed, as in ’97. So we have two potential ways to evaluate forecasting skill: measure how often the forecast is a winner, or measure the extent of its defeats. The next table shows statistics for both ARMA and POLLS. The first statistic shown is the Mean Absolute Error (MAE), which is the average of the forecast errors ignoring the sign. The dumb forecast ARMA has an MAE of 5.3%, so for true forecasting skill POLLS needs to be below that, which is the case with an MAE of 3.9%. However, I have shown that a 4% error can change the number of seats each party wins notably, leading to surprise election outcomes, so I would argue that POLLS need to be doing better than this.
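The MAE calculation itself is just the mean of the unsigned errors; a minimal sketch, with placeholder figures rather than the real 1951-2017 series:

```python
def mean_absolute_error(forecasts, actuals):
    """Average of the forecast errors, ignoring the sign."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

# Placeholder CON-LAB leads for illustration only.
actual   = [3.0, -8.0, 12.0]
forecast = [1.0, -4.0, 15.0]
print(mean_absolute_error(forecast, actual))  # (2.0 + 4.0 + 3.0) / 3 = 3.0
```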
The breakdown by error size is not much comfort to POLLS either. Based on the number of elections where the error was less than 4%, ARMA in fact wins 10-9. Again, it is clear that when ARMA is wrong it is massively wrong, whereas POLLS are never as far out: the table shows the largest error by POLLS was half that of ARMA.
From a risk management point of view, minimising error is the correct yardstick for evaluating forecasts, and on that basis POLLS does beat ARMA on both MAE and maximum error. Interestingly though, POLLS is not the best option if minimising error is your goal! The best option is to take the average of POLLS and ARMA, which is summarised in the COMBO column.
COMBO is not the best option, however, if your goal is to maximise the number of wins. By its nature, COMBO sits between POLLS and ARMA, so if both point to the same party coming top, COMBO simply makes the same call. COMBO only has a chance of winning in elections where POLLS and ARMA disagree as to which party will top the poll. There have been 5 such elections (’64, ’79, ’92, ’97 and ’10) and COMBO wins 4 of those (all but ’97). In the other 13 elections, POLLS beat ARMA 8-4 (plus 1 draw). So this suggests a clear strategy for trying to pick winners: use POLLS unless ARMA disagrees, in which case use COMBO.
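As a sketch, that picking rule looks like this (leads in percentage points, positive meaning a Conservative lead):

```python
def pick_forecast(polls: float, arma: float) -> float:
    """Use POLLS unless ARMA disagrees on which party tops the poll;
    fall back to COMBO (the average of the two) when they disagree."""
    agree = (polls > 0) == (arma > 0)
    return polls if agree else (polls + arma) / 2

print(pick_forecast(5.0, 2.0))   # agree on a CON lead -> POLLS: 5.0
print(pick_forecast(-3.0, 1.5))  # disagree -> COMBO: -0.75
```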
A final point to note is that all three forecasting methods have similar average errors, i.e. on average they underestimate the CON-LAB lead by 1.7% or so. I have written about this before when it comes to the polls, but it is surprising to see the same thing for ARMA. At first sight, this suggests that a correction should be applied to all ARMA forecasts, but when I do this the MAE falls only slightly and the other statistics don’t change, so I would not recommend that approach without more thought.
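For completeness, the correction I have in mind is a simple mean-bias adjustment: measure the average signed error historically and add it back to future forecasts. A sketch, again with made-up numbers:

```python
# Mean-bias correction sketch: shift each forecast by the average signed
# error (actual minus forecast) seen historically. Numbers are made up.
actual   = [3.0, -8.0, 12.0, 7.0]
forecast = [1.0, -9.0, 10.0, 6.0]

bias = sum(a - f for a, f in zip(actual, forecast)) / len(actual)
corrected = [round(f + bias, 1) for f in forecast]
print(f"mean bias: {bias:+.1f}")  # positive: forecasts underestimate the lead
print(corrected)                  # [2.5, -7.5, 11.5, 7.5]
```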