Ahead of the 2017 general election, I predicted that the opinion polls would be wrong again and that the Conservatives' lead over Labour would be underestimated by 2.6%. I based this on data provided by Mark Pack, who has systematically recorded every opinion poll published since 1945. In the event, I was right that the polls would be wrong, but instead of an error favouring the Conservatives, the polls recorded the largest ever underestimate of the Labour vote. As a result, election forecasters were blindsided yet again and the result was a hung parliament which few saw coming.
This post analyses only the polls that took place in the week before every general election between 1950 & 2017. Note this is based on the fieldwork dates, not the publication date, which can be a few days later. For the elections in the 1950s, Gallup was the only pollster, so instead of using the week before, I used the month before.
For each election, I calculated the average vote share recorded across all pollsters for the Conservatives, Labour and Liberal Democrats (Liberals 1950 to 1979, Alliance 1983 & 1987). The polling error for each party is then the actual election result for Great Britain minus the average vote share from the polls. I use the figures for Great Britain rather than the United Kingdom since polls almost never survey Northern Ireland.
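As a minimal sketch, the error calculation looks like this in Python. The poll shares and results below are made-up figures for illustration, not the actual dataset:

```python
# Hypothetical final-week poll shares and GB results, for illustration only.
polls = {"CON": [42.0, 44.0, 43.0], "LAB": [34.0, 35.0, 36.0]}
result_gb = {"CON": 43.5, "LAB": 41.0}

def polling_error(party):
    """Actual Great Britain result minus the average final-week poll share."""
    avg = sum(polls[party]) / len(polls[party])
    return result_gb[party] - avg

for party in polls:
    print(party, round(polling_error(party), 1))  # CON 0.5, LAB 6.0
```

A positive error means the polls underestimated that party; a negative error means they overestimated it.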
The results can be seen in the chart below. The black line is the number of polling companies that polled in the run up to the election.
The 2017 general election saw the highest ever underestimate of the Labour vote. At 4.5%, this exceeded the 4.3% underestimate in the 1951 general election, when Gallup was effectively the only pollster. The Conservative and Liberal Democrat vote shares were spot on. Unlike 2015, when all pollsters got it wrong, some pollsters did get Labour right in 2017, notably Survation.
Looking back, I am particularly struck by what has happened in the last 3 general elections, specifically:
- An overestimate of 4.1% for the Liberal Democrats in 2010.
- An underestimate of 4.1% for the Conservatives in 2015.
- An underestimate of 4.5% for Labour in 2017.
- An increase in the number of pollsters to 11+ compared to 5 to 7 between 1970 & 2005.
The 3 significant errors listed are in fact 3 of the 5 largest errors seen since 1950, with the Labour underestimate of 4.3% in 1951 and the Conservative underestimate of 5.2% in 1992 being the other two. One could argue that the lower cost of internet polling has encouraged more pollsters to join the party but at the cost of quality. There might be something in that argument, but it would ignore the poor record of the pollsters in other elections, such as the period 1992 to 2001. 1992 saw the largest ever error for a single party, and it was accompanied by a 4% overestimate for Labour. This polling miss is well known, but what is not so well known is that the pollsters continued to get the Labour vote wrong by a similar order of magnitude in 1997 & 2001. Those errors passed unnoticed because the pollsters did get the outcome right, i.e. a Labour landslide, but I would contend that these have to be recorded as significant polling errors.
One of the things that strikes me about this chart is how the errors for Labour and the Liberal Democrats are inversely correlated, with a correlation coefficient of -0.61. This makes sense in today's environment, where there is a lot of talk of a progressive alliance between Labour, the Lib Dems and the Greens, and one can easily imagine a scenario where tactical voting means the polls overestimate Labour and underestimate the Lib Dems. However, it would seem that this has been a dynamic in British elections for a very long time. The equivalent correlation between the Conservatives and the Lib Dems is only -0.13.
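For anyone wanting to reproduce this, a plain-Python Pearson coefficient is all that is needed. The error series below are invented placeholders, not the real figures from the chart:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented Labour and Lib Dem error series, one value per election.
lab_err = [4.3, -1.0, 2.0, -0.5, 4.5]
ld_err = [-3.0, 0.5, -4.1, 1.0, -2.0]
print(round(pearson(lab_err, ld_err), 2))  # -0.74 for these invented numbers
```

With the real per-election error series in place of the placeholders, the same function yields the -0.61 quoted above.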
Given this, I have redone the chart by combining the Conservatives & UKIP into one group and Labour, the Lib Dems & the Greens into another. In practice, the UKIP & Green errors are only known for 2015 & 2017, as those are the only elections where pollsters have recorded votes for these parties separately rather than lumping them into Others. So for the most part the chart below is comparing the Conservative poll error with the combined Lab/LD error.
The lines represent centred 5-election moving averages and currently sit at a 1.3% underestimate for the CON/UKIP error and a 1.4% overestimate for the LAB/LD/GRN error.
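The centred moving average is a straightforward windowed mean; a sketch, again with made-up numbers standing in for the error series:

```python
def centred_moving_average(series, window=5):
    """Centred moving average; returns None where the window falls off the edge."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i < half or i >= len(series) - half:
            out.append(None)  # not enough elections on one side
        else:
            out.append(sum(series[i - half:i + half + 1]) / window)
    return out

errors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # invented error series
print(centred_moving_average(errors))  # [None, None, 3.0, 4.0, 5.0, None, None]
```

Being centred, the average at each election uses the two elections either side of it, which is why the line cannot extend to the first or last two elections.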
This revised chart makes things a lot clearer and also emphasises the exceptional nature of 2017. In 2010, the Lib Dems were badly overestimated but this was partly compensated by an underestimate in the Labour vote. In 2017, the underestimate in the Labour vote was only slightly compensated by an error in the Lib Dem & Green vote. The CON+UKIP vote share was overestimated by 2.4% but this was almost entirely due to the UKIP vote being overestimated since the Conservative vote share was more or less spot on.
In effect, 2017 was a repeat of the 1983 & 1951 elections but on a larger scale. History shows that errors like these favouring Labour are exceptional and the norm has been errors (usually significant errors) that favour the Conservatives. The chart above shows only 6 elections out of 19 with errors favouring the “progressive alliance” with 4 of these taking place before 1966. Conversely, the 7 elections prior to 2017 all had errors favouring the Conservatives. In my earlier post on polling errors, I stated that a repeat of 1983 was a distinct possibility for a number of reasons and I put the probability of an error (of any magnitude) favouring Labour at 25% so my warnings were prescient. At the same time, I stated that I was likely to go with a final forecast that assumed an error that favoured the Conservatives whilst considering a scenario where the errors favoured Labour. This is what I did in the end but I went about this in a different way as I explained in my post “5 steps to making sense of the latest polls.” The difference between my original intentions and my actual method was only 0.5% in the Conservative lead over Labour so I am satisfied that I fulfilled the spirit of what I intended.
So far I have been concentrating on the expected vote share for each party or combination of parties. In practice, when it comes to predicting the outcome of an election that uses First Past the Post as its election system, the more important prediction is the Conservative lead over Labour. These parties have always been expected to take the top two places nationally so I have calculated the expected lead from the polls and compared it with the actual lead to produce the following chart.
We see again how 2017 echoes 1983, though the error is larger than in 1983 but not as large as in 1951. The 5-election centred moving average still shows an underestimate of 2% in the Conservative lead over Labour, and indeed this appears to have been the long-run average since 1964. It is very tempting (and no doubt many people will try) to read reasons into this chart, but treated as a time series in its own right, I have to say that I do not see any explanatory patterns.
If we define a SIGNIFICANT error as one where the CON-LAB lead is out by 2% or more, then we can make the following observations about the 19 elections since 1950:
- Only 4 out of 19 elections did not experience a significant polling error (1955, 1964, 1979, 2010)
- 10 out of 19 elections experienced a significant polling error favouring the Conservatives.
- 5 out of 19 elections experienced a significant polling error favouring Labour.
- The average polling error (in CON-LAB lead) is +1.5% and the standard deviation is 4.3%.
- If our null hypothesis is that the average polling error is 0%, then our t-statistic is +1.5 and the p-value (using 2-tailed t-test) is 15%.
- Nothing greatly changes if we confine our analysis to 1974 onwards i.e. from when the Northern Ireland parties and the Nationalists arrived on the political scene and the CON+LAB vote share saw a significant shift downwards.
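The t-test above can be reproduced from the summary statistics alone. A sketch using only the standard library (the p-value itself requires the t-distribution, e.g. scipy.stats.t, so only the t-statistic is computed here):

```python
import math

def t_statistic(mean, sd, n, mu0=0.0):
    """One-sample t-statistic against a hypothesised mean mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))

# Summary stats from the post: mean error +1.5%, sd 4.3%, 19 elections.
t = t_statistic(1.5, 4.3, 19)
print(round(t, 2))  # 1.52; against t(18) this gives a 2-tailed p of about 15%
```

A p-value of 15% means the historical bias towards Labour-favouring polls, while suggestive, falls short of conventional significance thresholds.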
I chose 2% as a definition of a significant error as my experience shows that an error on this scale will mislead election forecasters as happened for the most part in 2017. If another election is called, I would now seek to calculate 3 scenarios:
- A 4% underestimate in the CON-LAB lead as stated by the polls which favours the Conservatives.
- No error in the CON-LAB lead as stated by the polls.
- A 4% overestimate in the CON-LAB lead as stated by the polls which favours Labour.
The question which then has to be answered is: what is the relative likelihood of each scenario? Whilst the statistical significance of the historical errors is not overwhelming, I would still be happy to use a 2:1:1 ratio, i.e. I would give scenario 1 a 50% weight, scenario 2 a 25% weight and scenario 3 a 25% weight.
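Under that 2:1:1 weighting, the expected lead is just a weighted average over the three scenarios. A sketch, with a hypothetical polled lead standing in for whatever the polls would say:

```python
# Scenario adjustments to the polled CON-LAB lead, weighted 2:1:1 (50/25/25).
scenarios = [
    ("4% underestimate of the lead", +4.0, 0.50),
    ("no error",                      0.0, 0.25),
    ("4% overestimate of the lead",  -4.0, 0.25),
]

polled_lead = 6.0  # hypothetical headline CON-LAB lead from the polls

expected_lead = sum(w * (polled_lead + adj) for _, adj, w in scenarios)
print(expected_lead)  # 0.5*10.0 + 0.25*6.0 + 0.25*2.0 = 7.0
```

Note that the asymmetric weighting shifts the expectation 1% towards the Conservatives relative to the headline polls, which is the practical effect of assuming history repeats more often than not.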