The French pollsters are congratulating themselves for getting the first round of the French Presidential Election right last weekend. Recently Nate Silver criticised the accuracy of polling in the UK. For British pollsters, 2015 was a year they would like to forget and I am sure many of them will be nervous about the forthcoming general election. But how nervous should they be?
I was one of the few people to predict the polls would be wrong in 2015 though the magnitude of the error was larger than I expected. Two years ago I based my conclusions on an analysis of polls between 1992 & 2015 but for the 2017 election I have analysed a longer time period from 1950 to 2015. I have made use of the excellent work done by Mark Pack who has systematically recorded every opinion poll published since 1945. Based on this, I am expecting the polls to be in error again with the Conservative lead over Labour underestimated by 2.6%.
I arrived at this conclusion by identifying from Mark Pack’s Pollbase every poll that took place in the week leading up to every general election between 1950 & 2015. Note this is based on the fieldwork dates, not the publication date which can be a few days later. For the elections in the 1950s, Gallup were the only pollster so instead of using the week before, I used the month before.
For each election, I then calculated the average vote share recorded across all pollsters for the Conservatives, Labour and Liberal Democrats (Liberals 1950 to 1979, Alliance 1983 & 1987). The polling error for each party can then be calculated as the actual election result for Great Britain minus the average vote share from the polls. I use the figures for Great Britain rather than the United Kingdom since nearly all polls do not survey in Northern Ireland.
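The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `polling_errors` and all the figures are made up for the example and are not taken from Pollbase.

```python
from statistics import mean

def polling_errors(polls, result):
    """Return the actual vote share minus the poll average for each party.

    A positive number means the polls underestimated the party,
    a negative number means they overestimated it.
    """
    averages = {party: mean(poll[party] for poll in polls) for party in result}
    return {party: round(result[party] - averages[party], 1) for party in result}

# Hypothetical final-week polls and a hypothetical GB result (illustration only).
example_polls = [
    {"CON": 34.0, "LAB": 34.0, "LD": 8.0},
    {"CON": 33.0, "LAB": 35.0, "LD": 9.0},
    {"CON": 35.0, "LAB": 33.0, "LD": 10.0},
]
example_result = {"CON": 38.0, "LAB": 31.0, "LD": 8.0}

errors = polling_errors(example_polls, example_result)
print(errors)
```

With these invented numbers the Conservatives come out as underestimated by 4 points and Labour as overestimated by 3 points.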
The results can be seen in the chart below. The black line is the number of polling companies that polled in the run up to the election.
The error in 2015 is clearly shown but what is worse is that it was not the worst election for the polls. That dubious honour is held by 1992 with the Conservatives underestimated by 5% and Labour overestimated by 4%. What is not well known is that the pollsters continued to get the Labour vote wrong by a similar order of magnitude in 1997 & 2001. Those errors passed unnoticed because the pollsters did get the outcome right i.e. a Labour landslide, but I would contend that the pollsters should have been just as concerned about those years as they were about 1992 & 2015.
What is clear from the chart is that polls have been systematically underestimating the Conservatives and overestimating Labour for quite some time. When Gallup was the only pollster in the 1950s, the situation was the other way around but since new pollsters joined the party in 1959, the error for the Conservatives has varied between a 2.2% overestimate (in 1983) and a 5.2% underestimate (in 1992) with a median underestimate of 1.1%. For Labour, the error ranges from a 4% overestimate (in 1970 & 1997) to a 2.3% underestimate (in 1974 & 2010) with a median overestimate of 2.3%.
One of the things that strikes me about this chart is how the errors for Labour and the Liberal Democrats are inversely correlated with each other with a correlation coefficient of -0.67. This makes sense in today’s environment where there is a lot of talk of a progressive alliance between Labour, Lib Dems and the Greens and one can easily imagine a scenario where tactical voting means the polls overestimate Labour and underestimate the Lib Dems. However it would seem that this has been a dynamic in British elections for a very long period of time. The equivalent correlation between the Conservatives and Lib Dems is only -0.13.
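For readers who want to reproduce a correlation coefficient like the ones quoted above, here is a minimal sketch of the standard Pearson formula. The two error series are invented for illustration and are not the actual Labour and Lib Dem errors.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical polling errors per election (illustration only).
lab_errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
ld_errors = [1.0, 1.0, 0.0, -1.0, -1.0]

r = pearson(lab_errors, ld_errors)
print(round(r, 2))
```

A value close to -1 means that when one party is overestimated the other tends to be underestimated; a value close to 0 means the two errors move more or less independently.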
A similar thing may happen in this election with the Conservatives and UKIP so I have redone the chart by combining the Conservatives & UKIP into one group and Labour, Lib Dems & Greens into another group. In practice, the UKIP & Green errors are only known for 2015 as that has been the only election where pollsters have recorded votes for these parties separately rather than lumping them into Others. So for the most part the chart below is comparing the Conservative poll error with the combined Lab/LD error.
The lines represent centred 5-election moving averages and currently sit at a 1.9% underestimate for the CON/UKIP error and a 2.4% overestimate for the LAB/LD/GRN error. The difference between these two numbers represents a 0.5% underestimate for other parties which always happens as polls can never pick up the votes for the myriad of tiny parties and independents.
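A centred 5-election moving average takes each election together with the two before and the two after it. The sketch below is one way to compute it; how the first and last two elections are treated is an assumption on my part (here they are simply left blank), and the input series is invented.

```python
def centred_moving_average(values, window=5):
    """Centred moving average; positions without a full window are None."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            # Assumption: leave the ends blank rather than use a shorter window.
            smoothed.append(None)
        else:
            chunk = values[i - half : i + half + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed

# Hypothetical series of polling errors, one per election (illustration only).
example_errors = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(centred_moving_average(example_errors))
```

Other end treatments, such as averaging over whatever elections are available, would give slightly different values for the most recent elections.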
This revised chart makes things a lot clearer. For the last 7 elections, the polls have always underestimated the Conservatives (& UKIP) and overestimated the combined Labour/Lib Dem (& Green) vote. It was this kind of information that led me to predict a poll error in 2015. Since 2015, there has been a lot of work by polling companies to try and avoid a repetition but as 1997 shows, the last time they tried to rectify a major error, they didn’t actually succeed. So why should it be different this time around?
The one point that might bring this streak to an end is the fact that the 2017 general election bears a lot of similarities with the 1983 general election. 1983 is the first election I can remember in detail and the similarities are quite striking, namely:
- The previous year saw a major event that realigned British politics, the Falklands War in 1982 and the EU referendum in 2016.
- The incumbent government was Conservative and the opposition Labour.
- Labour was led by a leader widely seen as weak and ineffective, Michael Foot in 1983, Jeremy Corbyn in 2017.
- The Conservatives were led by a female Prime Minister perceived as strong, Margaret Thatcher in 1983, Theresa May in 2017.
- Labour’s policies were seen as very left wing in both elections whilst the Conservatives were seen as distinctly right wing.
- There was a 3rd party insurgency. In 1983 this was the Liberal/SDP alliance. In 2017, arguably the insurgency occurred in the previous election with UKIP & SNP but the combined votes of UKIP, Greens & SNP plus the Lib Dems, who are acting as an insurgent party, still add up to quite a bit.
- A Conservative landslide was widely expected with the only question being how large their majority would be.
As the chart above shows, 1983 was the last time a significant overestimate in the Conservative vote occurred with their actual vote 2.3% below what the polls were showing. Does that mean with all the similarities between 1983 & 2017 that we will see a similar overestimate? After all, the two major polling errors of 2015 & 1992 both occurred in elections where the parties were thought to be neck & neck with a hung parliament the most likely outcome. Can we make the case that the narrative of the polls has an effect on voter behaviour?
Some people do indeed make such a connection but as a professional statistician, I can’t subscribe to an interpretation that reeks of “correlation equals causation”. For a start I can point out that in 1987, when another Conservative landslide was expected, the polls got it bang on. Also there is no justification for overemphasising one data point and completely ignoring all the other elections. Saying that, I do intend to produce forecasts assuming a poll error similar to 1983 but such a forecast will merely be one scenario rather than a true forecast. My final forecast is likely to be a weighted average of a number of scenarios including a 1983 one.
In the end, I decided to calculate a weighted average of all errors since 1950 using the number of pollsters in my first chart as my weight. I think one can give less weight to the early years given that there were fewer pollsters, and the trend over the last 7 elections is quite striking. By using the number of pollsters as my weight, I am using an independently derived weighting scheme rather than a cherry picked one. This gives the following figures.
- The expected polling error for the Conservatives + UKIP is an underestimate of 1.3%
- The expected polling error for Labour + Lib Dems + Greens is an overestimate of 1.8%
- The expected polling error for the SNP & Plaid Cymru is zero.
- The expected polling error for other minor parties is an underestimate of 0.5%
- The probability of the Conservatives + UKIP being OVERESTIMATED is 25%
- The probability of Labour + Lib Dems + Greens being OVERESTIMATED is 85%
The two probabilities stated are again weighted averages. The Conservatives one appears to be quite high given that no such overestimate has taken place for 7 elections but statistically this is a valid calculation using the concept of likelihoods. These probabilities are almost certainly going to be the weights I will use when generating and combining scenarios but I will decide for sure nearer the time.
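To make the weighting idea concrete, here is a rough sketch of a pollster-weighted average error and a pollster-weighted share of elections with an overestimate. Treating the probability as a simple weighted share is my assumption for the purposes of the example and may not match the exact likelihood calculation behind the figures above. All numbers are invented.

```python
def weighted_mean(values, weights):
    """Average of values where each one counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_share(flags, weights):
    """Share of the total weight that belongs to the flagged elections."""
    return sum(w for flag, w in zip(flags, weights) if flag) / sum(weights)

# Hypothetical errors (actual minus poll average) and pollster counts per election.
con_errors = [-1.0, 2.0, 3.0, 1.0]
n_pollsters = [1, 4, 5, 10]

expected_error = weighted_mean(con_errors, n_pollsters)
# A negative error means the polls overestimated the party.
p_overestimate = weighted_share([e < 0 for e in con_errors], n_pollsters)

print(expected_error, p_overestimate)
```

In this made-up example the single overestimate comes from an election with only one pollster, so it contributes little to either figure.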
So far I have been concentrating on the expected vote share for each party. In practice, when it comes to predicting the outcome of an election that uses First Past the Post as its election system, the more important prediction is the Conservative lead over Labour. These parties have always been expected to take the top two places nationally so I have calculated the expected lead from the polls and compared it with the actual lead to produce the following chart.
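The error in the lead follows the same convention as the error for each party, namely actual minus polls. A minimal sketch with invented figures:

```python
def lead_error(poll_con, poll_lab, actual_con, actual_lab):
    """Actual Conservative lead over Labour minus the lead shown by the polls.

    Positive means the polls underestimated the Conservative lead.
    """
    return (actual_con - actual_lab) - (poll_con - poll_lab)

# Hypothetical example: polls show a tie, the result is a 7 point lead.
example_lead_error = lead_error(poll_con=34.0, poll_lab=34.0,
                                actual_con=38.0, actual_lab=31.0)
print(example_lead_error)
```

Note that the error in the lead is simply the Conservative error minus the Labour error, so an underestimate of one combined with an overestimate of the other compounds.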
In 1983, the Conservative lead over Labour was overestimated by 3.4% but otherwise the weighted average of the expected error in the estimate of the Conservative lead is an underestimate of 2.6% and the probability of this being overestimated is still 25%. Interestingly, the last time the Conservative lead was overestimated was in 2010. This occurred because although the Conservative vote was underestimated as expected and the combined Lab/LD vote was overestimated as expected, the Lib Dems enjoyed a major bounce during the election campaign which was not sustained at the ballot box so their votes were significantly overestimated whilst Labour’s vote was underestimated. 2010 was the first time I made an election prediction (CON 277, LAB 225, LD 114, OTH 34) which turned out to be wide of the mark! Whilst I got the Conservative lead over Labour right, 2010 taught me not to trust the polls and this led to my forecast that there would be an error in 2015. For 2017, the similarity with 1983 is weighing on my mind at present which is why my general election forecast currently assumes no polling error but nearer the time, I will make a decision on how to incorporate expected polling errors.