The last three general elections have seen some significant polling errors. In 2010 the Lib Dems were significantly overestimated, in 2015 the Conservatives were underestimated, and in 2017 the Labour vote was underestimated by the largest margin on record. Whilst these errors suggest that the polling industry is struggling with general elections these days, a natural question to ask is “are all pollsters equally bad, or are some better than others?”
My approach to answering this question is to look at all polls undertaken in the week before the 2010, 2015 and 2017 UK general elections, using polling data provided by Mark Pack, who has systematically recorded every opinion poll published since 1945. A total of 50 polls were used, with fieldwork in the following date ranges:
- 2010 – 13 polls between 1st & 5th May
- 2015 – 18 polls between 1st & 7th May
- 2017 – 19 polls between 1st & 7th June
For 2010, I used the last 5 days' worth of polls because there was a noticeable shift in the final few days of that campaign, with the Lib Dems coming off the highs generated by “Cleggmania”. There were no comparable shifts in the last week of the 2015 & 2017 campaigns.
Obviously I need a way to measure the accuracy of the polls, and a common metric is the root mean square error (RMSE). This is calculated by taking the difference between the estimated and actual vote share for each party i.e. the ERROR, SQUARing it, calculating the MEAN of the squared errors across the parties, and then taking the square ROOT of that mean. For example, the last poll of the 2010 campaign was undertaken by Ipsos MORI, with errors of roughly -1%, -1% and +3% for the Conservatives, Labour and Liberal Democrats respectively i.e. they underestimated the Conservative and Labour vote and overestimated the Lib Dem vote. The mean of the squared errors is (1^2 + 1^2 + 3^2)/3 ≈ 3.7, giving a root mean square error of about 1.9% (closer to the 2.1% often quoted if the unrounded vote shares are used).
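For anyone who wants to reproduce the arithmetic, here is a minimal sketch in Python using the rounded errors quoted above (the `rmse` helper is my own naming, not part of any polling library):

```python
import math

def rmse(errors):
    # Square each error, take the mean of the squares, then the square root
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Ipsos MORI's final 2010 poll: errors of roughly -1, -1 and +3 points
# for the Conservatives, Labour and the Lib Dems respectively
print(round(rmse([-1, -1, 3]), 1))  # ~1.9
```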
This might be the obvious way to do it, but there is a flaw, which can be illustrated using 2015, an election where no pollster was close. The final vote shares in Great Britain (pollsters do not survey Northern Ireland) were 38% for the Conservatives and 31% for Labour, a lead of 7 points. Suppose we have two pollsters, A & B, as follows:
- Pollster A – CON 35 LAB 34
- Pollster B – CON 35 LAB 28
Both pollsters underestimate the CON vote by 3 points and have 3 point errors in the LAB vote, but pollster A overestimates Labour by 3 points whilst pollster B underestimates them by 3 points. Using RMSE on the party shares, both pollsters would have the same value of 3%, yet there is no question that they are giving different narratives to the public. Pollster A says the parties are neck and neck, which would imply a hung parliament, whilst pollster B says the Conservatives have a 7 point lead, which could result in a small majority, which is exactly what happened in 2015. Under the First Past the Post voting system it is the lead between the two main parties that determines the number of seats, so there is no question that pollster B is more accurate. The correct way to look at the poll estimates is therefore to look at these values instead:
- Pollster A – CON 35, CON-LAB lead +1
- Pollster B – CON 35, CON-LAB lead +7
which give an RMSE of 2.1% for pollster B and 4.7% for pollster A.
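As a sketch of that calculation (the figures are the rounded 2015 shares quoted above, and the variable names are mine):

```python
import math

def rmse(errors):
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

actual = {"CON": 38, "LAB": 31}          # GB 2015 result, rounded
polls = {"A": {"CON": 35, "LAB": 34},
         "B": {"CON": 35, "LAB": 28}}

for name, p in polls.items():
    con_error = p["CON"] - actual["CON"]                                   # error in CON share
    lead_error = (p["CON"] - p["LAB"]) - (actual["CON"] - actual["LAB"])   # error in CON-LAB lead
    print(name, round(rmse([con_error, lead_error]), 1))
# A 4.7
# B 2.1
```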
With this logic in mind, I have chosen to calculate the RMSE for each pollster by using the following values:
- Error in the Conservative vote share
- Error in the Conservative lead over Labour
- Error in the Labour lead over the Lib Dems
- Error in the Labour lead over UKIP (for 2015 only)
So the RMSE was based on 3 values in 2010 & 2017 and 4 values in 2015 (a sketch of the calculation is given below). I feel these are the values that determine the narrative of a campaign, and the best pollster will be the one with the lowest RMSE against them. Using these, can we now say who is the best pollster?
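Here is a minimal sketch of that metric for a single poll, assuming vote shares are supplied as percentages; the function and dictionary keys are my own naming, not anything from the original analysis:

```python
import math

def poll_rmse(poll, actual, include_ukip=False):
    # Errors in: the CON share, the CON-LAB lead, the LAB-LD lead,
    # and (for 2015 only) the LAB-UKIP lead
    errors = [
        poll["CON"] - actual["CON"],
        (poll["CON"] - poll["LAB"]) - (actual["CON"] - actual["LAB"]),
        (poll["LAB"] - poll["LD"]) - (actual["LAB"] - actual["LD"]),
    ]
    if include_ukip:
        errors.append((poll["LAB"] - poll["UKIP"]) - (actual["LAB"] - actual["UKIP"]))
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))
```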
First, I decided to list the top and bottom 3 individual polls by RMSE, as shown in the table. Some pollsters carried out more than one poll in the last week of a campaign, so it is possible for them to appear more than once; indeed in 2010, YouGov had two of the worst polls and the best poll, whilst ComRes had two of the best polls and one of the worst. Note that the table only includes pollsters who undertook polls in at least two of the three elections.
What does the table show us? The most striking fact for me is that Survation had the most accurate poll in 2017 and the most inaccurate poll in 2015. I think this point is not widely understood: just because a pollster happens to be the most accurate at one election does not mean they will be the best next time around. Indeed ComRes, who came closest in 2015, had one of the worst polls in 2017.
Not all errors are equal though. Survation's RMSE in 2015 was 5.5%, whereas YouGov's worst poll in 2010 had an RMSE of only 1.9%. Pollsters often tell us that they aim for errors of less than 3 points, so I have chosen to use that as my criterion. Consequently any poll with an RMSE over 3% is coloured red in the WORST column. I should have done the same with the BEST column, since the 2nd and 3rd best polls in 2015 actually had RMSEs over 3%, which just goes to show how bad the 2015 performance was.
These are of course individual polls, and if we want to identify the best pollster, we really should look at all the polls they published in the last week of each campaign. The next table shows the number of polls undertaken by each pollster in the last week of each election and the average RMSE across those polls. The last column is a straight average of the per-election RMSEs i.e. I have not weighted by how many polls were undertaken in each campaign. I feel this is fairer: otherwise a pollster who did 4 polls in 2010 (when errors were small) and 1 in 2015 would be flattered compared with one who did 4 in 2015 and 1 in 2010, simply because of when they happened to poll more rather than how accurate they were.
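To make the averaging explicit, here is a short sketch (the per-poll RMSE values are hypothetical, purely to illustrate the weighting):

```python
from statistics import mean

# Hypothetical per-poll RMSEs for one pollster, grouped by election
rmse_by_election = {2010: [1.2, 1.8], 2015: [3.9, 4.4, 4.1], 2017: [2.5]}

# Average within each election first, then take a straight (unweighted)
# average across elections, so the overall score does not depend on how
# many polls happened to be published in each campaign
per_election = {year: mean(values) for year, values in rmse_by_election.items()}
overall = mean(per_election.values())
```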
The pollsters have been split into two groups: those who polled at all 3 elections and those who polled at only 2, such as Survation. Those who only polled at 1 election are not shown here, but they are included in the overall average in the bottom row. I should point out that some pollsters may have changed their names or been bought out, so it is possible that two apparently different pollsters are in fact the same company.
You can see that I have used a colour coding system for the RMSE. Interestingly, it appears that the established companies who polled at all 3 elections are better on average than the newcomers. ComRes come out on top and none of their RMSEs exceed my criterion of 3%, though one might notice an apparent trend for the worse, and indeed one of their polls was among the worst in 2017. ICM struggled with 2015 in particular and, whilst 2017 was better, their RMSE was still large.
Of the newcomers, Survation took the plaudits in 2017, but they were a particularly poor performer in 2015. This raises the question: did they genuinely learn from their mistakes and make the correct improvements, or were they simply lucky? We will find out at the next general election!