Voting intention polls in the UK are accurate after all!
For years, I’ve observed that UK polls on average underestimate the Conservative vote and overestimate Labour’s vote. When I converted poll data into forecasts of seats won, I had to first estimate how much polling error there would be. So what’s changed? It turns out I was comparing polls to the wrong statistic, namely national vote share. The correct comparator is in fact average vote share per seat.
This article was edited on 23rd June 2024 with the addition of an extra section at the end.
Data used in this article
Historical data for each voting intention poll’s estimate of each party’s vote share can be found in Mark Pack’s invaluable Pollbase. For each election between 1950 and 2019, I take the average of all polls that took place in the week before the general election to derive the Polls Vote Share for each party. A poll’s date is based on its fieldwork date, not its publication date, which can be a few days later. For elections in the 1950s, Gallup were the only pollster so instead of using the week before, I used the month before.
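The averaging step can be sketched in a few lines of Python. This is only an illustration: the record layout, the function name final_polls_average, the choice of the fieldwork end date and the exact window boundaries are my assumptions, and the polls shown are made up rather than taken from Pollbase.

```python
from datetime import date, timedelta
from statistics import mean


def final_polls_average(polls, election_day, window_days=7):
    """Average each party's share over polls whose fieldwork ended within
    window_days before election_day (election day itself is excluded)."""
    start = election_day - timedelta(days=window_days)
    in_window = [p for p in polls if start <= p["fieldwork_end"] < election_day]
    if not in_window:
        raise ValueError("no polls fall inside the window")
    parties = in_window[0]["shares"].keys()
    return {party: mean(p["shares"][party] for p in in_window) for party in parties}


# Made-up polls for illustration only (not Pollbase data).
polls = [
    {"fieldwork_end": date(2019, 12, 11), "shares": {"CON": 44.0, "LAB": 34.0}},
    {"fieldwork_end": date(2019, 12, 9), "shares": {"CON": 42.0, "LAB": 32.0}},
    {"fieldwork_end": date(2019, 11, 20), "shares": {"CON": 40.0, "LAB": 30.0}},  # too early
]

avg = final_polls_average(polls, election_day=date(2019, 12, 12))
print(avg)
```

For the 1950s elections, the same sketch would be called with a window of about a month instead of a week.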
For National Vote Shares (= each party’s share of all votes cast in Great Britain) for each election, I use the House of Commons Research Library. As well as providing data at national level since 1918, the library also provides data on the number of votes cast for each party in each constituency since 1918. Note I use the figures for Great Britain rather than the United Kingdom since almost no polls survey Northern Ireland.
Here is a Microsoft Excel spreadsheet with all the data I’ve used in this article: GB General Election Data 1918-2019 – votes.
Why I thought polls were inaccurate
After the 2019 general election, I updated my article “UK General Elections – How Accurate are Voting Intention Polls?”. There I stated the key statistic for forecasting seats won was the Conservative lead over Labour using the national vote shares for Conservatives and Labour. I defined the Poll Error for each election as follows –
- VoteCLd = Actual CON lead over LAB = CON national vote share – LAB national vote share
- PollCLd = Average CON lead over LAB in polls = CON polls vote share – LAB polls vote share
- Poll Error = VoteCLd – PollCLd
The key chart from my updated article in 2019 was this one showing the Poll Errors for the 20 general elections between 1950 & 2019.
Over this time frame, the average polling error was +1.5% i.e. the actual Conservative lead over Labour was on average 1.5 percentage points larger than predicted by the polls. The standard deviation of the polling error was 4.2% and the autocorrelation was +0.13. The black line is a centred 5-election moving average of the poll error.
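For readers who want to reproduce these summary statistics from a series of poll errors, here is a minimal Python sketch. The error values are made up for illustration and are not the actual series. I have assumed a sample standard deviation and a standard lag-1 autocorrelation estimator, either of which may differ from what was used for the chart.

```python
from statistics import mean, stdev


def lag1_autocorrelation(xs):
    """Lag-1 autocorrelation: covariance of consecutive values over total variance."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


def centred_moving_average(xs, window=5):
    """Centred moving average; None where the window would run off either end."""
    half = window // 2
    out = []
    for i in range(len(xs)):
        if i - half < 0 or i + half >= len(xs):
            out.append(None)
        else:
            out.append(mean(xs[i - half : i + half + 1]))
    return out


# Hypothetical poll errors in percentage points, one per election.
errors = [1.0, -2.0, 3.0, 0.0, 4.0, -1.0, 2.0]

avg_error = mean(errors)
sd_error = stdev(errors)  # sample standard deviation (an assumption)
autocorr = lag1_autocorrelation(errors)
smoothed = centred_moving_average(errors, window=5)
print(avg_error, round(sd_error, 2), round(autocorr, 2), smoothed)
```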
I noted 5 out of the last 8 elections had a polling error of 4% or more, 4 favouring the Conservatives, 1 favouring Labour. In fact, it was almost 6 out of 8 since the polling error in 2005 was +3.9%. That is why I stated in my first look at the 2024 general election that there was a 65% chance of another significant polling error.
What is Average Vote Share per Seat?
In the 2019 general election, the national vote share in Great Britain was 44.7% for the Conservatives and 32.9% for Labour. The Conservative lead over Labour was +11.8% (= 44.7% – 32.9%) which was +2.0% (the poll error) higher than the +9.8% lead predicted by the polls.
Across the 632 seats in Great Britain, the Conservative vote share varied between 7.8% in Liverpool Riverside and 76.7% in Castle Point and the Labour vote share varied between 3.7% in East Fife and 84.7% in Liverpool Walton. On average, across the seats where each party stood, the Conservative vote share was 44.0% and the Labour vote share was 33.8%. These two figures are the Average Vote Share per Seat for the two parties.
Why does the average vote share per seat differ from the national vote share? The answer is turnout or, more specifically, the turnout differential between Labour and Conservative seats. In 2019, the average turnout in the 365 seats won by the Conservatives was 69.0% whereas in the 202 seats won by Labour it was only 63.9%, a difference of 5.1 percentage points. Labour seats are, by definition, the seats where Labour’s vote share is high, so lower turnout there means a given Labour vote share yields fewer votes at national level. Higher turnout in Conservative seats means the same vote share yields more votes, so the national vote share tilts towards the Conservatives relative to the average vote share per seat.
The table to the right demonstrates this using 2 seats A & B where the average vote share is the same for Conservative & Labour but the national vote share favours the Conservatives due to a favourable turnout differential.
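The same two-seat idea can be checked with a short Python sketch. The vote counts below are hypothetical figures I have chosen for illustration and are not necessarily those in the table.

```python
# Hypothetical votes cast in two seats (illustrative numbers only).
seats = {
    "A": {"CON": 36_000, "LAB": 24_000},  # Conservative seat, 60,000 votes cast
    "B": {"CON": 16_000, "LAB": 24_000},  # Labour seat, 40,000 votes cast
}


def national_share(seats, party):
    """Party's share (%) of all votes cast across every seat."""
    total_votes = sum(sum(votes.values()) for votes in seats.values())
    party_votes = sum(votes.get(party, 0) for votes in seats.values())
    return 100 * party_votes / total_votes


def average_share_per_seat(seats, party):
    """Simple mean of the party's share (%) in each seat where it stood."""
    shares = [
        100 * votes[party] / sum(votes.values())
        for votes in seats.values()
        if party in votes
    ]
    return sum(shares) / len(shares)


for party in ("CON", "LAB"):
    print(party, average_share_per_seat(seats, party), national_share(seats, party))
```

Both parties average 50% per seat (60% in one seat, 40% in the other), yet the national split is 52% to 48% in favour of the Conservatives, purely because more votes were cast in seat A.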
Polls predict Average Vote Share per Seat
If I calculate the Conservative lead over Labour using average vote share per seat, then in 2019, the Conservatives were +10.2% (= 44.0% – 33.8%) ahead of Labour. Notice how much closer this is to the lead of +9.8% as predicted by the polls, an error of only +0.4%. Does that mean if I compare the Conservative lead over Labour as predicted by polls with the lead as calculated from average vote share per seat, the polling errors will be smaller on average?
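As a quick check of the 2019 arithmetic, the two versions of the poll error can be computed side by side. The function and variable names are mine; the figures are the ones quoted above.

```python
def con_lead(con_share, lab_share):
    """Conservative lead over Labour in percentage points."""
    return con_share - lab_share


poll_lead_2019 = 9.8  # average CON lead over LAB in the final polls

# Poll error = actual lead minus polled lead, for each comparator.
error_vs_national = con_lead(44.7, 32.9) - poll_lead_2019
error_vs_per_seat = con_lead(44.0, 33.8) - poll_lead_2019

print(round(error_vs_national, 1))  # error against national vote share
print(round(error_vs_per_seat, 1))  # error against average vote share per seat
```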
The answer is a clear yes from the chart below.
On average, the polling error for the 20 elections from 1950 to 2019 is 0.0%, the standard deviation is 3.5% and the autocorrelation is -0.11. There are 11 errors favouring the Conservatives and 9 favouring Labour. It also looks like the distribution of the errors follows a normal distribution with 14 elections having errors in magnitude less than the standard deviation which is close to the 13.7 elections that would be expected under normality.
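The expected count under normality comes from the probability that a normal variable lies within one standard deviation of its mean, which is erf(1/√2) or about 68.3%. A short Python check:

```python
import math

n_elections = 20

# P(|Z| < 1) for a standard normal variable Z.
p_within_one_sd = math.erf(1 / math.sqrt(2))

expected_within_one_sd = n_elections * p_within_one_sd
print(round(p_within_one_sd, 4), round(expected_within_one_sd, 2))
```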
I should say the differences between the polling errors measured with national vote share and average vote share are not statistically significant (p-values >10%). This means we can’t completely discount the possibility that polls have been unlucky vs national vote share but personally, I’ve seen enough to conclude that UK voting intention polls are predictors of average vote share per seat, not national vote share as I previously thought.
Why do polls predict average vote share better?
I’m sure those who produce polls for a living may have more insight than me but my immediate answer is that polls actively try to screen out those who do not intend to vote. Polls then seek to weight respondents according to past voting behaviour and relevant demographics but one thing they don’t do is weight by the past turnout of the seat the respondent lives in. If they did weight by turnout, then I suspect they would be closer to national vote share.
Will this lead to more accurate forecasts?
By now, you might be saying “I agree polls predict average vote share better than national vote share but you still have to come up with some way to convert average vote share per seat into national vote shares so as to make forecasts of number of seats won.” My next election blog will show this is not the case: one can use an estimate of average vote share (as given by the polls) directly to estimate number of seats won. Historically, this has led to more accurate forecasts of number of seats won.
One way to realise why this is the case is to consider how First Past The Post works as an election system. The first point to note is that votes in one seat have zero effect on what happens in another seat. Given this, it does not matter whether turnout in seat A is twice that of seat B, all that matters is who comes first in each seat. That fundamental observation should lead you to realise average vote share per seat is the better metric to estimate.
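A small Python sketch makes the point concrete. The three seats and their vote counts are hypothetical. Doubling every party's votes in seat A (i.e. doubling its turnout while keeping vote shares fixed) changes the national vote totals but not the seat tally.

```python
def winner(votes):
    """Party with the most votes in a single seat (First Past The Post)."""
    return max(votes, key=votes.get)


def seat_tally(seats):
    """Number of seats won by each party."""
    tally = {}
    for votes in seats.values():
        w = winner(votes)
        tally[w] = tally.get(w, 0) + 1
    return tally


def national_votes(seats, party):
    """Total votes for a party across all seats."""
    return sum(votes[party] for votes in seats.values())


# Hypothetical seats for illustration only.
seats = {
    "A": {"CON": 30_000, "LAB": 20_000},
    "B": {"CON": 18_000, "LAB": 22_000},
    "C": {"CON": 15_000, "LAB": 25_000},
}

# Same vote shares in seat A, but twice the turnout.
doubled_a = {**seats, "A": {party: 2 * v for party, v in seats["A"].items()}}

print(seat_tally(seats), seat_tally(doubled_a))
print(national_votes(doubled_a, "CON"), national_votes(doubled_a, "LAB"))
```

In the doubled-turnout version the Conservatives have more votes in total (93,000 to 87,000) yet Labour still wins two of the three seats, because the share in each seat is unchanged.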
A great demonstration of this effect occurred in the 2005 general election. Labour won a clear majority with 355 seats out of 645 total. In the 529 English seats, Labour also had a clear majority winning 286 seats to the 194 won by the Conservatives. What is not well known is the Conservatives won more votes than Labour in England with a national vote share of 35.7% to Labour’s 35.5%. Whilst First Past the Post is not intended to deliver proportional results, this disparity between votes and seats is extreme.
The explanation is given by the average vote share per seat. Labour’s was 37.3% whilst the Conservatives’ was 34.2% i.e. on average in each seat, Labour had a higher vote share than the Conservatives, which is why they won more seats. The reason for this apparent contradiction was an extreme turnout differential between Labour and Conservative seats in England. In Labour seats, the average turnout was 57.4% but in Conservative (& Liberal Democrat) seats, average turnout was 65.3%, almost 8 points higher than in Labour seats.
Polls don’t predict sum of CON & LAB vote share?
This section was added on 24th June 2024.
The Conservative lead over Labour can also be described as the difference between CON and LAB vote shares (CON–LAB). My 2024 general election forecasting model uses the difference in CON and LAB vote shares but also uses the sum of CON and LAB vote shares. How accurate have polls been when predicting CON+LAB using average vote share per seat?
Over the entire time period, the answer is spot on since the average error is 0.1%. However, this disguises some significant shifts over time. From the 1960s (when Gallup finally had competitors) through to the 2000s, polls overestimated the sum CON+LAB by 1.2%. In the 2010s, polls underestimated the sum CON+LAB by 3.1%. The step change is statistically significant with a t-statistic of 4.8 and a p-value that rounds to 0%.
Why has this happened? The most likely explanation is the rise of web polling at the expense of phone polling. In the 2010 election, the number of pollsters jumped from 7 to 12 and remained at 11 or 12 for the next three elections. For the 2024 election, I am currently tracking 17 pollsters!
The ease of web polling explains why we have more polls but has that been at the expense of quality? For the Conservative lead over Labour, the answer appears to be no but for the sum of Conservative and Labour, the answer appears to be yes. Web polls appear to make it easier for people to say they will vote for a third party but when it comes to the actual vote, they don’t always stick to their intention. A possible reason for this might be that not all parties stand in all seats but I have not fully investigated that.
For my 2024 election forecast, I intend to use three scenarios for the polling error in the sum CON+LAB.
- An overestimate of 2% (as happened between 1964 & 2005)
- No error i.e. 0% (which is the case on average over all elections)
- An underestimate of 4% (as happened in the 2010s)
I find it hard to ignore the statistically significant step change in 2010, especially since we have even more pollsters in 2024, so I intend to use weights 1:1:3 respectively for these three scenarios.
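To see what the 1:1:3 weights imply, here is a minimal Python sketch that turns them into scenario probabilities. The sign convention (positive means polls overestimate CON+LAB) and the weighted-average summary are my assumptions for illustration; the forecast itself may treat the three scenarios separately rather than collapsing them into one number.

```python
# (scenario name, polling error in CON+LAB in percentage points, weight)
# Positive error = polls overestimate CON+LAB (assumed sign convention).
scenarios = [
    ("overestimate of 2%", +2.0, 1),
    ("no error", 0.0, 1),
    ("underestimate of 4%", -4.0, 3),
]

total_weight = sum(weight for _, _, weight in scenarios)

# Convert the 1:1:3 weights into probabilities that sum to 1.
probabilities = {name: weight / total_weight for name, _, weight in scenarios}

# Weighted-average error across the scenarios (a summary figure only).
weighted_error = sum(error * weight for _, error, weight in scenarios) / total_weight

print(probabilities)
print(weighted_error)
```

Under these assumptions the scenarios carry probabilities of 20%, 20% and 60%, and the weighted-average error is an underestimate of 2 points.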