UK General Election 2024 – My Forecasting Model

My UK General Election 2024 forecasting model will be a top-down model of the kind I last used in 2010. Top-down approaches first predict how many seats each party will win in total before seeking to identify which seats each party wins. This differs from the bottom-up approach I used in 2017 & 2019, where I forecast the outcome for each seat first and then aggregated the forecasts.

Here I explain how my 2024 forecast will be made, but I finish with a warning that I may have to dump my 2024 model in favour of the forecasting approach I used for the 2015 general election.

My Articles on the 2024 UK General Election

The most accurate forecaster of the 2019 general election – I was independently assessed as the most accurate forecaster beating even Sir John Curtice’s exit poll.
Keir Starmer’s Train to Downing Street – My assessment in 2021 of what Labour needed to do to win the next election.
My election forecasting Track Record 2010 to 2024 – A list of all election forecasts I have made for General, European and Local elections, how I made them, how they turned out and what lessons I learned.
How Accurate are Voting Intention Polls (Revised) – A recent article which explains why I now think the polls are accurate when before they would often underestimate the Conservative lead over Labour.
Going Beyond the Swing in 2024 – A preliminary look at the 2024 general election at the start of the year. I give probabilities for 10 specific outcomes.

Data used in this article

All electoral data I display and use in this article comes from the House of Commons Research Library. The PDF file I use most of the time is UK Election Trends 1918-2019.

All polling data prior to April 2024 comes from Mark Pack’s invaluable Pollbase. For data since April, I am using the BBC poll tracker. For a summary of what the polls are saying at the time of this article, see this X/Twitter thread.

I’ve created a spreadsheet GB General Election Data 1918-2019 – votes and seats which contains all the data used to build my models. The table below lists the key data I refer to in this article which are Seat Share, National Vote Share and Average Vote Share by party.

For clarity –

Seat Share is the percentage of seats in Great Britain won by each party.
National Vote Share is the percentage of all votes cast in Great Britain for each party.
Average Vote Share (per Seat) is the average of the vote share in each British constituency the party stands in.

I first explained the national and average vote share concepts in my article “How accurate are voting intention polls? – Revised“.
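The difference between national and average vote share is easiest to see with numbers. The sketch below uses a hypothetical two-seat example (the vote counts are invented purely for illustration and are not taken from my spreadsheet) where one seat has a much higher turnout than the other.

```python
# Hypothetical two-seat example; the vote counts are invented for illustration only.
seats = [
    {"CON": 30000, "LAB": 10000},  # high-turnout seat
    {"CON": 5000, "LAB": 15000},   # low-turnout seat
]


def national_vote_share(seats, party):
    """Percentage of all votes cast, across every seat, won by the party."""
    party_votes = sum(seat.get(party, 0) for seat in seats)
    total_votes = sum(sum(seat.values()) for seat in seats)
    return 100 * party_votes / total_votes


def average_vote_share(seats, party):
    """Mean of the party's vote share in each seat where it stands."""
    shares = [
        100 * seat[party] / sum(seat.values())
        for seat in seats
        if party in seat
    ]
    return sum(shares) / len(shares)


print(round(national_vote_share(seats, "CON"), 1))  # 58.3
print(round(average_vote_share(seats, "CON"), 1))   # 50.0
```

In this toy example the Conservatives pile up votes in the high-turnout seat, so their national vote share is higher than their average vote share per seat. Turnout differences between seats are what separate the two measures.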

Why I did not use data before 1955

I spent a long time trying to decide if I could use data for all elections since 1918 in my model. In the end, I concluded my model should be built using data for elections between 1955 & 2019, a total of 18 elections. My reasons for excluding the earlier 10 elections were –

Uncontested seats – these were common in the interwar years and did not die out until 1950. Uncontested seats distort the national vote share.
National Coalition parties – The interwar years were marked by frequent splits in the main parties. As a result, many MPs stood and were elected on labels such as National Liberal/Conservative/Labour or Independent Liberal/Conservative/Labour. The historical records on who took what whip are not reliable which can affect the seat share estimates. By 1955 the only remaining such party was the National Liberals who took the Conservative whip until 1970 when they merged with the Conservatives.
Multi-member seats – these are still common in local elections today where one has more than one vote and each ward elects two or more councillors. Multi-member seats in Parliament were abolished in 1950 but they existed before then. This can distort both seat and vote shares.

As will be seen later, most of the excluded 9 elections between 1922 (first election without Irish seats) and 1951 still fit my 1955-2019 model quite well. For the sake of greater certainty over the historical data, I am happy for my model to be confined to elections since 1955.

FPTP Outcomes = f(Sum & Difference of CON & LAB votes)

The UK uses First Past The Post (FPTP) as its general election system. To me, this name is incomplete since it doesn’t tell you what the Post is. The full name should be First Past The Post Set By Party Coming Second (FPTPSBPCS) since the party coming second sets the threshold for winning a seat. Therefore it should be no surprise that to forecast FPTP outcomes, we need to know which two parties will come first and second.

Since 1922, the top two parties across Britain have always been the Conservative (CON) and Labour (LAB) parties. With the exception of 1923, 88% or more of all seats contested in every election over the last 100 years have been won by a Labour or Conservative candidate.

By following this logic, it turns out we can forecast the number of CON & LAB seats using the Sum and Difference in CON & LAB vote shares. I will use the following shorthand notation at various times in this article –

vCON = CON Vote Share, percentage of those voting who voted Conservative
vLAB = LAB Vote Share, percentage of those voting who voted Labour
sCON = CON Seat Share, percentage of seats won by the Conservatives
sLAB = LAB Seat Share, percentage of seats won by Labour

Rather than using vCON to predict sCON and vLAB to predict sLAB, which is how I made my forecast for the 2010 general election, for 2024 I will use the sum and difference of these four variables –

vCLs = vCON + vLAB = sum of CON vote share and LAB vote share
vCLd = vCON – vLAB = difference between CON vote share & LAB vote share, also known as the Conservative lead over Labour.
sCLs = sCON + sLAB = sum of CON seat share and LAB seat share
sCLd = sCON – sLAB = difference between CON seat share & LAB seat share

My 2024 model uses vCLs to predict sCLs and vCLd to predict sCLd. Once I have my estimates of the sum and difference of the CON & LAB seat share, I can solve the simultaneous equations as follows to estimate the seat share for CON and LAB separately –

sCON = ( sCLs + sCLd )/2 = sum plus difference divided by two
sLAB = ( sCLs – sCLd )/2 = sum minus difference divided by two
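The sum and difference transformation and its inverse can be sketched in a few lines of Python. The function names are illustrative only, and the 40% and 30% inputs are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def to_sum_diff(con, lab):
    """Turn CON and LAB shares into their sum (CLs) and difference (CLd)."""
    return con + lab, con - lab


def from_sum_diff(cl_sum, cl_diff):
    """Solve the simultaneous equations to recover the CON and LAB shares."""
    con = (cl_sum + cl_diff) / 2
    lab = (cl_sum - cl_diff) / 2
    return con, lab


# Hypothetical shares of 40% CON and 30% LAB
cl_sum, cl_diff = to_sum_diff(40.0, 30.0)
print(cl_sum, cl_diff)                  # 70.0 10.0
print(from_sum_diff(cl_sum, cl_diff))   # (40.0, 30.0)
```

The same two functions apply to vote shares and seat shares alike, since the transformation is just a change of variables.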

The Relationship between Vote & Seat Share under FPTP

The four scatter plots here show the vote & seat share relationship since 1955. The plots on the left are for the sum of seat share vs sum of vote share, the plots on the right are for the difference in seat share vs difference in vote share. The plots at the top use national vote share, the plots at the bottom use average vote share.

My observations are –

1. For sum seat share, there was a step change in 1997 which splits the data into two eras.
2. Up to 1992 (the black era), 3rd parties only won between 1% and 4% of seats but from 1997 (the green era), they won 8% to 12% of seats.
3. The fit between sum seat share and sum vote share appears to be linear in both eras.
4. The fit between sum seat share and sum vote share appears to be almost identical whether national vote share or average vote share is used.
5. The lowest ever sum vote share was in 2010 at 66.6%.
6. For the difference in seat share, history shows two distinct lines and three eras.
7. From 1955 to 1979, we are in the black era, from 1983 to 2010, we are in the green era and from 2015 to 2019, we returned to the black era.
8. The fit between seat share difference and vote share difference appears to be linear in all eras.
9. The fit appears to be better if average vote share is used.
10. Vote share difference has varied between +14.4% in 1983 and -15.6% in 1997 using average vote share.

It turned out observation 9 was correct: the fit with average vote share (adjusted r^2 of 0.983) is better than with national vote share (adjusted r^2 of 0.963). This is very good news given I recently realised voting intention polls are better estimators of average vote share than national vote share. It means I can plug poll data directly into my model to estimate the number of seats won without having to worry about turnout differentials between Conservative and Labour seats.

My GE2024 Model for Great Britain

I built a linear model with an additional constant to account for the step from the black to green eras as shown here –

The coefficient of 0.164 for sum vote share implies the sum of CON & LAB vote shares has to fall by about 6 points for 3rd parties to gain an extra 1 point of seat share. This is the incumbency advantage of being one of the top two parties under FPTP.
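The 6 point figure follows directly from the slope. A minimal sketch, using only the 0.164 coefficient quoted above (the function names are illustrative, not part of the model itself):

```python
SUM_SLOPE = 0.164  # coefficient on sum vote share quoted in the text


def sum_seat_share_change(sum_vote_share_change, slope=SUM_SLOPE):
    """Change in CON + LAB seat share (points) for a given change in vote share."""
    return slope * sum_vote_share_change


def sum_vote_share_change_needed(sum_seat_share_change_pts, slope=SUM_SLOPE):
    """Change in CON + LAB vote share (points) needed to move seat share."""
    return sum_seat_share_change_pts / slope


# A 1 point loss of CON + LAB seat share is a 1 point gain for 3rd parties.
print(round(sum_vote_share_change_needed(-1.0), 1))  # -6.1
print(round(sum_seat_share_change(-6.0), 2))         # -0.98
```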

I used R to build my model and the model summaries are below.

The standard errors shown are for the sum and difference of CON & LAB seat shares. In the end, what people want to know is how many seats the Conservatives and Labour (and others) are going to win. Using the simultaneous equation approach the standard error for estimates of CON seat share is 1.3% and for LAB seat share is 1.2%. With 632 seats in Great Britain, this works out as +/-8 seats which is good enough for my purposes.

As would be expected from statistical theory given my forecasting model, the errors for the party seat shares are highly negatively correlated at -0.91 i.e. if one party is overestimated, the other is very likely to be underestimated. This is shown by the largest historical errors which occurred in 2005 when the CON seat share was overestimated by 3.1% and the LAB seat share was underestimated by 2%.
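The statistical theory here is ordinary error propagation: because sCON = (sum + diff)/2 and sLAB = (sum – diff)/2, the two party errors share the same ingredients with opposite signs on the difference term. The sketch below shows the mechanics. The standard errors of 1.0 and 3.0 fed in are hypothetical round numbers chosen for illustration, not the values from the model summaries, and the correlation between the sum and difference errors is an assumed input that defaults to zero.

```python
import math


def party_seat_share_errors(se_sum, se_diff, corr_sum_diff=0.0):
    """Propagate errors in the sum and difference to the CON and LAB seat shares.

    Returns (se_con, se_lab, correlation between the CON and LAB errors).
    """
    cov_sum_diff = corr_sum_diff * se_sum * se_diff
    var_con = (se_sum**2 + se_diff**2 + 2 * cov_sum_diff) / 4
    var_lab = (se_sum**2 + se_diff**2 - 2 * cov_sum_diff) / 4
    cov_con_lab = (se_sum**2 - se_diff**2) / 4
    se_con, se_lab = math.sqrt(var_con), math.sqrt(var_lab)
    return se_con, se_lab, cov_con_lab / (se_con * se_lab)


def se_in_seats(se_pct, total_seats=632):
    """Convert a seat share standard error (in points) into seats."""
    return se_pct * total_seats / 100


# Hypothetical inputs: the difference is three times noisier than the sum.
se_con, se_lab, corr = party_seat_share_errors(1.0, 3.0)
print(round(corr, 2))              # -0.8
print(round(se_in_seats(1.3), 1))  # 8.2
```

Whenever the difference is estimated less precisely than the sum, the correlation between the party errors is negative, and it approaches -1 as the error in the sum shrinks towards zero.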

Critiquing my GE2024 Model

An immediate issue with my model is whilst it can forecast how many seats the Conservatives and Labour will win, it doesn’t forecast how many seats smaller parties such as the Liberal Democrats, SNP, Plaid Cymru, Greens, Reform, etc will win. To address this, I’ve built an England only model based on the same approach as described above. The difference between the number of seats other parties are expected to win in Great Britain and in England will provide a good starting point for the number of seats to be won by the SNP and Plaid Cymru. Within England, the Liberal Democrats can be expected to win the bulk of the other seats though I will need to think of a way to account for Reform and the Greens.

That leaves two main points to address here which are connected. They are –

Is the assumption of linear model fit appropriate?
Is the model fit for purpose given the polls so far for the 2024 general election?

It should be obvious that the fit cannot be linear for the sum seat share model. What if the sum vote share was 100% i.e. everyone votes CON or LAB? Then the sum seat share must be 100% (in the absence of uncontested seats) but at the moment, the green era sum seat share model would predict 94% and the black era model would predict 99.9%. Likewise, the linear fit for the seat share difference does not in theory prevent a forecast greater than 100% or less than -100%. This would only happen if the vote share difference exceeded +39% (black era) or was less than -39% (green era).
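These extremes can be checked with the green era equations. The sketch below uses the green era coefficients that appear in the worked forecast at the end of this article (0.1640 and 77.67 for the sum, 2.4411 and -4.70 for the difference). The function names are illustrative only.

```python
# Green era coefficients as quoted in the worked forecast in this article
SUM_SLOPE, SUM_CONST = 0.1640, 77.67
DIFF_SLOPE, DIFF_CONST = 2.4411, -4.70


def sum_seat_share(sum_vote_share):
    """Linear green era fit for CON + LAB seat share (all values in points)."""
    return SUM_SLOPE * sum_vote_share + SUM_CONST


def vote_diff_for_seat_diff(seat_share_diff):
    """Invert the linear difference fit: vote share lead giving a seat share lead."""
    return (seat_share_diff - DIFF_CONST) / DIFF_SLOPE


# Everyone votes CON or LAB, yet the linear fit stops short of 100% of seats.
print(round(sum_seat_share(100.0), 1))             # 94.1
# CON - LAB vote share at which the fit predicts a seat share difference of -100%.
print(round(vote_diff_for_seat_diff(-100.0), 1))   # -39.0
```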

These extremes have not happened since 1955 but did they happen in the 9 elections between 1922 & 1951? The charts below show my black and green era linear fits based on 1955-2019 for both sum seat share and seat share difference, with the earlier elections added as brown markers. To the sum seat share chart, I have added two extra lines whose purpose I explain later –

A dashed brown line which is parallel to the green and black lines but differs from the green era fit by the same margin as the green fit differs from the black fit.
A dashed green curved line which is almost identical to the green era fit for the most part but curves away at the ends.

There is no question the 9 earlier elections fit the seat share difference chart on the right very well. Importantly this includes the 1931 general election which was a disaster for the Labour party where they lost 235 seats to end up on 52 to the Conservatives’ 470. In my 1955 onwards model, the maximum vote share difference was +14.4% in 1983 whereas 1931 was +30.3%. This will be a vital observation for later on when I discuss if my model is fit for purpose for 2024.

The other observation I make on the seat share difference chart is that 1929 & 1945 are the only elections on the green era fit whereas the other 7 sit on the black era fit. I think the reason is that both elections saw big swings from the Conservatives to Labour (6.1% swing in 1929, 12.1% in 1945). The same thing happened in 1997 when the fit shifted from the black era to the green era with a 10.2% swing. Note all swings here are national vote share swings. In addition –

1929 saw a major expansion of the franchise due to the age threshold being equalised at 21 for both men and women. Previously, women had to be over 30 to vote.
1945 was the first election after World War 2 with all the ramifications that had for the electorate.
1997 saw a doubling of votes for Others (excluding CON, LAB & LD) from 3.7% to 7%, mostly led by the Referendum party.

For the sum seat share chart, 7 of the elections fit well with the green and black era fits but the first two elections of 1922 and 1923 clearly do not. These two elections mark the high point for the Liberal party in terms of having a significant number of seats so why don’t they fit well with the other elections?

The answer I think is because there were a significant number of uncontested seats. I estimate 118 uncontested in 1922 and 79 uncontested in 1923 out of a total of 603 seats for Great Britain at that time. In 1923, the Conservatives only put up 524 candidates, Labour 427 and the Liberals 457. Combined with the fact these were genuinely 3 party elections, I feel safe in treating these as genuine exceptions.

Of the 7 subsequent elections which do fit well with my sum seat share model, you may notice that 1931, 1935 & 1945 all sit above the straight green line and fit better with the curved dashed green line. As I noted earlier, I would expect this to happen as the sum vote share approaches 100%. So what is this curved green line and why am I not using it for my model?

The dashed curved green line is a Logit curve fit. The logit of sum seat share and sum vote share is calculated as follows –

Logit sum seat share = LsCLs = Log( sCLs ) – Log( 1 – sCLs )
Logit sum vote share = LvCLs = Log( vCLs ) – Log( 1 – vCLs )

I then built a linear model of LsCLs as a function of LvCLs and the resulting transformation back to the normal scale results in the curved green line. The logit transformation is a commonly used transformation when dealing with variables which can only vary between 0% and 100% but cannot be exactly 0% or 100%. My 2010 election model I referred to earlier was actually separate logit models for each party based on the 2005 results e.g. logit( sLAB ) = a * logit( vLAB ) + constant.
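For readers unfamiliar with the logit, here is a minimal sketch of the transformation and its inverse, assuming natural logs and shares expressed as proportions between 0 and 1. The slope and constant passed to the fit function are hypothetical placeholders, not my fitted coefficients.

```python
import math


def logit(p):
    """Log-odds of a proportion p, where 0 < p < 1."""
    return math.log(p) - math.log(1 - p)


def inv_logit(x):
    """Map a log-odds value back to a proportion between 0 and 1."""
    return 1 / (1 + math.exp(-x))


def logit_fit_sum_seat_share(sum_vote_share, slope, constant):
    """Linear model on the logit scale, transformed back to a proportion."""
    return inv_logit(slope * logit(sum_vote_share) + constant)


# Round trip: the inverse undoes the transformation.
print(round(inv_logit(logit(0.878)), 3))  # 0.878

# Hypothetical coefficients, only to show the curve stays below 100%
# however close the sum vote share gets to 100%.
print(logit_fit_sum_seat_share(0.999, slope=1.0, constant=1.0) < 1.0)  # True
```

Unlike a straight line, a fit made on the logit scale can never predict a share below 0% or above 100%, which is why the curve bends at the ends.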

So why am I not using a logit model for sum seat share in 2024? The answer is that the fit looks like this when I extend the horizontal scale out to 50% for sum vote share. As can be seen, there is virtually no difference from the linear fit, which is why I have stuck with a linear fit.

I’ve spent time explaining why I have decided a linear fit is acceptable for sum seat share and seat share difference. Along the way I demonstrated the seat share difference model works well for 1931 which was an Out Of Sample data point. By that I mean, the 1955-2019 model was built on a range for vote share difference from -15% to +15%. This usually means the model cannot be relied upon for when the vote share difference is outside that range or “out of sample” as it is usually referred to. At +30%, 1931 was most definitely out of sample and yet the seat share difference model held up.

That observation is important because current opinion polls show the vote share difference is -20% i.e. Labour lead the Conservatives by 20 points. This is again out of sample as can be seen in the charts below.

Is 2024 Out of Sample?

When it comes to my seat share difference model, I do not regard 2024 as out of sample. That is because the effect of FPTP has to be symmetric with vote share difference and if my model copes with 1931, it should cope with 2024.

When it comes to sum seat share, I am unsure. We are already 5 points below the previous minimum for sum vote share (66.6% in 2010) so we are in uncharted territory. We are not in uncharted territory for the LAB vote share (40%) but we are for the CON vote share (20%). Indeed there is the possibility that Reform will overtake the Conservatives to become the second largest party in vote share. At that point we will definitely be out of sample.

Such a scenario has already happened in Scotland on a few occasions –

From 1955 to 1974 Feb and 1979 to 1992, the top 2 parties were CON & LAB
For 1974 Oct and 1997 to 2015, the top 2 parties were LAB & SNP
Since 2017, the top 2 parties have been SNP & CON
In 8 of the 18 elections, the sum vote share for the top 2 parties has been less than 66.6%.

Here is the equivalent sum seat share vs sum vote share chart for Scotland only. The colour of each label denotes the top two parties based on vote share in that election e.g. SNP & CON in 2019, SNP & LAB in 2015. The 3 lines shown are the exact same lines shown earlier for Great Britain. The point is to see if Scotland follows a similar relationship to Great Britain.

One point to bear in mind is that Scotland has only ever had between 59 & 72 seats. Therefore the sum seat share can change by roughly 1.4% to 1.7% if just one seat is gained or lost by a 3rd party. This means I would not expect as good a fit for Scotland as for Britain as a whole.

My main observation is that when the top 2 parties combine for 70% or more of the vote, Scotland effectively follows an average of the black & green era fits for Britain. When the sum share is less than 70%, this does not work and on average, the sum seat share averages around the dashed brown line. In fact, the relationship between sum vote share and sum seat share in Scotland is best represented with a logit fit though 2015 and 1974 October elections are outliers to some extent in this.

So what does this tell me about my 2024 model if the sum vote share for Great Britain ends up out of sample, especially below 60%? It’s difficult to be certain but for now, I am likely to give more weight to the dashed brown line fit. The alternative is to build a completely new model in which case, I would build upon the approach I took in 2015. Let’s watch this space!

What is my forecast for the 2024 UK general election?

My official forecast can be found in my next article. This describes how I arrive at my forecast using a number of scenarios based on the latest polls.

For now, I will demonstrate a forecast using the green era fit for both the sum seat share and seat share difference models. I use the latest polls as of 19th June 2024 which show the CON + LAB vote share is 61.8% and the CON – LAB vote share is -20.4%. Using the green era equations from above we get –

CON + LAB seat share = 87.8% = 0.1640 * 61.8% + 77.67%
CON – LAB seat share = -54.5% = 2.4411 * -20.4% – 4.70%

Solving the simultaneous equations we get –

CON seat share = 16.7% = ( sum + diff )/2 = ( 87.8% + -54.5% )/2
LAB seat share = 71.2% = ( sum – diff )/2 = ( 87.8% – -54.5% )/2

With 632 seats up for grabs in Britain in 2024, this equates to a forecast of 106 CON, 450 LAB and 76 OTH seats.
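The whole calculation can be reproduced in a short Python sketch using the green era coefficients and poll figures quoted above. One assumption to flag: the seat shares are rounded to one decimal place before being converted to seats, which reproduces the figures in this article. Working from unrounded shares would instead give 105 CON and 77 OTH seats, so treat the last seat or two as rounding noise.

```python
GB_SEATS = 632
SUM_SLOPE, SUM_CONST = 0.1640, 77.67     # green era fit for sum seat share
DIFF_SLOPE, DIFF_CONST = 2.4411, -4.70   # green era fit for seat share difference


def green_era_forecast(sum_vote_share, diff_vote_share, total_seats=GB_SEATS):
    """Forecast CON, LAB and OTH seats from CON + LAB and CON - LAB vote shares."""
    sum_seat_share = SUM_SLOPE * sum_vote_share + SUM_CONST
    diff_seat_share = DIFF_SLOPE * diff_vote_share + DIFF_CONST
    # Assumption: round shares to 1 decimal place before converting to seats.
    con_share = round((sum_seat_share + diff_seat_share) / 2, 1)
    lab_share = round((sum_seat_share - diff_seat_share) / 2, 1)
    con_seats = round(con_share * total_seats / 100)
    lab_seats = round(lab_share * total_seats / 100)
    return {
        "sum_seat_share": round(sum_seat_share, 1),
        "diff_seat_share": round(diff_seat_share, 1),
        "con_share": con_share,
        "lab_share": lab_share,
        "con_seats": con_seats,
        "lab_seats": lab_seats,
        "oth_seats": total_seats - con_seats - lab_seats,
    }


# Polls as of 19th June 2024: CON + LAB = 61.8%, CON - LAB = -20.4%
forecast = green_era_forecast(61.8, -20.4)
print(forecast["con_seats"], forecast["lab_seats"], forecast["oth_seats"])  # 106 450 76
```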
