In many countries across the world, the total effect of the Coronavirus pandemic is now being measured using the concept of excess deaths. However, the Office for National Statistics publishes such data for England up to 2 weeks later than Public Health England publishes its daily deaths. In this post, I explore how the PHE series can be used to estimate what the ONS will publish for excess deaths in England every Tuesday.
I intend to update this post every week, and you can follow me on Twitter to be notified when I have made updates.
The 5 time series for COVID19 related deaths in England
Each time series is denoted by a 4-letter code, which I will use throughout. Clicking on the 4-letter code will take you to the source data. I have only extracted data for England from these sources, but some also cover Scotland, Wales & Northern Ireland.
- PHEr – Public Health England COVID19 Registrations – Daily number of deaths by date of registration with COVID19 on the death certificate and confirmed with a positive test in an NHS/PHE laboratory. Published every day, this is the most common headline figure.
- NHSo – NHS England COVID19 Occurrences – Daily number of deaths in hospitals by date of occurrence with COVID19 on the death certificate. This is also published daily.
- ONSr – ONS COVID19 Registrations – Daily number of deaths by date of registration with COVID19 on the death certificate from all locations. This is published weekly on a Tuesday, but the daily data can be found on the COVID19-ENGLAND tab of the downloaded spreadsheet.
- ONSo – ONS COVID19 Occurrences – Daily number of deaths by date of occurrence with COVID19 on the death certificate from all locations. This is published weekly on a Tuesday, but the daily data can be found on the COVID19-ENGLAND tab of the downloaded spreadsheet. Note that two columns are shown with different cutoff dates; I take the data from the column with the latest cutoff date.
- ONSx – ONS Excess Death Registrations – Weekly number of excess deaths (all registered deaths over and above a baseline) by date of registration from all locations, whether or not COVID19 is on the death certificate. This is published weekly on a Tuesday and can be extracted from the WEEKLY DATA tab of the downloaded spreadsheet. I use the day of week pattern of the ONSr series to convert the ONSx weekly data into ONSx daily data.
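The weekly-to-daily conversion for ONSx can be sketched as a proportional split. This is a minimal sketch, assuming each day receives the same share of the weekly ONSx total as it has of the ONSr registrations in the same ONS week; the function name `disaggregate_week` is hypothetical and the ONSr counts below are placeholders, not real data.

```python
def disaggregate_week(weekly_total, onsr_daily):
    """Split a weekly total into 7 daily values using ONSr day-of-week shares.

    weekly_total: the ONSx excess deaths for one ONS week (Saturday to Friday).
    onsr_daily:   the 7 daily ONSr registrations for the same week, in order.
    Returns 7 daily estimates that sum to weekly_total.
    """
    if len(onsr_daily) != 7:
        raise ValueError("expected 7 daily ONSr values (Saturday to Friday)")
    week_sum = sum(onsr_daily)
    if week_sum == 0:
        raise ValueError("ONSr week total is zero, so there is no day-of-week pattern")
    # Each day gets the same fraction of the weekly total as it has of ONSr.
    return [weekly_total * count / week_sum for count in onsr_daily]


# Placeholder ONSr counts, for illustration only (not real data).
example_onsr = [60, 40, 180, 200, 190, 185, 175]
daily_onsx = disaggregate_week(11395, example_onsr)
print([round(v) for v in daily_onsx])
```

Whatever ONSr counts are supplied, the daily values add back up to the published weekly total (11,395 for the week ending 17th April), so the daily series stays consistent with the weekly ONS figure.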
I am assuming that the reader understands the terminology used, but for an in-depth explanation please read the first half of this post. The reader should note that since that post was published, PHE has revised its time series and now includes all death registrations with COVID19 on the death certificate where the infection has been confirmed by a test in a PHE or NHS laboratory. Previously, PHE/DHSC data counted only deaths in hospitals, like the NHSo series.
To summarise, the first 4 time series are all for deaths where COVID19 is stated somewhere on the death certificate, with PHEr additionally requiring a positive COVID19 test. The fifth time series is an estimate of all direct and indirect deaths due to COVID19, and COVID19 does not need to be mentioned on the death certificate.
In this post, I will be ignoring the NHSo & ONSo series and concentrating on the relationship between ONSx and PHEr along with a passing reference to ONSr. If you want to read my comments on all 5 time series, please click here.
My Weekly Estimates & Extrapolations for Excess Deaths in England
The table shown here gives my estimates of excess deaths for the weeks ending 24th April and 1st May along with extrapolations (not estimates) for the 4 weeks after that which I explain in a separate post (see sections 1, 3 & 5).
There were 11,395 excess deaths in England in the week ending 17th April, 2,500 more than I had estimated last week. Interestingly, my alternative simple extrapolation model for ONSx would have been more or less spot on for that week. So this week I will compare my PHEr-based estimates of 8,618 for the week ending 24th April and 6,978 for the week ending 1st May with my extrapolated ONSx estimates (not shown in the table) of 8,968 and 6,404 excess deaths for the same weeks. This makes for an interesting case study in the difference between Technical & Fundamental forecasting, a concept that I discuss in more depth in my 1-day training course “Identify trends & make forecasts”.
To Model or Extrapolate, that is the question.
When trying to forecast a quantity Q over a timeline T, there are two broad avenues open to the forecaster.
- Predict Q(t+i) using the history of Q up to time period t only. This involves identifying the underlying pattern of Q over time and then extrapolating that pattern into the future. This is sometimes known as Technical forecasting in financial markets.
- Predict Q(t+i) based on its relationship with an input variable X(t+j) (where i is not necessarily equal to j). This requires statistical modelling to quantify the relationship between Q & X. X can then be used to predict Q in the future. This is sometimes known as Fundamental forecasting in financial markets.
There is never a right or wrong answer to this question. The advantage of extrapolation is that it requires only the history of Q itself and no other information. The disadvantage is that it gives no insight into why Q is changing, and you have to assume that the historical pattern will repeat itself in the future. Modelling, on the other hand, gives you insight and can detect when the pattern of Q is going to change. The difficulty is that you may need to forecast X before you can use it to predict Q, which shifts the uncertainty in Q onto uncertainty in X rather than giving you greater accuracy.
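To make the distinction concrete, here is a toy sketch of both avenues with deliberately simple, assumed functional forms: the technical version repeats the most recent growth factor of Q, and the fundamental version multiplies a known input X by an assumed fixed ratio. Neither is the model used in this post, and the function names are hypothetical.

```python
def technical_forecast(q_history, steps):
    """Extrapolate Q from its own history only, by repeating the
    most recent day-on-day growth factor Q(t) / Q(t-1)."""
    growth = q_history[-1] / q_history[-2]
    return [q_history[-1] * growth ** k for k in range(1, steps + 1)]


def fundamental_forecast(x_values, ratio):
    """Predict Q from an input series X via an assumed fixed ratio Q / X.
    If X itself has to be forecast, its errors carry straight into Q."""
    return [ratio * x for x in x_values]


print(technical_forecast([100, 110], steps=3))
print(fundamental_forecast([50, 60, 70], ratio=2.0))
```

The technical version needs nothing but Q; the fundamental version is only as good as the X values fed into it.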
Modelling ONSx as a function of PHEr
In the case of excess deaths, our output time series Q(t) is ONSx(t) and our input time series is PHEr(t). Because PHEr is published at least two weeks in advance of ONSx, we do not need to forecast PHEr for the weeks we want to estimate: we already have the data, as shown in the table above. Modelling would therefore appear to be the better option, but how good is it?
Since both ONSx and PHEr are based on death registrations, one would expect some relationship in their timing. The big difference between the two time series is that PHEr counts only deaths with a positive COVID19 test from a PHE/NHS laboratory, whereas ONSx counts all deaths over and above a baseline. If the additional deaths captured by ONSx follow a different timeline, PHEr will be a poorer predictor of ONSx.
The chart shown here is the ratio of ONSx to PHEr for each day. For example, on 17th April 2020 there were 2,263 excess deaths recorded by the ONS and 1,011 COVID19 deaths recorded by PHE, a ratio of 2.24. I have plotted this ratio by ONS-defined week (Saturday to Friday), with a separate line for each of weeks 13 to 16. There is a clear pattern, with a low ratio at the weekend (and on bank holidays such as Good Friday & Easter Monday) and a much higher ratio during the week.
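The daily ratio and its position in the ONS week (Saturday to Friday) can be computed as below. This is a minimal sketch, assuming the inputs are (date, ONSx, PHEr) tuples; the function names are hypothetical. The worked check uses the 17th April figures quoted above.

```python
from collections import defaultdict
from datetime import date


def ons_day_index(d):
    """Position of a date within the ONS week: Saturday=0, ..., Friday=6."""
    # date.weekday() returns Monday=0 ... Sunday=6, so shift by Saturday (5).
    return (d.weekday() - 5) % 7


def ratios_by_ons_day(rows):
    """Group daily ONSx/PHEr ratios by day of the ONS week.

    rows: iterable of (date, onsx, pher) tuples.
    Returns {day_index: [ratio, ...]} with one ratio per week observed.
    """
    grouped = defaultdict(list)
    for d, onsx, pher in rows:
        if pher > 0:  # skip days with no PHEr registrations (ratio undefined)
            grouped[ons_day_index(d)].append(onsx / pher)
    return dict(grouped)


ratios = ratios_by_ons_day([(date(2020, 4, 17), 2263, 1011)])
print(ratios)  # Friday 17th April is index 6; 2263 / 1011 is about 2.24
```

Feeding in all the days from weeks 13 to 16 gives one list of four ratios per weekday, which is what the separate lines in the chart represent.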
Last week, I noted that the weeks ending 3rd & 10th April were very similar, and I created a model fit (solid black line) based on these 2 weeks. As is apparent, that pattern did not hold: the week ending 17th April saw higher than expected ratios, especially from Wednesday to Friday. Note that the very high ratio for Tuesday is likely a reaction to a much lower ratio for the Monday, which happened to be Easter Monday, a public holiday. By Friday, though, the ratio had returned to the level seen in earlier weeks.
I have chosen to take, for each day of the week, a weighted average of the observed ratios over the 4 weeks to produce my fitted ratios as shown. The weights are the total excess deaths in each week, which are listed in the table at the beginning and repeated below. With this model fit, I can multiply the known PHEr figure for each day by the fitted ratio to estimate excess deaths for that day. For example, PHEr counted 603 death registrations on Wednesday 22nd April, so multiplying this by my fitted ratio of 2.34 gives estimated excess deaths for 22nd April of about 1,410.
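The fitting and estimation steps can be sketched as follows, assuming one list of ratios per weekday (one entry per week) and weekly excess-death totals as the weights. The function names are hypothetical and the ratios and weights in the usage lines are placeholders; only the 603 and 2.34 come from the example above.

```python
def weighted_ratio(ratios, weights):
    """Weighted mean of one weekday's observed ratios across several weeks.

    ratios:  the ONSx/PHEr ratio for this weekday in each week.
    weights: the total excess deaths in each of those weeks.
    """
    if len(ratios) != len(weights):
        raise ValueError("need one weight per week")
    total = sum(weights)
    return sum(r * w for r, w in zip(ratios, weights)) / total


def estimate_excess(pher_count, fitted_ratio):
    """Estimated excess deaths for one day = known PHEr count x fitted ratio."""
    return pher_count * fitted_ratio


# Placeholder ratios and weights, for illustration only.
fitted = weighted_ratio([2.0, 2.2, 2.5], [3000, 6000, 9000])
print(round(fitted, 3))
print(round(estimate_excess(603, 2.34)))  # about 1,411 with the ratio rounded to 2.34
```

Weighting by weekly excess deaths lets the weeks with the most excess deaths dominate the fit. With the ratio rounded to 2.34, 603 × 2.34 ≈ 1,411, in line with the figure of about 1,410 quoted above.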
It is clear from the chart that there is a considerable margin of error in the fitted ratio. So far, the fitted ratio has tended to be either too low or too high for the whole week, so the daily errors do not cancel out as they would if they were independent. For this reason, the margin of error on the weekly estimates given above is easily +/- 30% or more.
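The effect of whole-week errors can be illustrated with a short variance calculation. This is a sketch under simplifying assumptions: every day has the same relative error, and every pair of days shares a single correlation coefficient. The function name is hypothetical and the 30% daily error used below is illustrative, not a figure estimated from the data.

```python
import math


def weekly_relative_sd(daily_values, daily_rel_sd, correlation):
    """Relative standard deviation of a weekly sum of daily estimates.

    Assumes each day's estimate has the same relative SD and every pair
    of different days has the same error correlation.
    """
    sds = [v * daily_rel_sd for v in daily_values]
    variance = 0.0
    for i, sd_i in enumerate(sds):
        for j, sd_j in enumerate(sds):
            rho = 1.0 if i == j else correlation
            variance += rho * sd_i * sd_j
    return math.sqrt(variance) / sum(daily_values)


seven_days = [1000] * 7
print(round(weekly_relative_sd(seven_days, 0.30, correlation=0.0), 3))  # independent days
print(round(weekly_relative_sd(seven_days, 0.30, correlation=1.0), 3))  # whole-week errors
```

With independent daily errors the weekly error shrinks to about 11%, but with perfectly correlated errors it stays at the full 30%, which is why a ratio that is off for the whole week leaves the weekly estimate so uncertain.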
Going forward, I intend to see whether I can improve this model using additional data sets. I am aware that data on deaths in care homes is being published separately, but I have not looked into that yet.
A final point. In my table of estimates, I have also estimated excess deaths for May. These estimates use the extrapolated PHEr figures shown, which are then multiplied by the fitted ratio as explained here. The margin of error will therefore be greater for these weeks, since errors in the PHEr extrapolations add to the errors in the fitted ratio. For now, take these projections (rather than estimates!) with a very large pinch of salt.
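Why the May projections are less certain can be shown with the usual first-order rule for the relative error of a product. This is a sketch assuming small relative errors in both the extrapolated PHEr values and the fitted ratio; the function name is hypothetical and the percentages are illustrative.

```python
import math


def product_relative_error(rel_err_pher, rel_err_ratio, independent=True):
    """First-order relative error of estimate = PHEr x fitted ratio.

    Independent errors add in quadrature; fully correlated errors add
    linearly. Either way the result is at least as large as each input.
    """
    if independent:
        return math.hypot(rel_err_pher, rel_err_ratio)
    return rel_err_pher + rel_err_ratio


print(round(product_relative_error(0.0, 0.30), 2))   # PHEr known: ratio error only
print(round(product_relative_error(0.20, 0.30), 2))  # PHEr extrapolated as well
```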
As explained in section 1 of this post, my extrapolation model is a simple one designed to be automated in a spreadsheet. Yet for the week ending 17th April, when excess deaths were 11,395, my extrapolated ONSx prediction was 11,087, which is remarkably close. I didn't publish this estimate because I had little data at the time and distrusted the model parameters. But it would have been a good sense check on my original modelled estimate of just under 9,000, and that is the essential value of such simple extrapolations: they sense check your modelled estimates and force you to question your model specification. Last week I was guilty of not doing this, but in my defence, my original motive was to sense check an estimate made by the Financial Times, which I thought was far too high, rather than to make explicit estimates of my own.
For this week, the simple extrapolation is shown in the chart below as a solid black line through the 7-day CMA of the growth rate. The resulting extrapolated estimates of excess deaths for the 2 weeks to 1st May are very similar to my modelled estimates, which gives me more confidence that on Tuesday the ONS will publish figures closer to mine. Of course, at the end of the day, we should not forget that we are forecasting the deaths of actual human beings, and I would much rather be wrong. But if I am going to be wrong, I do hope I am overestimating rather than underestimating as I did last week.
A technical issue to be addressed
I will finish on a technical point that I need to address at some point. The ONSx simple extrapolation model shown extrapolates the 7-day CMA of the geometric growth rate, and as the fitted curve shows, this means the extrapolated series can never reach zero or go negative. For the other 4 COVID19 time series this is a correct assumption, but it is not correct for excess deaths. By definition, they are the difference between actual deaths and a baseline, and therefore can be, and have been, below zero. So I will need to change my extrapolation model to a different specification in a few weeks' time.
The same issue also applies to my model of the ratio between PHEr and ONSx. At the moment, my fitted ratio cannot be negative, but the true ratio of course can be, since excess deaths can be negative.
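The positivity issue with geometric extrapolation can be sketched as below, assuming the growth rate is the day-on-day factor Q(t)/Q(t-1), smoothed with a 7-day centred moving average, and (for simplicity) projected forward by holding the latest smoothed factor constant. The post fits a curve to the CMA instead, so the constant-factor projection, the placeholder series and the function names are all assumptions for illustration.

```python
def growth_factors(q):
    """Day-on-day geometric growth factors Q(t) / Q(t-1).

    Only meaningful for a strictly positive series: if Q can be zero or
    negative (as excess deaths can), the factor is undefined or misleading.
    """
    if any(v <= 0 for v in q):
        raise ValueError("geometric growth needs a strictly positive series")
    return [b / a for a, b in zip(q, q[1:])]


def centred_moving_average(x, window=7):
    """Centred moving average over an odd window; edge points are dropped."""
    if window % 2 == 0:
        raise ValueError("a centred window must have an odd length")
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / window for i in range(half, len(x) - half)]


def project(last_value, growth, steps):
    """Multiplicative projection: each step multiplies by a positive factor,
    so a positive series can approach zero but never reach or cross it."""
    values, v = [], last_value
    for _ in range(steps):
        v *= growth
        values.append(v)
    return values


# Placeholder series, for illustration only (not real data).
series = [100, 110, 118, 125, 128, 130, 129, 126, 122, 117]
smoothed = centred_moving_average(growth_factors(series))
print(project(series[-1], smoothed[-1], steps=14))
```

However small the smoothed growth factor becomes, every projected value stays above zero, which is exactly the property that does not hold for excess deaths.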
– More posts about COVID19 –
- A very useful guide to interpreting statistics on COVID19, published by the Royal Statistical Society.
- My collection of links to all kinds of material related to the statistics of COVID19, epidemiological modelling and testing.
- How large a sample is needed in order to decide whether COVID19 restrictions can be lifted? A lot, lot less than you think!
- How many excess deaths have there been as of 20th April? This explains all data sources in more depth.
- Latest trends and data for COVID19 deaths in England