In many countries across the world, the total effect of the Coronavirus pandemic is now being measured using the concept of Excess Deaths. However, publication of such data by the Office for National Statistics for England is up to 2 weeks slower than the daily deaths published by Public Health England. In this post, I update my model, which uses the PHE series to estimate what the ONS will publish for excess deaths in England on Tuesday 23rd June.
I intend to update this post every week and you can follow me on Twitter to be told when I have made updates. Previous posts are listed below.
- Estimates to 20th April
- Estimates to 1st May
- Estimates to 8th May
- Estimates to 15th May
- Estimates to 22nd May – this is a tweet instead of a blog post since I did not have time to write a post that week.
- Estimates to 29th May – this estimate was the first time I used the method described in this post.
- Estimates to 5th June
- Estimates to 12th June
Readers are advised to read these previous estimates to familiarise themselves with the methods and terminology used throughout this post.
Time Series used in this post
I’ve used the following 4 time series, each denoted by a 4-letter code. Clicking on a code will take you to the source data.
- PHEr – Public Health England COVID19 Registrations – Daily number of deaths by date of registration with COVID19 on the death certificate and confirmed with a positive test in an NHS/PHE laboratory. Published every day, this is the most common headline figure. The link given here contains a further link to a spreadsheet with the relevant data.
- ONSr – ONS COVID19 Registrations – Daily number of deaths by date of registration with COVID19 on the death certificate from all locations. This is published weekly on a Tuesday but the daily data can be found on the COVID19-ENGLAND tab of the downloaded spreadsheet.
- ONSx – ONS Excess Death Registrations – Number of deaths from all causes by date of registration over and above a baseline. This is published weekly on a Tuesday and can be extracted from the WEEKLY DATA tab of the downloaded spreadsheet. I use the day of week pattern of the ONSr series to convert the ONSx weekly data into ONSx daily data.
- CQCn – Care Quality Commission COVID19 Notifications – All care homes are required to notify the CQC of any death in their home within a short period. Since the outbreak, care homes have been able to say if they suspect the death was COVID19 related without a test. The data is passed on to the ONS, who publish it weekly.
I have only extracted data for England from these sources but some also cover Scotland, Wales & Northern Ireland. For more information about these and other COVID19 related time series, please click here.
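As a rough illustration of the weekly-to-daily conversion mentioned for ONSx, the sketch below splits a weekly total across the days of the week in proportion to a reference daily series. The function name and all numbers are made-up placeholders rather than published figures, and proportional allocation is only an assumed reading of "day of week pattern".

```python
def weekly_to_daily(weekly_total, daily_pattern):
    """Split a weekly total across days in proportion to a reference daily series."""
    pattern_total = sum(daily_pattern)
    if pattern_total == 0:
        raise ValueError("reference pattern sums to zero, cannot allocate")
    return [weekly_total * d / pattern_total for d in daily_pattern]


# Placeholder values for illustration only (NOT real ONS data).
onsr_daily = [10, 200, 180, 170, 160, 150, 30]  # seven days of one week
onsx_weekly = 1800                               # weekly excess deaths

onsx_daily = weekly_to_daily(onsx_weekly, onsr_daily)
for day, value in enumerate(onsx_daily, start=1):
    print(f"day {day}: {value:.0f}")
```

The daily values always add back up to the weekly total, so nothing is gained or lost by the conversion. It only borrows the shape of the ONSr week.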
My Weekly Estimates & Extrapolations for Excess Deaths in England
My estimates of excess deaths for the weeks ending 12th & 19th June are shown below along with extrapolations (not estimates) for ONSr which I explain in a separate post (see sections 1 & 4). Two estimates for ONSx are given: EST1 is based on my model described in links 1 to 4 above, and EST2 is based on my new model described in links 6 to 8 above and also in this post.
There were 663 excess deaths in England in the week ending 5th June, 650 lower than my estimate. I have said before that I would prefer to overestimate than to underestimate, and the error is within the 95% confidence interval, but I would like to be more accurate going forward.
Why write a series of posts on estimating excess deaths?
I intend this weekly series of posts about estimating excess deaths to be a real time case study about the difference between Technical & Fundamental forecasting, a concept that I talk about in more depth in my 1-day training course “Identify trends & make forecasts“. These are the two avenues open to a forecaster when trying to forecast a quantity Q over a timeline T.
- Predict Q(t+i) using the history of Q up to time period t only. This involves identifying the underlying pattern of Q over time and then extrapolating that pattern into the future. This is sometimes known as Technical forecasting in financial markets.
- Predict Q(t+i) based on its relationship with an input variable X(t+j) (i not necessarily equal to j). This requires statistical modelling to quantify the relationship between Q & X. X can then be used to predict Q in the future. This is sometimes known as Fundamental forecasting in financial markets.
There is never a right or wrong choice between the two. The advantage of extrapolation is that it only requires the history of Q itself and no other information. The disadvantage is that no insight is gained as to why Q is changing and you have to assume that the historical pattern observed will repeat itself in the future. Modelling, on the other hand, will give you insight and can spot if the pattern of Q is going to change in the future. The difficulty is that you may need to forecast X before you can use it to predict Q, which has the effect of shifting uncertainty in Q to uncertainty in X rather than giving you greater accuracy.
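To make the two approaches concrete, here is a minimal sketch that uses a straight-line fit for both. The function names and the data are invented for illustration, and a straight line is an assumption made for simplicity, not a claim about which pattern or relationship is appropriate in practice.

```python
import numpy as np


def technical_forecast(q_history, steps_ahead=1):
    """Extrapolate Q from its own history by fitting a line against time."""
    t = np.arange(len(q_history))
    slope, intercept = np.polyfit(t, q_history, 1)
    return intercept + slope * (len(q_history) - 1 + steps_ahead)


def fundamental_forecast(q_history, x_history, x_future):
    """Predict Q from a known future value of X via a fitted linear relationship."""
    slope, intercept = np.polyfit(x_history, q_history, 1)
    return intercept + slope * x_future


# Invented data: Q is constructed to be exactly 1 + 2 * X.
x = np.array([100.0, 80.0, 65.0, 50.0, 40.0])
q = 1.0 + 2.0 * x
x_next = 30.0  # assumed to be known before Q is published

tech = technical_forecast(q)
fund = fundamental_forecast(q, x, x_next)
print(f"technical: {tech:.1f}, fundamental: {fund:.1f}")
```

Because the fall in X is slowing down in this made-up series, the time-based extrapolation overshoots the decline while the X-based model follows it. If X had to be forecast first, that advantage would depend on how well X itself could be predicted.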
Modelling ONSx as a function of PHEr
In the case of excess deaths, our output time series Q(t) is ONSx(t) and our input time series is PHEr(t). Because the PHEr is published at least two weeks in advance of ONSx, we do not have a problem with not knowing what PHEr is going to be in the future since we already have the data as shown in the table above. Therefore modelling would appear to be the better option but how good is it?
Since both ONSx and PHEr are based on death registrations one would expect there to be some relationship in terms of timing. The big difference between the two time series is that PHEr only counts deaths with a positive test for COVID19 undertaken in a PHE/NHS laboratory whereas ONSx counts all deaths over and above a baseline.
Until 4 weeks ago, my model was based on the ratio of ONSx to PHEr by day. I plotted these ratios by week as shown in the chart and then attempted to identify the appropriate average for each day of the week based initially on best guesses but then supplemented by published CQCn data (plotted below). The reason I initially did it this way was sample size. By taking daily data I could increase the sample size. If I were to use this model again, which is the ratio shown by the black line and happens to be the average of the last 3 weeks, this gives EST1 in the table at the beginning which shows 1086 deaths for week ending 12th June and 868 deaths for week ending 19th June. I consider both of these to be overestimates hence this is no longer my formal model but I am including them for completeness.
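The ratio approach can be sketched roughly as follows, assuming the per-day ratio is a simple mean of each week's ONSx/PHEr ratio and that PHEr is never zero on any day used. The function names and numbers are hypothetical placeholders, not the values behind EST1.

```python
def average_ratios(onsx_daily_weeks, pher_daily_weeks):
    """Mean ONSx/PHEr ratio for each day of the week across the given weeks."""
    n_days = len(onsx_daily_weeks[0])
    ratios = []
    for d in range(n_days):
        per_week = [ox[d] / ph[d] for ox, ph in zip(onsx_daily_weeks, pher_daily_weeks)]
        ratios.append(sum(per_week) / len(per_week))
    return ratios


def estimate_weekly_onsx(pher_daily, ratios):
    """Apply each day's ratio to that day's PHEr count and sum over the week."""
    return sum(r * p for r, p in zip(ratios, pher_daily))


# Placeholder data for three past weeks (NOT real figures), 7 days each.
pher_weeks = [[100] * 7, [50] * 7, [20] * 7]
onsx_weeks = [[150] * 7, [100] * 7, [20] * 7]

ratios = average_ratios(onsx_weeks, pher_weeks)
estimate = estimate_weekly_onsx([10] * 7, ratios)
print(ratios[0], estimate)
```

One thing the sketch makes visible is that a positive average ratio multiplied by a positive PHEr count can never give a negative estimate, which may be one reason a ratio model overestimates once excess deaths approach zero.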
By now, we have 11 weeks of data with significant excess deaths plus a couple of weeks beforehand when the first COVID19 deaths were recorded. That is enough to start building a model with weekly data only. One change I made straightaway was to change the output variable. Currently it is
ONSx = ONSa – ONSb
where ONSa is the total number of deaths from all causes and ONSb is the baseline number of deaths, defined to be the average of 2015 to 2019. My new output variable is
ONSm = ONSa / ONSb
I call ONSm the Mortality Ratio. The advantage of this is that it makes it easier to predict negative excess deaths, which occur when ONSm is less than 1. It also allows for log transformations of the output variable, which cannot be done with ONSx (it can be zero or negative) but can be done with ONSm, where log(ONSm) is equal to log(ONSa) minus log(ONSb).
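The two definitions are linked by ONSx = (ONSm – 1) × ONSb, which is how a predicted mortality ratio can be turned back into excess deaths. A small sketch of that conversion follows, where the baseline and total are hypothetical round numbers chosen only to make the arithmetic easy to follow.

```python
import math


def mortality_ratio(onsa, onsb):
    """ONSm = ONSa / ONSb."""
    return onsa / onsb


def excess_from_ratio(onsm, onsb):
    """ONSx = ONSa - ONSb = (ONSm - 1) * ONSb."""
    return (onsm - 1.0) * onsb


# Hypothetical round numbers, NOT published ONS values.
onsb = 10_000.0  # baseline deaths for the week
onsa = 9_700.0   # deaths from all causes for the week

onsm = mortality_ratio(onsa, onsb)
onsx = excess_from_ratio(onsm, onsb)
log_onsm = math.log(onsm)  # defined even though ONSx is negative

print(f"ONSm = {onsm:.2f}, ONSx = {onsx:.0f}, log(ONSm) = {log_onsm:.4f}")
```

With these numbers ONSm is below 1 and ONSx is negative, yet log(ONSm) is still well defined, which is the property that makes the ratio the more convenient modelling target.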
I have plotted ONSm against both PHEr and CQCn on the same scatter plot here since PHEr and CQCn are similar in scale. ONS week numbers are used as labels and the most recent week ending 5th June is week 23. We are trying to predict ONSm for week 24 where we already know what PHEr (1113) & CQCn (318) are from the table at the beginning. The labels with white backgrounds (weeks 15 & 19) had Friday bank holidays (Good Friday & VE Day respectively). The reason I highlight this is because PHEr and ONSm are based on death registrations and bank holidays result in reduced staffing levels for compiling the data and thus artificially lower death counts. In contrast, I believe the effect of Monday bank holidays is more limited since staff have the rest of the week to catch up.
One new effect is now apparent following the week 23 data. It would appear the slope of the relationship when excess deaths were increasing between weeks 11 & 15 is shallower than the slope for weeks 16 to 23 when excess deaths were falling. I have taken this effect into account to arrive at an estimated mortality ratio of 0.97 for week ending 12th June and 0.89 for week ending 19th June with 95% confidence intervals of +/- 0.09. This converts into estimates for ONSx of -308 for week ending 12th June and -952 for week ending 19th June with 95% confidence intervals of +/-882. These are the numbers appearing in the EST2 column in the table shown at the start of this post and if correct would mark the end of the first wave of the COVID19 pandemic.
IMPORTANT – PHE made a change in the way they record deaths in the week ending 29th May (week 22) as described in this link. For the purposes of using the model shown in the chart here, I included an extra dummy variable for weeks 22 & 23 in my model, which is why this week is highlighted in purple in the chart. My forecasts for weeks 24 & 25 take this effect into account.
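One way a model with a different slope for the rising and falling phases plus a dummy for the recording change could be set up is as an ordinary least squares regression. The sketch below is an assumption about the general shape of such a model and not the exact specification used for EST2: it regresses log(ONSm) on PHEr, a PHEr × falling-phase interaction and a recording-change dummy. The data and coefficients are synthetic and noiseless, invented so that the fit can be checked against known values.

```python
import numpy as np


def design_matrix(pher, falling, changed):
    """Columns: intercept, PHEr, PHEr x falling-phase flag, recording-change dummy."""
    pher = np.asarray(pher, dtype=float)
    falling = np.asarray(falling, dtype=float)
    changed = np.asarray(changed, dtype=float)
    return np.column_stack([np.ones_like(pher), pher, pher * falling, changed])


def predict_onsm(beta, pher, falling, changed):
    """Predicted mortality ratio for a single week."""
    log_ratio = design_matrix([pher], [falling], [changed]) @ beta
    return float(np.exp(log_ratio[0]))


# Synthetic weekly data (NOT real): PHEr in thousands, phase and change flags.
pher = np.array([0.1, 0.5, 2.0, 4.0, 5.0, 4.5, 3.5, 2.5, 2.0, 1.5, 1.2, 1.0])
falling = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
changed = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Made-up coefficients used to generate noiseless log(ONSm) values.
true_beta = np.array([0.0, 0.15, 0.05, -0.02])
X = design_matrix(pher, falling, changed)
y = X @ true_beta

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
onsm_pred = predict_onsm(beta, pher=0.9, falling=1, changed=1)
print(np.round(beta, 3), round(onsm_pred, 3))
```

The interaction term lets the falling-phase slope differ from the rising-phase slope, and the dummy shifts the level for the weeks after the recording change. With real, noisy data the coefficients would carry uncertainty, which is where confidence intervals such as the +/- 0.09 quoted above would come from.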
CQCn data is only available from week 16 (week ending 17th April) and so cannot be incorporated directly into the model above. If I build a separate model for the blue labels on the scatter plot, I get an estimate for ONSm of 1.04 which converts to an estimate for ONSx of 391. Clearly a CQCn based forecast is very different from a PHEr based forecast. However the sample sizes are very different and I am not yet ready to publish a forecast based on CQCn. For now, I will stick with the PHEr based forecast.
Comparing Estimated ONSx with Extrapolated ONSx
A few weeks ago, I pointed out the value of comparing my modelled (or fundamental) estimate above with an extrapolated (or technical) estimate (see section 6 of this link) as a sense check. My extrapolated estimate for week ending 5th June was 1059 deaths which was closer than my EST2 & EST1 estimates. For week ending 12th June, my extrapolated estimate is 760 deaths which is higher than both my CQCn (391) and PHEr (-308) estimates. I have however pointed out before that my ONSx extrapolation model cannot predict negative excess deaths, which is a known flaw. I now think we are at the level where the extrapolation model will no longer work.
– More posts about COVID19 –
- A very useful guide to interpreting statistics of COVID19 published by the Royal Statistical Society.
- My collection of links to all kinds of material related to the statistics of COVID19, epidemiological modelling and testing.
- How large a sample is needed in order to decide whether COVID19 restrictions can be lifted? A lot, lot less than you think!
- Latest trends and data for COVID19 deaths in England