In many countries across the world, the total effect of the Coronavirus pandemic is now being measured using the concept of Excess Deaths. However, the Office for National Statistics (ONS) publishes such data for England up to 2 weeks later than Public Health England (PHE) publishes its daily deaths. In this post, I update my model, which uses the PHE series to estimate what the ONS will publish for excess deaths in England on Tuesday 19th May.
I intend to update this post every week and you can follow me on Twitter to be told when I have made updates. Previous posts are listed below.
The reader is advised to read these previous estimates to become familiar with the methods and terminology used throughout this post.
Time Series used in this post
I’ve used the following 4 time series, each denoted by a four-letter code. Clicking on a code will take you to the source data.
- PHEr – Public Health England COVID19 Registrations – Daily number of deaths by date of registration with COVID19 on the death certificate and confirmed with a positive test in an NHS/PHE laboratory. Published every day, this is the most common headline figure. The link given here contains a further link to a spreadsheet with the relevant data.
- ONSr – ONS COVID19 Registrations – Daily number of deaths by date of registration with COVID19 on the death certificate from all locations. This is published weekly on a Tuesday but the daily data can be found on the COVID19-ENGLAND tab of the downloaded spreadsheet.
- ONSx – ONS Excess Death Registrations – Daily number of excess deaths by date of registration, i.e. deaths from all causes in all locations over and above a baseline. This is published weekly on a Tuesday and can be extracted from the WEEKLY DATA tab of the downloaded spreadsheet. I use the day-of-week pattern of the ONSr series to convert the ONSx weekly data into ONSx daily data.
- CQCn – Care Quality Commission COVID19 Notifications – All care homes are required to notify the CQC of any death in their home within a short period. Since the outbreak, care homes have been able to say whether they suspect a death was COVID19-related, even without a test. The CQC passes the data on to the ONS, which publishes it weekly.
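The post does not spell out exactly how the weekly ONSx figure is turned into daily figures, so the following is a minimal sketch, assuming the weekly total is shared out across the seven days (Saturday to Friday) in proportion to the daily ONSr counts for the same week. The function name `weekly_to_daily` and the example figures are hypothetical.

```python
from typing import Sequence


def weekly_to_daily(weekly_total: float, pattern: Sequence[float]) -> list[float]:
    """Split one week's total across its seven days in proportion to `pattern`.

    `pattern` is assumed (hypothetically) to be the seven daily ONSr counts
    for the same ONS week, ordered Saturday to Friday.
    """
    if len(pattern) != 7:
        raise ValueError("pattern must contain exactly seven daily values")
    total = sum(pattern)
    if total <= 0:
        raise ValueError("pattern must have a positive total")
    # Each day receives the same share of the weekly total as it has of the pattern.
    return [weekly_total * value / total for value in pattern]
```

For example, a hypothetical weekly ONSx total of 2,000 combined with an ONSr pattern of `[100, 50, 200, 200, 200, 150, 100]` gives 400 excess deaths on each of the three busiest days, and the seven daily values still add up to 2,000.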
I have only extracted data for England from these sources, but some also cover Scotland, Wales & Northern Ireland. For more information about these and other COVID19-related time series, please click here.
My Weekly Estimates & Extrapolations for Excess Deaths in England
My estimates of excess deaths for the weeks ending 8th & 15th May are shown below along with extrapolations (not estimates) for ONSr which I explain in a separate post (see sections 1 & 4).
There were 7,735 excess deaths in England in the week ending 1st May. This was 1,100 lower than my estimate last week, so this time I overestimated, in contrast to the previous two weeks in a row, in which I underestimated by 2,400. Whilst last week was an improvement, I still have opportunities to make better use of the new CQCn time series and add an extra dimension to my model.
I intend this weekly series of posts about estimating excess deaths to be a real time case study about the difference between Technical & Fundamental forecasting, a concept that I talk about in more depth in my 1-day training course “Identify trends & make forecasts“. These are the two avenues open to a forecaster when trying to forecast a quantity Q over a timeline T.
- Predict Q(t+i) using the history of Q up to time period t only. This involves identifying the underlying pattern of Q over time and then extrapolating that pattern into the future. This is sometimes known as Technical forecasting in financial markets.
- Predict Q(t+i) based on its relationship with an input variable X(t+j) (where i is not necessarily equal to j). This requires statistical modelling to quantify the relationship between Q & X. X can then be used to predict Q in the future. This is sometimes known as Fundamental forecasting in financial markets.
There is never a right or wrong answer as to which avenue to take. The advantage of extrapolation is that it only requires the history of Q itself and no other information. The disadvantage is that it gives no insight into why Q is changing, and you have to assume that the historical pattern will repeat itself in the future. Modelling, on the other hand, gives you insight and can spot if the pattern of Q is going to change. The difficulty is that you may need to forecast X before you can use it to predict Q, which shifts the uncertainty in Q onto uncertainty in X rather than giving you greater accuracy.
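To make the two avenues concrete, here is a deliberately simple sketch of each. Both rules are hypothetical stand-ins chosen for illustration, not the models used in this post: the technical forecast assumes the latest period-on-period growth factor persists, and the fundamental forecast assumes Q is a fixed multiple k of X.

```python
from typing import Sequence


def technical_forecast(history: Sequence[float], steps: int) -> list[float]:
    """Extrapolate Q from its own history only (technical forecasting).

    Hypothetical rule: the most recent period-on-period growth factor
    is assumed to persist unchanged into the future.
    """
    if len(history) < 2 or history[-2] == 0:
        raise ValueError("need at least two points and a non-zero penultimate value")
    growth = history[-1] / history[-2]
    level = history[-1]
    forecasts = []
    for _ in range(steps):
        level *= growth
        forecasts.append(level)
    return forecasts


def fundamental_forecast(
    q_history: Sequence[float], x_history: Sequence[float], x_future: Sequence[float]
) -> list[float]:
    """Predict Q from an input variable X (fundamental forecasting).

    Hypothetical model: Q = k * X, with k = sum(Q) / sum(X) over the history.
    Future values of X must already be known or separately forecast.
    """
    k = sum(q_history) / sum(x_history)
    return [k * x for x in x_future]
```

Note where the extra information enters: `technical_forecast` needs nothing but Q, whereas `fundamental_forecast` cannot run without `x_future`, which is exactly the point at which uncertainty in X is passed on to Q.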
Modelling ONSx as a function of PHEr
In the case of excess deaths, our output time series Q(t) is ONSx(t) and our input time series is PHEr(t). Because PHEr is published at least two weeks ahead of ONSx, we do not need to forecast PHEr: we already have the data, as shown in the table above. Modelling would therefore appear to be the better option, but how good is it?
Since both ONSx and PHEr are based on death registrations, one would expect some relationship in their timing. The big difference between the two time series is that PHEr only counts deaths with a positive COVID19 test from a PHE/NHS laboratory, whereas ONSx counts all deaths over and above a baseline. If the additional deaths captured by ONSx follow a different timeline, PHEr will be a less reliable predictor of ONSx.
The two charts shown further down plot the ratio of ONSx to PHEr for each day of the week. For example, on 24th April 2020, there were 1684 excess deaths recorded by the ONS and 761 COVID19 deaths recorded by PHE, which is a ratio of about 2.21. I have plotted this ratio by ONS-defined week (Saturday to Friday), with a separate line for each of the six ONS weeks 13 to 18. There is a clear pattern, with a low ratio at the weekend (and on bank holidays such as Good Friday & Easter Monday) and a much higher ratio during the week.
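The daily ratio and its position in the ONS week can be computed as below. This is a minimal sketch: the helper names `ons_day_index` and `daily_ratio` are hypothetical, and the only facts it relies on are the Saturday-to-Friday ONS week and the 24th April figures quoted above.

```python
from datetime import date


def ons_day_index(d: date) -> int:
    """Position of `d` within an ONS week that runs Saturday (0) to Friday (6)."""
    # date.weekday() returns Monday=0 ... Sunday=6; shift so that Saturday becomes 0.
    return (d.weekday() - 5) % 7


def daily_ratio(excess_deaths: float, phe_deaths: float) -> float:
    """Ratio of ONSx excess deaths to PHEr COVID19 deaths for a single day."""
    if phe_deaths <= 0:
        raise ValueError("PHEr count must be positive")
    return excess_deaths / phe_deaths
```

With these, `ons_day_index(date(2020, 4, 24))` returns 6 (Friday, the last day of ONS week 17) and `daily_ratio(1684, 761)` is roughly 2.21. Grouping ratios by `ons_day_index` gives one line per day of the week, as in the charts.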
If I can extract the right pattern, I can create a model fit for each day of the week. I can then multiply the known PHEr figure for each day by the fitted ratio to arrive at my estimate of excess deaths for that day. For example, PHEr counted 603 death registrations on Wednesday 22nd April, so multiplying this by my then fitted ratio of 2.34 gives estimated excess deaths for 22nd April of 1410. The trick, though, is identifying the right model fit: for the two weeks up to 24th April, I was underestimating by 2,400 deaths, but for the latest week I overestimated by 1,000 deaths.
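The estimation step itself is a day-by-day multiplication followed by a sum over the week. A minimal sketch, assuming both inputs are ordered Saturday to Friday; the function name is hypothetical.

```python
from typing import Sequence


def estimate_weekly_excess(
    phe_daily: Sequence[float], fitted_ratios: Sequence[float]
) -> tuple[list[float], float]:
    """Estimate daily and weekly ONSx from known PHEr counts.

    Each day's PHEr count is multiplied by the fitted ONSx/PHEr ratio
    for that day of the week; the weekly estimate is the sum of the days.
    """
    if len(phe_daily) != len(fitted_ratios):
        raise ValueError("need one fitted ratio per day")
    daily = [p * r for p, r in zip(phe_daily, fitted_ratios)]
    return daily, sum(daily)
```

For the single-day example above, `estimate_weekly_excess([603], [2.34])` gives 1411.02, which rounds to the 1410 quoted in the text.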
Last week, my model took a weighted average of the ratios for each day, with weights equal to the excess deaths for just the 2 weeks ending 17th & 24th April (listed in the table above). That gave the model fit shown in the chart on the left below. I ignored earlier weeks because both CQCn and ONSr data showed that deaths in care homes were following a different timeline to deaths in hospitals and at home, and that care home deaths were only just peaking. In previous weeks, I had taken weighted averages across all weeks; had I done so again, I would have been spot on. This can be seen in the chart on the right, where the model fit is close to the diamonds representing the week ending 1st May.
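The weighted-average fit for one day of the week can be sketched as follows, taking the weights to be that day's excess deaths in each week, as described above. The function name and the example figures are hypothetical; the example only shows how restricting the fit to the highest weeks pushes the fitted ratio up.

```python
from typing import Sequence


def weighted_ratio(ratios: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of one weekday's ONSx/PHEr ratios across several weeks.

    The weights are assumed to be the excess deaths (ONSx) recorded on
    that weekday in each week included in the fit.
    """
    if len(ratios) != len(weights):
        raise ValueError("need one weight per ratio")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r * w for r, w in zip(ratios, weights)) / total_weight


# Hypothetical ratios and weights for one weekday over three weeks.
ratios = [1.5, 2.0, 3.0]
weights = [100, 100, 300]
all_weeks = weighted_ratio(ratios, weights)               # fit using every week
last_two = weighted_ratio(ratios[-2:], weights[-2:])      # fit using only the last two weeks
```

Here `all_weeks` is 2.5 and `last_two` is 2.75, so dropping the earlier, lower-ratio week raises the fitted ratio and, with it, every estimate built from that day's PHEr count.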
It should be obvious from the chart that there is a considerable margin of error in the fitted ratio, of at least ±30%, so in one sense the errors seen over the last 3 weeks are not surprising. However, two patterns are apparent. The first is that when a week is in error, it tends to be in error in the same direction on every day of the week. For example, the week ending 24th April was higher than my model fit on 6 out of the 7 days. That indicates that the errors for each day are not independent of each other. The second pattern is that the average ratio for the 2 weeks ending 17th & 24th April (squares) was notably higher than for the previous 2 weeks (triangles), and the latest week (diamonds) is in between. I determined last week that deaths in care homes were the factor most likely to be responsible for this.
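One way to quantify the first pattern is to count how many days in a week land above the fit and ask how likely that would be if each day's error were an independent coin toss. This is a minimal sketch under that hypothetical 50/50 independence assumption; the function names are illustrative.

```python
from math import comb
from typing import Sequence


def fit_errors(actual: Sequence[float], fitted: Sequence[float]) -> tuple[list[float], int]:
    """Relative error of each day's actual ONSx/PHEr ratio against the model fit.

    Returns the per-day relative errors and the number of days on which
    the actual ratio was above the fit.
    """
    errors = [a / f - 1 for a, f in zip(actual, fitted)]
    days_above = sum(1 for e in errors if e > 0)
    return errors, days_above


def prob_at_least_k_above(k: int, n: int = 7) -> float:
    """P(at least k of n days above the fit) if each day were an independent 50/50 draw."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
```

`prob_at_least_k_above(6)` is 8/128 = 0.0625, so a week with 6 of its 7 days above the fit would be fairly unusual if the daily errors really were independent, which supports the view that they move together.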
COVID19 deaths in Care Homes are following a different timeline
Since 10th April, the CQC has published the daily number of COVID19-related deaths in care homes. The latest data runs to 8th May and is shown here.
In the 4 weeks shown here, total CQCn deaths were 2,500, 2,750, 2,500 & 1,700 for the weeks ending 17th April, 24th April, 1st May & 8th May respectively. Having data for the week to 8th May gives us another lead indicator for excess deaths in that week. CQCn shows that the latest week is well down on the previous weeks, two of which saw the highest ratios of ONSx to PHEr. This is one reason why my new model fit takes a weighted average of the ratio across all weeks rather than just the highest weeks.
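The size of the latest fall is easier to judge as a week-on-week percentage change. A minimal sketch using the rounded weekly CQCn totals quoted above; the function name is hypothetical.

```python
from typing import Sequence


def week_on_week_change(totals: Sequence[float]) -> list[float]:
    """Fractional change from each week to the next (e.g. -0.32 means a 32% fall)."""
    return [(curr - prev) / prev for prev, curr in zip(totals, totals[1:])]


# Weekly CQCn totals for weeks ending 17th April, 24th April, 1st May & 8th May.
cqcn_weekly = [2500, 2750, 2500, 1700]
changes = week_on_week_change(cqcn_weekly)
```

The changes come out at roughly +10%, -9% and -32%, so the fall into the week ending 8th May is several times larger than the fall the week before.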
Unfortunately, CQCn data does not go back earlier than 10th April, so we don’t know the longer-term trend. However, the ONSr series can be broken down by the type of location where death occurred. There are 4 main categories: hospitals, at home, care homes and others (hospices, communal establishments and outdoors). This data goes back to ONS week 11 (week ending 13th March) and gives both total deaths from all causes and deaths with COVID19 on the death certificate (ONSr). From the ONSx chart shown earlier, we know excess deaths became noticeable from week 13 onwards. By first using weeks 11 & 12 as a baseline and then recalibrating to match the known number of excess deaths in each week, I have made a reasonable approximation of how all excess deaths break down across the 4 locations, as shown in the chart here.
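The post does not say exactly how the recalibration is done, so the following is only a sketch under two stated assumptions: the baseline for each location is the mean of its weekly all-cause deaths in the baseline weeks (weeks 11 & 12), and the recalibration is a single proportional rescaling so the locations add up to the known ONSx total. The function name and example figures are hypothetical.

```python
from typing import Mapping, Sequence


def excess_by_location(
    week_deaths: Mapping[str, float],
    baseline_weeks: Mapping[str, Sequence[float]],
    known_excess_total: float,
) -> dict[str, float]:
    """Approximate how one week's known excess deaths split across locations.

    Hypothetical method: baseline = mean all-cause deaths in the baseline
    weeks; recalibration = one proportional rescaling of the raw excesses.
    """
    # Step 1: baseline per location, e.g. the average of ONS weeks 11 & 12.
    baseline = {loc: sum(vals) / len(vals) for loc, vals in baseline_weeks.items()}
    # Step 2: raw excess = this week's all-cause deaths minus the baseline.
    raw = {loc: week_deaths[loc] - baseline[loc] for loc in week_deaths}
    raw_total = sum(raw.values())
    if raw_total == 0:
        raise ValueError("raw excess sums to zero; cannot rescale")
    # Step 3: rescale so that the locations add up to the known ONSx total.
    scale = known_excess_total / raw_total
    return {loc: value * scale for loc, value in raw.items()}
```

A proportional rescaling keeps each location's share of the raw excess unchanged; it is the simplest way to force agreement with the published total, but it can behave oddly if one location's raw excess is negative.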
This clearly shows that the timeline for deaths in care homes is different to that for hospitals. Hospital deaths peaked in the week ending 17th April, and were only slightly higher than in the previous week. Care home deaths, though, increased considerably in that week, and by even more in the following week ending 24th April, when they outnumbered hospital deaths. In the week ending 1st May, care home deaths fell, broadly following the pattern seen in CQCn above, so one can conclude that the peak was in the week ending 24th April. Given that CQCn saw a larger fall in the week ending 8th May, I am much more confident that a model fit for the ratio of ONSx to PHEr based on a weighted average across all weeks is the right one.
One last point concerns the difference between the excess deaths shown in this chart and the actual number of COVID19 registrations in these locations. Not all excess deaths have COVID19 on the death certificate. For deaths at home, only 20% of excess deaths have a COVID19 death certificate. The ratio rises to 40% for care homes, 100% for others and 135% for hospitals. The difference between care homes and hospitals is extremely striking and suggests a very big difference in how doctors are filling out death certificates. Further understanding of this difference would be very helpful.
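A ratio above 100% has a concrete meaning. If COVID19 registrations are a share s of the excess deaths E in a location, and those registrations are counted among the location's total deaths, then deaths without COVID19 on the certificate differ from the baseline by E(1 - s). The sketch below makes that arithmetic explicit; the function name and the 1,000-death example are hypothetical.

```python
def non_covid_excess(excess: float, covid_share: float) -> float:
    """Excess deaths NOT accounted for by COVID19 registrations.

    covid_share is COVID19 registrations divided by excess deaths. A share
    above 1 gives a negative result, i.e. fewer non-COVID19 deaths than
    the baseline would predict.
    """
    return excess * (1 - covid_share)
```

For a hypothetical 1,000 excess deaths, a 20% share (as at home) leaves 800 excess deaths without COVID19 on the certificate, while a 135% share (as in hospitals) gives -350, meaning non-COVID19 hospital deaths were below the baseline.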
Comparing Estimated ONSx with Extrapolated ONSx
Two weeks ago, I pointed out the value of comparing my modelled (or fundamental) estimate above with an extrapolated (or technical) estimate as a sense check (see section 6 of this link). As last week, my modelled estimate of 5,923 excess deaths in England for the week ending 8th May is very similar to my extrapolated estimate of 5,698. For the following week to 15th May, however, my modelled estimate of 4,963 is starting to diverge from my extrapolated estimate of 3,708. I may need to refine my extrapolation model at some point.
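The sense check can be expressed as a relative gap between the two estimates, with a flag when the gap exceeds some tolerance. A minimal sketch: the function names are hypothetical and the 10% tolerance is an arbitrary assumption, not a threshold used in this post.

```python
def relative_gap(modelled: float, extrapolated: float) -> float:
    """Gap between the modelled and extrapolated estimates, relative to the extrapolation."""
    return (modelled - extrapolated) / extrapolated


def diverges(modelled: float, extrapolated: float, tolerance: float = 0.10) -> bool:
    """Flag a sense-check failure when the two estimates differ by more than `tolerance`.

    The 10% default is an arbitrary, hypothetical threshold.
    """
    return abs(relative_gap(modelled, extrapolated)) > tolerance
```

For the figures above, the gap is about 4% for the week ending 8th May (5,923 against 5,698) and about 34% for the week ending 15th May (4,963 against 3,708), so only the second week would be flagged.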
– More posts about COVID19 –
- A very useful guide to interpreting statistics on COVID19, published by the Royal Statistical Society.
- My collection of links to all kinds of material related to the statistics of COVID19, epidemiological modelling and testing.
- How large a sample is needed in order to decide whether COVID19 restrictions can be lifted? A lot, lot less than you think!
- Latest trends and data for COVID19 deaths in England