On 30th September 2020, CL:AIRE (the industry body for the land contamination & remediation sector) published new professional guidance for “Comparing Soil Contamination Data with a Critical Concentration”. The 46-page document advises how to use statistics when assessing land contamination and deciding whether it is safe for development. I was the lead author of the guidance and I spent 4 years working with CL:AIRE’s steering committee on what the guidance should cover. The 4 years were bookended by the statement and editorial published by the ASA (American Statistical Association) on the use & misuse of p-values in 2016 & 2019 respectively. In writing this guidance I felt I was an ambassador for turning those into something that could be used by non-statisticians to make real-life decisions that have an impact on us all.
My presentations & webinars about the new guidance
- Presentation to the SILC conference on 8th March 2020 with the subtitle “What’s changed in the guidance and why”
- Presentation to the SOBRA virtual conference on 2nd December 2020. The link takes you to the whole conference and I am the first speaker about 10 mins in. My talk lasts about 40 mins. My thanks to the Society of Brownfield Risk Assessment and the other presenters for allowing me to share this link.
- Presentation to the SCLF AGM on 10th December 2020 – This is a longer presentation than the one given to SOBRA and starts 2m30s in. My thanks to the Scottish Contaminated Land Forum for allowing me to share this YouTube link.
- “Dr Groundlove – or how I learned to stop worrying and love the Central Limit Theorem”, a Royal Statistical Society webinar on 1st March 2021 – I was one of 3 speakers as listed below.
  - 4 mins in – Peter Witherington from RSK Environmental Ltd on what is contaminated land and why statistics is needed.
  - 32 mins in – Myself on the statistical issues that had to be considered during the writing of the new guidance, especially on the limitations of the Central Limit Theorem.
  - 69 mins in – Ron Wasserstein, executive director of the American Statistical Association and lead author of the ASA P-value statements referred to below. He compared the reality of the guidance with what he hoped to see when the ASA statements were published.
  - 93 mins in – an interesting discussion of some of the statistical issues including Bayesian approaches.
- Presentation to the ELQF (East Land Quality Forum) on 22nd June 2021. This repeats some of the material from the earlier presentations but I added some slides referring to a prequel report I wrote in 2017 which laid the groundwork for the guidance.
- Presentation to ENBIS (European Network of Business & Industrial Statisticians) on 1st July 2022. I describe the different statistical approach taken in the new guidance compared to the 2008 version. In particular I describe how the ASA 2016 statement on p-values influenced the 2020 version. The link takes you to a PDF of the slides. The session was recorded but I haven’t been able to get a link to it.
My approach to writing the new guidance
In the late 90s, I bought a house in Reading, a new development built on a former industrial estate. I received a survey report which summarised the tests made on the soil in my garden and how much risk it presented to potential occupants and the wider environment. My brother, who was a laboratory scientist at the time, remarked I could mine my garden for metals and suggested I shouldn’t grow fruit and vegetables in the garden. I had no interest in doing so but it was my first contact with the land contamination industry.
This industry surveys land to ensure a site is suitable for its new use and to prevent unacceptable risks from contamination. Planning officers decide whether industry practitioners have followed the appropriate processes for surveys and analysis and whether the right decisions have been made. Practitioners working in the land contamination industry are a mixture of scientists and engineers. Whilst many will have received basic training in statistics, they are not experts in statistical inference, hence the need for professional guidance in statistics.
Whilst writing the new guidance, I realised the concluding paragraph from the 2016 ASA statement perfectly captured what I wanted the guidance to convey and I reproduce it here broken down as 6 bullet points –
“Good statistical practice, as an essential component of good scientific practice, emphasizes …
- … principles of good study design and conduct,
- … a variety of numerical and graphical summaries of data,
- … understanding of the phenomenon under study,
- … interpretation of results in context,
- … complete reporting and
- … proper logical and quantitative understanding of what data summaries mean.
… No single index should substitute for scientific reasoning.”
I wish I had fully realised the importance of this paragraph at the beginning of the project, as I could then have recommended that the guidance be laid out in this fashion. For reasons that were perfectly understandable at the time given the wishes of the steering committee, the draft guidance followed a different layout. During the revision process, I tried to steer the layout back towards the ASA layout, with the result that the final version ended up somewhere in between. However, I did add Appendix A1 to the final version where I explicitly made the link between what was written and the ASA 2016 statement.
When taking a sample for the purpose of making decisions, the first things a statistician wants to know are what population has to be sampled and what the criteria are for making decisions. In the land contamination industry, this is delivered by something called the Conceptual Site Model (CSM), where a competent practitioner pulls together all that is already known about the site and combines that knowledge with his or her understanding of how contaminants behave in soil & groundwater and what the potential risks are to humans and the wider environment. The resulting model of the site is then used to break the site down into 3 parts –
- Areas that are suitable for use and safe for development.
- Areas that are not suitable for use and the risks will need to be addressed and may require remediation.
- Areas that are unclear and need to be sampled further in order for a decision to be made.
For areas of the third type, a suitable sampling & measurement plan using statistical principles will then need to be developed and a threshold for decision making, known as a Critical Concentration, needs to be specified in advance. The results of the land survey can then be analysed and interpreted using the new guidance, hence its title of “Comparing Soil Contamination Data with a Critical Concentration”.
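As a rough illustration only, the final analysis step might look something like the Python sketch below. It is not taken from the guidance and is not the method the guidance prescribes. The concentrations, the critical concentration of 50 mg/kg, the function names and the choice of a percentile bootstrap interval for the mean are all assumptions made for this example.

```python
import random
import statistics


def summarise(values):
    """Return basic numerical summaries of a list of concentrations."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }


def bootstrap_mean_interval(values, level=0.95, n_resamples=5000, seed=1):
    """Percentile bootstrap interval for the mean (an assumed, illustrative choice)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]


def compare_with_critical(values, critical_concentration, level=0.95):
    """Describe where the interval for the mean sits relative to the threshold."""
    lower, upper = bootstrap_mean_interval(values, level=level)
    if upper < critical_concentration:
        position = "interval lies below the critical concentration"
    elif lower > critical_concentration:
        position = "interval lies above the critical concentration"
    else:
        position = "interval straddles the critical concentration"
    return {"lower": lower, "upper": upper, "position": position}


# Hypothetical concentrations (mg/kg) from one area of a site.
samples = [12.0, 18.5, 9.3, 22.1, 15.4, 30.2, 11.8, 14.6, 19.9, 16.7]
critical = 50.0  # hypothetical Critical Concentration, fixed before sampling

summary = summarise(samples)
result = compare_with_critical(samples, critical)
print(summary)
print(result)
```

The output describes where the interval sits rather than giving a pass or fail verdict. That is deliberate: in the spirit of the ASA statement, a number like this is there to inform the practitioner's judgement alongside the CSM and plots of the data, not to make the decision.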
It is important to note the new guidance only covers the last step of this process, the statistical analysis and decision making, and to my mind focuses on the even numbered bullet points of the ASA statement. The odd numbered bullet points are covered by the CSM and Sample Design steps which are not explored in the guidance but are essential pre-requisites in order to use the guidance. This explains the copious number of caveats and pre-requisites at the beginning of the document as the steering committee was worried about people jumping to the analysis without having done the CSM and Sample Design work. These are large subjects in their own right and they need separate guidance to be written. It was this debate over the pre-requisites and the extent to which they should be referred to in the new guidance that explains why it took 4 years to publish it.
I would like to thank CL:AIRE for asking me to write the new guidance. It was a hugely educational process and one that forced me to examine my understanding of some basic statistical ideas (such as the Central Limit Theorem) as well as teaching me about the issues the land contamination industry has to deal with. I sincerely hope the eventual outcome for the guidance is that the sentiment I expressed at the end of Appendix A1 is the one that comes to pass.
“The guidance is written on the assumption that it will be read and used by people with a scientific training who are capable of exercising scientific judgement and who wish to use statistics to SUPPLEMENT their professional judgement, not to REPLACE their professional judgement.”
Buy my Dot & Box Plot Template!
If you are a practitioner who wishes to put the new CL:AIRE guidance into practice, my Dot & Box Plot Template spreadsheet can get you started. This is a Microsoft Excel spreadsheet which allows you to enter sample data for as many sites as you wish and to produce dot plots, box plots, summary statistics and confidence intervals for each site.
You can find more information and details on how to purchase your copy here. Please note that my ecommerce platform is used for training courses only, so when you land on that page it will look like you are booking a training course, but you are in fact purchasing a copy of the template.
If you would like to have a free 30 minute demonstration of the template before you make your purchase, please contact me to arrange this.