On 30th September 2020, CL:AIRE (the industry body for the land contamination & remediation sector) published new professional guidance for “Comparing Soil Contamination Data with a Critical Concentration”. The 46-page document advises how to use statistics when assessing land contamination and whether the land is safe for development. I was the lead author of the guidance and I spent 4 years working with CL:AIRE’s steering committee on what the guidance should cover. The 4 years were bookended by two statements published by the ASA (American Statistical Association) on the use & misuse of P-Values in 2016 & 2019, and in writing this guidance I felt I was an ambassador for turning those statements into something that could be used by non-statisticians to make real-life decisions that have an impact on us all.
To close a pay gap you have to do three things:
- Measure where you are today.
- Specify where you want to be in the future.
- Identify the most effective way of getting there.
All 3 steps require the use of statistical thinking and statistical methods. Of course, other skills and processes are also needed, but they cannot succeed on their own without the help of statistics.
The core expertise that statisticians offer to the world is drawing conclusions from small samples. Therefore, knowing how to design surveys, estimate the right sample size, and decide on the right way to ask a question or measure a property are all essential skills for any statistical thinker. The skills you need to be competent in Sampling & Surveys are best captured by my Survey Wheel.
Mention P-values and most people will probably shudder at some memory of an incomprehensible lecture or lesson on statistical tests. Words like null hypotheses, t-tests and statistical significance might pop into your mind with little understanding of what they are about. What you may know is that scientists have to report a p-value for any experiment they do. Or do they?
The area of Statistical Inference is a core area of study for any statistician. Put simply, Inference means to infer from the observations you’ve made about your data and to draw conclusions about what might be happening in real life. There are two parts to Inference.
All organisations want to understand what has happened in the past and what will happen in the future. The use of statistics and statistical thinking is essential to be a better forecaster but that doesn’t mean it is easy to do! At the same time, we are bombarded with forecasts in the media and that can make it difficult to decide which forecasts to pay attention to and which can be ignored.
My course “Identifying Trends & Making Forecasts” is all about doing the basics right when it comes to analysing trends and making predictions. To support this course, this post makes available a variety of material in the public domain covering the following themes:-
A sound grasp of basic statistical concepts is essential to have any hope of acquiring the mindset of a statistical thinker and to be able to use statistical methods. My introductory course “The 6 Concepts of Statistical Thinking” lays the foundations in the following.
- Probability – the difference between conditional & absolute probability.
- Risk – why it is an extension of probability and the importance of alpha (false positive) and beta (false negative) risk.
- Expectation – how to summarise a dataset into one number which measures its location.
- Variance – how to measure the spread of a dataset.
- Distribution – how to describe the shape of a dataset.
- Correlation – how to measure the relationship between two variables and understand the two golden rules of correlation.
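As a rough illustration of three of the concepts listed above (expectation, variance and correlation), here is a minimal Python sketch. The `hours` and `scores` lists are made-up numbers used purely for demonstration, and the function names are my own choices rather than anything taken from the course material.

```python
import math

def mean(xs):
    """Expectation: one number summarising the location of a dataset."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Variance: average squared distance from the mean (n - 1 divisor)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pearson_correlation(xs, ys):
    """Correlation: strength of the linear relationship between two variables."""
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cross / (spread_x * spread_y)

# Hypothetical data: hours of study and test scores for five people.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

print("mean hours:", mean(hours))
print("variance of hours:", sample_variance(hours))
print("correlation:", round(pearson_correlation(hours, scores), 3))
```

The correlation here is a Pearson coefficient, which only captures linear relationships; a value near zero does not by itself mean the two variables are unrelated.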
If I were to remark to you that “the weather is very nice today” or “I didn’t like that person”, it is unlikely that I would have made such statements based on a single variable. It is more likely that a combination of variables was evaluated to arrive at these statements. When we analyse datasets with multiple variables, we are undertaking Multivariate Statistical Analysis.
Multivariate Analysis comes in two flavours :-
- Analysis of Correlations between Multiple Variables – Known as R-Analysis – Informally known as reducing the dimensionality of your dataset.
- Analysis of Distance between Many Objects – Known as Q-Analysis – Informally known as mapping, clustering or segmentation of your dataset.
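To make the distinction between the two flavours concrete, the sketch below builds the starting matrix for each from the same small table. It assumes a made-up dataset in which rows are objects and columns are variables, uses Pearson correlation for the R-Analysis side, and uses plain Euclidean distance for the Q-Analysis side. Other correlation and distance measures are equally valid, and real analyses usually standardise the variables first.

```python
import math

# Hypothetical dataset: 4 objects (rows) measured on 3 variables (columns).
data = [
    [1.0, 2.0, 10.0],
    [2.0, 4.1, 8.0],
    [3.0, 6.2, 6.5],
    [4.0, 7.9, 4.0],
]

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cross / (spread_x * spread_y)

def correlation_matrix(rows):
    """R-Analysis starting point: variables x variables."""
    cols = list(zip(*rows))  # transpose so each entry is one variable
    return [[pearson(a, b) for b in cols] for a in cols]

def distance_matrix(rows):
    """Q-Analysis starting point: objects x objects."""
    return [[math.dist(a, b) for b in rows] for a in rows]

r_matrix = correlation_matrix(data)  # 3 x 3, one row per variable
q_matrix = distance_matrix(data)     # 4 x 4, one row per object

print(len(r_matrix), "x", len(r_matrix[0]), "correlation matrix")
print(len(q_matrix), "x", len(q_matrix[0]), "distance matrix")
```

The correlation matrix is the kind of input that dimension-reduction methods work from, while the distance matrix is the kind of input that clustering and mapping methods work from.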
Which is the odd one out from the 3 figures shown below? All are the average number of Americans to die each year from these causes.
- A – 69 from Lawnmowers
- B – 31 from Lightning
- C – 9 from Islamic Terrorists
Do think about your answer before you read on!
“Graduates aren’t skilled enough!” says a BBC headline. What is your immediate reaction? If you decide to find out more and read the article, you will see the following.
- A brief reference to a survey of 174 organisations, half of which are apparently moaning about graduate skills.
- 3 brief interviews with recent graduates asking what they wish they had learned before starting their job.
After reading this, do you feel that a case has been made that universities are slipping up? How much weight should you place on this article and the information it contains? One of the major problems with news these days is that we are bombarded with articles about so many things that it can be difficult to sort the good from the bad, especially when articles refer to data in one way or another. My Evidence Hierarchy provides a shortcut for assessing the usefulness of news articles and, with a bit of practice, I hope the result will be less stress for you about what is going on in the world.