Mention p-values and most people will shudder at some memory of an incomprehensible lecture or lesson on statistical tests. Terms like null hypothesis, t-test and statistical significance might pop into your mind with little understanding of what they are about. What you may know is that scientists have to report a p-value for any experiment they do. Or do they?
Statistical Inference is a core area of study for any statistician. Put simply, inference means drawing conclusions about what might be happening in real life from the observations you have made about your data. There are two parts to inference.
- Exploratory analysis – where you explore your data through charts, tables and other statistics and end up with one or more hypotheses about what might be going on.
- Confirmatory analysis – where you seek to confirm your hypothesis, often through statistical tests, though such tests should not be the only means of confirmation.
I am a fan of using the criminal justice system as an analogy to explain this. When a crime occurs, the police investigate and collect evidence, i.e. they undertake an exploratory analysis of the data. The outcome of this is a hypothesis that a person is guilty of the crime. That person is then tried in a court where the null hypothesis is that the person is innocent. The evidence is examined via a statistical test, and the outcome is a p-value that the jury uses to come to a verdict. Either the verdict is to reject the null hypothesis of innocence and therefore find the person guilty, or the verdict is that the null hypothesis cannot be rejected and therefore the verdict is not guilty. At no point does a court conclude that the person is innocent; that is not the outcome of a statistical test.
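To make the reject/fail-to-reject logic concrete, here is a minimal sketch of a two-sample t-test in Python using the scipy library. The data are made up for illustration, and the 0.05 threshold is just the conventional choice, not a recommendation:

```python
from scipy import stats

# Hypothetical data: outcomes for a control group and a treatment group
control = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]
treatment = [5.6, 5.4, 5.9, 5.5, 5.7, 5.3, 5.8, 5.6]

# Null hypothesis (the "innocent" presumption): the two group means are equal
t_stat, p_value = stats.ttest_ind(control, treatment)

# The verdict: either reject the null hypothesis, or fail to reject it.
# Note we never "accept" the null - just as a court never declares innocence.
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: cannot reject the null hypothesis")
```

The key point the analogy makes is in the final branch: a large p-value is a failure to reject, not evidence that the null hypothesis is true.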
Below is a list of various materials that you can use to learn more about hypothesis testing.
A. Experimental Design
Classically, a hypothesis should be specified before any data is collected. This leads you into the area of Experimental Design (or Design of Experiments, DOE), which is a vast area of statistics. If you do this, then conclusions drawn once the data has been analysed are usually sounder than conclusions drawn from data collected by other means.
More commonly, a hypothesis is generated after some data has been collected and analysed. The problem with this approach is that the way the data was collected may not be sufficient for you to draw firm conclusions. In reality, any conclusions should be treated as hypotheses for a proposed experiment.
These blog posts of mine explain more.
- Find out the difference between experiments and observations in my Evidence Hierarchy.
- See an example of an experiment and how it can be improved in “Who reads fake news?”
- What is the gold standard for an experiment? The answer is GRRaCE which I will expand upon in a post soon.
B. Statistical Tests
I don’t have much material on statistical tests at the moment so I will update this section later.
C. Confidence Intervals
Confidence intervals are often recommended as an alternative to p-values when assessing statistical significance. The two are like two sides of the same coin, and a case can be made that results are easier to communicate with confidence intervals than with p-values.
Here are some examples of confidence intervals in action.
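As a simple illustration, here is a sketch of a 95% confidence interval for a mean, computed with only Python's standard library. The sample data are invented, and the t critical value is taken from a standard t-table for 9 degrees of freedom:

```python
import math
import statistics

# Hypothetical sample: ten measured heights in cm
sample = [170.2, 168.5, 172.1, 169.8, 171.4, 167.9, 173.0, 170.6, 169.2, 171.8]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% two-sided t critical value for n - 1 = 9 degrees of freedom
t_crit = 2.262  # looked up from a t-table for this sketch

lower, upper = mean - t_crit * sem, mean + t_crit * sem
print(f"mean = {mean:.1f} cm, 95% CI = ({lower:.1f}, {upper:.1f}) cm")
```

A reader can see at a glance both the estimate and its precision, which is the communication advantage a confidence interval has over a bare p-value.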
The heart of traditional hypothesis testing is the calculation and interpretation of p-values. Scientists and researchers in many fields have learned that this is how you decide whether your research is statistically significant.
Unfortunately, the use of p-values has not conformed to good statistical practice, and a number of issues have emerged. As a result, the American Statistical Association (ASA) undertook a widespread consultation to see if these issues could be addressed. The outcome of the consultation has been a series of guidance documents, which are listed below.
- In March 2016, the ASA Statement on P-Values was published which explained how P-values can be misused. The full statement can be downloaded here. This was widely discussed throughout the world of research.
- In September 2016, the statement was the subject of a keynote session at the Royal Statistical Society’s (RSS) conference in Manchester. I am in the front row of the YouTube clip taking many notes!
- In March 2019, the ASA published new guidance, “Moving to a world beyond p<0.05”, which covers alternatives to p-values for undertaking statistical inference.
The conclusion from the first link really resonates with me and is the basis of how I teach hypothesis testing in my courses.
“Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.”
For more information about my other training courses in statistics, please visit my Statistical Training homepage.