Stats Training Materials – Sampling & Surveys

The core expertise that statisticians offer to the world is drawing conclusions from small samples. Knowing how to design surveys, estimate the right sample size, and decide on the right way to ask a question or measure a property are therefore all essential skills for any statistical thinker. The skills you need to be competent in Sampling & Surveys are best captured by my Survey Wheel.

My Survey Wheel has 6 Arcs.

Objectives – What is it that you need to know, and from whom or what?
Design – How to sample & measure your target population.
Fieldwork – How to collate, clean & code our data.
Results – What are the best charts & tables to summarise our data and what to do with missing data?
Insight – What are the key drivers or segments in our data?
Decisions – What actions can be taken and is the client capable of doing them?

The reason I call this a Survey Wheel rather than a survey line is that my 35+ years of experience advising clients on sampling have taught me that the starting point of any sampling project can be any one of these arcs. As a result, the other five arcs have to adapt to the requirements of the starting arc in order to prevent the wheel from breaking.

To learn more about my Survey Wheel, why not download this 7-page PDF document “The 6 Arcs of the Survey Wheel”? Should you engage me as a consultant to work on your survey or sampling project, you will find me following much of what is described in this document.

Alternatively, listen to me explain my Survey Wheel in this 50-minute webinar produced in collaboration with Captive Health in 2015. In this webinar, I focus on many of the common mistakes that people make in designing surveys and samples.

Below is a list of various materials which you can use to learn more about Surveys & Sampling. I have organised these by the 6 Arcs of my Survey Wheel, which are colour coded. Red headings are the most statistically advanced arcs where serious statistical thinking is needed. Green headings are more straightforward computational tasks that can be automated in some cases. Blue headings require softer skills in consulting, facilitation and elicitation, but statisticians still have much to offer in these arcs.

A. Objectives

There are 2 parts to the Objectives arc:

What do I need to know? (Goals)
- Is this linked to a decision that is pending?
Who do I need to ask? (Target Population)
- How many sub populations are needed?

For an example of objective setting, take a look at the 5 questions I pose at the start of this post “Who reads fake news?”.

B. Design

This arc has two very distinct spokes, Sampling & Measuring.

The Sampling spoke seeks to answer these two questions:

How many people should be measured?
- What are the objectives?
How should they be selected?
- What are your sampling cells?
- How random is your selection?
- What biases may be inherent in your sample selection?
- Do you need to weight your results?
- Do you have good quality external data to allow you to weight?
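
To make the weighting questions above concrete, here is a minimal sketch of post-stratification weighting, where each sampling cell is weighted by its population share divided by its sample share. The cell names, counts, population shares and cell means are hypothetical figures chosen for illustration, and this is only one of several weighting approaches.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Return one weight per sampling cell: population share / sample share."""
    total = sum(sample_counts.values())
    weights = {}
    for cell, count in sample_counts.items():
        if count == 0:
            raise ValueError(f"No respondents in cell {cell!r}; cannot weight it.")
        weights[cell] = population_shares[cell] / (count / total)
    return weights


def weighted_mean(cell_means, sample_counts, weights):
    """Average the cell means, with every respondent carrying their cell's weight."""
    weighted_n = {c: weights[c] * sample_counts[c] for c in sample_counts}
    numerator = sum(weighted_n[c] * cell_means[c] for c in sample_counts)
    return numerator / sum(weighted_n.values())


# Hypothetical figures: under-35s are 40% of the population but only 20% of the sample.
sample_counts = {"under_35": 20, "35_plus": 80}
population_shares = {"under_35": 0.4, "35_plus": 0.6}
cell_means = {"under_35": 0.5, "35_plus": 0.3}  # e.g. share answering "yes" in each cell

weights = post_stratification_weights(sample_counts, population_shares)
unweighted = sum(sample_counts[c] * cell_means[c] for c in sample_counts) / sum(
    sample_counts.values()
)
weighted = weighted_mean(cell_means, sample_counts, weights)
print(weights, unweighted, weighted)
```

In this made-up example the under-represented cell gets a weight of 2 and the over-represented cell a weight of 0.75, which moves the estimate from 34% to 38%. The sketch only works if the external population shares are trustworthy, which is exactly the point of the last question in the list.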

For the Measuring spoke, there are 3 questions to answer:

Do you need to use a questionnaire to take measurements?
Are your measurements repeatable, reproducible & unbiased (whether taken by questionnaire or instrument)?
Have you tested your measurements prior to the full survey?

I explore some of the issues of sample design in this blog post “Is all-white alright?”, which looks at how large a sample is needed to decide whether an all-white workforce is statistically plausible or could be an indicator of discrimination.
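
As a rough illustration of the kind of calculation involved (not necessarily the method used in that post), the sketch below assumes each hire is an independent draw from a labour pool with a known minority share. Under that assumption the chance of seeing no minority employees among n hires is (1 - share)^n. The 10% share and the 5% cut-off are hypothetical values.

```python
import math


def prob_no_minority(n_employees, minority_share):
    """Chance that n independent hires contain no one from the minority group."""
    return (1 - minority_share) ** n_employees


def min_workforce_size(minority_share, alpha=0.05):
    """Smallest workforce for which an all-majority outcome has probability below alpha."""
    return math.ceil(math.log(alpha) / math.log(1 - minority_share))


# Hypothetical inputs: 10% minority share in the labour pool, 5% plausibility cut-off.
assumed_share = 0.10
threshold_size = min_workforce_size(assumed_share)
print(threshold_size, prob_no_minority(threshold_size, assumed_share))
```

With these assumed inputs, a workforce needs at least 29 people before an all-white outcome becomes implausible at the 5% level. For smaller workforces, chance alone remains a reasonable explanation under this simple model.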

Often, a statistician needs to link the sample size to the Results arc, especially when there is a requirement to report an estimate to within a specified margin of error. There are many methods available and in my post “Life on Mars” I show how simulation can be used to measure the reliability of gender pay gap statistics and how this is correlated with sample size.
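
A minimal sketch of both ideas for the simplest case, a survey proportion rather than a pay gap: the usual normal-approximation formula for the sample size needed to hit a margin of error, followed by a small simulation that checks how often a sample of that size lands within the margin. The function names, the 3-point margin, the true share of 50% and the comparison size of 250 are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, expected_share=0.5):
    """Normal-approximation sample size: n = z^2 * p * (1 - p) / margin^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * expected_share * (1 - expected_share) / margin ** 2)


def simulated_coverage(n, true_share, margin, n_simulations=1000, seed=1):
    """Fraction of simulated samples whose estimate falls within the margin of the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_simulations):
        successes = sum(rng.random() < true_share for _ in range(n))
        if abs(successes / n - true_share) <= margin:
            hits += 1
    return hits / n_simulations


n_required = sample_size_for_proportion(margin=0.03)
coverage_full = simulated_coverage(n_required, true_share=0.5, margin=0.03)
coverage_small = simulated_coverage(250, true_share=0.5, margin=0.03)
print(n_required, coverage_full, coverage_small)
```

The formula gives 1,068 respondents for a 3-point margin at 95% confidence. The simulation should show close to 95% of samples of that size landing within the margin, and a clearly lower fraction for the smaller sample.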

Other times, the sample size will be linked to the Decisions arc. The COVID-19 pandemic is a perfect case study in how to make decisions under huge uncertainty and with major consequences, so I wrote this post on what sample size would be needed to make a decision to lift all restrictions. You may be surprised at the answer!

The issue of question design came to the fore during the run-up to the 2016 EU Referendum in Britain. There were persistent differences between polls undertaken by phone and those undertaken online. Matt Singh of Number Cruncher Politics carried out an experiment looking at the impact of including or excluding a Don’t Know option in the question. The results were fascinating and, whilst I thought at the time that Matt had drawn the wrong conclusions from his data, it was an excellent piece of work. My own conclusion was that Leave voters were more certain of their vote than Remain voters.

A very good introduction to the issue of survey & question design can be found in a lesson plan developed by the Royal Statistical Society for use in schools. I helped the RSS to develop a Marketing Statistician example based on data from a dating site client I worked with years ago. The lesson takes you through the issues of survey modes and biases, weighting of respondents and ways of designing questions given an objective. If you’d like to use this lesson, then please download the following materials –

marketing-statistics-leader-worksheet – This is for the teacher of the lesson and explains how to run it.
marketing-statistics-student-worksheet – This is for the students taking the lesson and explains what they need to do.
Dating Site Data for Schools – This is the spreadsheet with the data that the students need to download.

Since the 2024 general election in the UK, voting intentions have fragmented, making them harder to measure. This has led to more focus on how pollsters design their surveys and ask their questions. Here are some articles discussing the issues pollsters face –

Does it make sense to ask for voting intentions when the next election is years away? An insightful argument from Lord Ashcroft who is a keen pollster.
How do pollsters differ in their approach? An excellent summary of the state of play at the start of 2026 from Peter Kellner, a doyen of the polling industry.

C. Fieldwork

The key questions to answer are:

How will you collate the responses? Web, text, face to face, other?
How will you ensure data collation is free from error?
How will you code the responses? (if needed)

If you want to create a web survey yourself, then I have found Google Forms to be quite easy to use. I used this to create a 10-question questionnaire to help people work out the odds of Donald Trump being re-elected as President of the USA in 2020.

D. Results

The key questions to answer are:

How will you handle non-response?
How will you summarise the responses?

Non-response or missing data is a serious statistical issue which can affect any sample. The key question is whether those who did not respond differ in any way from those who did. All employers with 250+ employees in Britain will need to answer this question due to the introduction of mandatory Ethnicity Pay Gap reporting sometime in 2027. In 2022, I wrote the draft version of the government guidance to employers in this field, which addressed this issue, but I am concerned that the current government has not taken my ideas on board and that their proposals do not address non-response bias properly. The issue is that most employers do not know their employees’ ethnicity and so will need to ask them for this information. However, employees cannot be compelled to answer this question since ethnicity is deemed to be special category data by the Information Commissioner. My experience of such questions tells me non-response for ethnicity can vary between 5% and 50%. I demonstrate how this issue can affect insights and decisions in a presentation I gave in Feb 2026 which can be found under heading P8 of this link.
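
One simple way to see how much non-response can matter is to work out worst-case bounds: assume first that none of the non-respondents belong to the group of interest, then that all of them do. The sketch below does this for the share of ethnic minority employees. The headcounts are hypothetical, and this is an illustration of the problem, not the approach taken in the guidance or the presentation.

```python
def share_bounds(n_employees, n_respondents, n_minority_respondents):
    """Worst-case bounds on the true minority share when some employees do not answer."""
    n_missing = n_employees - n_respondents
    lower = n_minority_respondents / n_employees  # no non-respondent is in the group
    upper = (n_minority_respondents + n_missing) / n_employees  # every non-respondent is
    return lower, upper


# Hypothetical employer: 1,000 employees, 700 answer, 140 of those report a minority ethnicity.
observed_share = 140 / 700
lower_bound, upper_bound = share_bounds(1000, 700, 140)
print(observed_share, lower_bound, upper_bound)
```

Here the respondents suggest a 20% share, but with 30% non-response the true figure could lie anywhere between 14% and 44%. The observed figure is only trustworthy if non-respondents resemble respondents, which is the key question raised above.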

Summarising data is a standard statistical skill and requires an understanding of basic statistical concepts. You can find a list of relevant posts in section A of this link “Basic Statistical Concepts”.

E. Insight

The key spokes of this arc are:

What are the key drivers of the key questions?
Do these drivers differ by segments within the target population?

The 1st spoke, Key Driver Modelling, requires expertise in Statistical Modelling. This allows you to explore and explain the relationship between a set of key questions and a set of potential driver questions. If you would like to learn more, then why not take a look at my training course “Understanding your world with Statistical Modelling”?

The 2nd spoke, Segmentation, requires expertise in Multivariate Analysis. You can find a list of relevant posts in this link “Multivariate Analysis”, but one post that is particularly relevant is one I wrote at the beginning of 2017 which looks at the similarities between Brexit & Trump.

F. Decisions

The key spokes of this arc are:

What are the constraints on any action that the organisation can take?
What are the capabilities of the organisation to take action?

This arc covers a lot of potential topics but you may find my post on “How to close your gender pay gap with DMAIC” of interest.

Sometimes, a decision based on samples can be borderline, making it difficult to decide. Between 2016 & 2020 I worked with CLAIRE, the industry body representing professionals in the contaminated land sector, to rewrite their statistical guidance for interpreting data. I used three borderline scenarios to illustrate the guidance, which explained how to use confidence intervals to interpret the results of small samples. Click here for more details and to download the guidance itself.
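
To show the general idea of a borderline result, the sketch below builds a t-based confidence interval for the mean of a small sample and compares it with a decision threshold. It is an illustration of the concept only and not the procedure set out in the guidance. The measurements and thresholds are hypothetical, and the critical value 2.262 is the standard two-sided 95% t value for 9 degrees of freedom, so it only applies to samples of 10.

```python
import math
import statistics


def mean_confidence_interval(values, t_critical):
    """Two-sided confidence interval for the mean, given the matching t critical value."""
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    centre = statistics.mean(values)
    return centre - t_critical * standard_error, centre + t_critical * standard_error


def classify(interval, threshold):
    """Call the result borderline when the threshold falls inside the interval."""
    lower, upper = interval
    if upper < threshold:
        return "below threshold"
    if lower > threshold:
        return "above threshold"
    return "borderline"


# Hypothetical measurements from 10 samples (units are arbitrary).
measurements = [42, 55, 48, 61, 50, 47, 58, 52, 45, 53]
interval = mean_confidence_interval(measurements, t_critical=2.262)  # 9 degrees of freedom
print(interval, classify(interval, threshold=55), classify(interval, threshold=60))
```

With these made-up numbers the interval runs from roughly 46.9 to 55.3, so a threshold of 55 is borderline even though the sample mean of 51.1 sits below it, whereas a threshold of 60 is clearly not exceeded.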

If you would like to book a training course in Statistical Sampling, then please contact me.

For more information about my other training courses in statistics, please visit my Statistical Training homepage.