You have just started work for a new employer and with you joining, the company now has 25 employees. All are white including you. Would you raise your eyebrows at that?

Obviously, the first question to ask is what the racial breakdown is of the pool of candidates from which the employer recruited. If the candidate pool is 100% white then it would be a surprise if all employees were not white. Conversely, if the candidate pool is 50/50 white/black then an all-white workforce should prompt questions. This motivates the question “*given the racial breakdown of a candidate pool, how likely is it that an employer will end up with an all-white workforce purely through the laws of chance even if they do not discriminate in any shape or form?*”.

**UK Supreme Court Clarification on Indirect Discrimination**

This question has become more pertinent in the UK following a Supreme Court judgement in April last year which clarified the law on indirect discrimination. The court made it clear that Parliament’s intent is that once a plaintiff can prove that a statistical discrepancy exists that disadvantages a protected class, then the burden of proof shifts to the defendant to explain why that discrepancy exists. The court also made it clear that even then, the plaintiff can only win their case if the employer’s explanation is not good enough. In the two cases they were asked to rule on, they found for the plaintiff in one and for the defendant in the other. So the lesson for all employers who come under the various Equality Acts is to be aware of any statistical discrepancies that might exist in your workforce and to be clear on reasons why this might be the case.

Since the first hurdle is that the plaintiff has to prove that a statistical discrepancy exists, the court will require expert testimony by a statistician who is also independent of either party. That is exactly what I offer to my clients, **independent statistical advice & expertise**, and I have acted as an expert witness in discrimination claims. If you would like more information, then please click here for details on my credentials and click here for information on how to contact me. Should you decide to engage my services, regardless of whether you are the plaintiff or defendant, I am sure you will want an idea of how I will determine if a statistical discrepancy exists and this article will introduce the 4-step process I will use:-

- Determine the baseline ethnicity of the areas where your employees live.
- Determine the ethnicity of your candidate pool.
- Calculate the likelihood of an all-white workforce using the Binomial Distribution.
- Determine if an all-white workforce without discrimination is plausible using Bayes Rule.

Whilst this article focuses on Ethnicity, exactly the same 4-step process can be used with any protected characteristic.

**Step 1 – Determine the baseline ethnicity of the areas where your employees live**

Let’s suppose my fictional company of 25 employees is based in the city of Bath where I live. To see if an all-white workforce is plausible without discrimination, the first step is to determine the racial breakdown of the candidate pool. In this instance I will start with the 2011 Census to work out what % of the population of each parliamentary constituency within commuting distance of Bath is categorised as White UK**. I use parliamentary constituencies since they have roughly the same population (on average 75,000 registered voters each) which makes it easier to determine the average and distribution of possible values. The map below shows the constituencies from which I personally know people commute to Bath, and the whole area is sometimes denoted as “West England”. For each constituency, I have given the % of the population defined as White UK**.

*** I define White UK to be the sum of the census categories White British, White Irish & White Traveller. Basically it is all White categories minus White Other.*

Each constituency’s White UK figure is colour coded with green denoting major urban areas (Bath, Bristol & Swindon) which are linked by rail, blue for rural areas where commuting by rail to Bath is possible and brown for rural areas where commutes would mostly have to be by car. On average, urban areas are 84% white UK and rural areas are 95% white UK and these figures are quite consistent across West England. Therefore, I will make the assumption that our fictional company will be recruiting from a candidate pool that is between 85% and 95% white UK with an expected value of 90% white UK.

**Step 2 – Determine the ethnicity of your candidate pool**

If I were engaged as an actual expert witness, I would go a step further and refine this figure to take into account any additional data that would increase or decrease this expected value. I will not do this here but such data could include the following:-

- The figures shown in the map are for the entire population whereas a company’s candidate pool will come from the working age population which is usually defined as 16-64 but for some professions that require extensive training, the candidate pool might be the age range 21 to 70, say. From memory, the UK census shows that under 18s are less white and pensioners are more white than the average. Depending on the proportion of both young and old in a constituency, the white UK figure will change.
- Bath has two universities and thus a very high population of students (22nd highest out of 632 constituencies in Britain). If our employer was a cafe say then you might expect it to employ students but if it was a professional services company, you would expect the opposite. Clearly this has an effect on the candidate pool if the student population is disproportionately non-white.
- Some companies require specific skills that can only be acquired by having a degree or equivalent qualification. If the population of graduates is disproportionately white or non-white, this will have a knock-on effect on the candidate pool that the company can recruit from.
- There may be a dominant employer in the constituency that “sucks up” most of the potential labour within the constituency. As a result, the company may have to rely on commuters far more than other employers in which case, the ethnicity of the commuter belt may be more important than the ethnicity of the location of the employer.
- Some ethnicities may be disproportionately economically inactive due to cultural reasons e.g. discouraging women from working.

There are many other factors that could be considered. Note I do not consider secondary questions such as “why do you only recruit graduates?” Clearly, a company’s recruitment policy may contain hidden biases but these would be questions that would be considered after I have determined whether a discrepancy exists between the ethnicity of a company’s workforce and its current candidate pool. Recall that the Supreme Court ruled that a discrepancy must be proved first. Only then, can we proceed to ask more questions.

**Step 3 – Calculate the probability of an all-white workforce using the Binomial Distribution**

Let’s assume that our company does not discriminate at all. Then for a Bath based company, we can say that each employee has a 90% chance of being white UK (note my definition of “all-white” is actually shorthand for “all-white UK”). To do our calculation we should actually say that the probability of each employee being white UK is 0.9 since probability is a decimal on the 0 to 1 scale.

If the company only has one employee, then there is a probability of 0.9 of him or her being white UK. If the company now has two employees, what is the probability that both are white UK? This turns out to be 0.9 x 0.9 = 0.9^2 = 0.81 provided we assume that neither employee influences the probability of the other being employed. By induction, I hope you can see that for a company with 3 employees, the probability that all 3 are white UK is 0.9 x 0.9 x 0.9 = 0.9^3 = 0.729.

So the general formula for calculating the probability of a company of N employees being all-white UK given that the probability of each employee being white UK is P is P^N. For our fictitious Bath company of 25 employees where the candidate pool is 90% white UK, the probability of an all-white workforce is 0.9^25 = 0.072 or 7.2%.
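The P^N formula is easy to check with a few lines of Python. What follows is only an illustrative sketch: the function name `prob_all_white` and the particular grid of pool percentages and company sizes are assumptions chosen for this example, not figures taken from the chart or tables. It also assumes, as the text does, that each hire is independent of the others.

```python
def prob_all_white(p_white: float, n_employees: int) -> float:
    """Probability that every one of n_employees is white UK.

    Assumes each hire is an independent draw from a candidate pool in which
    a proportion p_white is white UK (the no-discrimination assumption).
    This is the binomial probability of n successes in n trials, i.e. P^N.
    """
    if not 0.0 <= p_white <= 1.0:
        raise ValueError("p_white must be between 0 and 1")
    if n_employees < 0:
        raise ValueError("n_employees must not be negative")
    return p_white ** n_employees


# Illustrative grid (assumed values): pool percentages and company sizes.
for p in (0.85, 0.90, 0.95):
    for n in (10, 25, 50):
        print(f"pool {p:.0%} white UK, {n:>2} employees: "
              f"P(all-white) = {prob_all_white(p, n):.3f}")
```

For the Bath example, `prob_all_white(0.90, 25)` gives roughly 0.072, matching the 7.2% quoted above.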

What if the candidate pool was 85% or 95% white UK? What if the company had 50 employees instead? The chart here shows the probabilities of an all-white workforce under multiple scenarios and table 1 summarises some key scenarios.

Table 2 reverses the calculation. This shows the company size for which we would expect X% of companies to have all-white workforces given the proportion of the candidate pool that is white. So, if your candidate pool is 95% white UK, then we would expect 10% of companies with 44 employees to have all-white workforces.
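The reverse calculation amounts to solving P^N = X for N, which gives N = ln(X) / ln(P). The sketch below shows one way to do this in Python; the function name `company_size_for_share` is hypothetical. The exact solution is rarely a whole number, so a published table has to round it one way or the other.

```python
import math


def company_size_for_share(p_white: float, share_all_white: float) -> float:
    """Company size N at which a proportion share_all_white of
    non-discriminating companies would be all-white, i.e. the solution
    of p_white ** N == share_all_white.

    Assumes independent hires from a pool that is p_white white UK.
    """
    if not 0.0 < p_white < 1.0:
        raise ValueError("p_white must be strictly between 0 and 1")
    if not 0.0 < share_all_white < 1.0:
        raise ValueError("share_all_white must be strictly between 0 and 1")
    return math.log(share_all_white) / math.log(p_white)


# Example from the text: 95% white UK pool, 10% of companies all-white.
n = company_size_for_share(0.95, 0.10)
print(f"Exact solution: N = {n:.2f} employees")
```

For a 95% white UK pool and a 10% share, the exact solution falls between 44 and 45 employees.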

**Step 4 – Decide if an all-white workforce without discrimination is plausible …**

Can I conclude that an all-white workforce for a company of 25 employees based in Bath constitutes a statistical discrepancy that shifts the burden of proof onto the employer to explain? Table 1 above shows that the probability of an all-white workforce without discrimination is 7% if the candidate pool is 90% white. In other words, 1 in 14 companies similar in size to our fictional company can be expected to be all-white. If there are only 5 similar companies in Bath then I can conclude that an all-white company is unlikely to occur and would rule that the company is a likely discrepancy. On the other hand, if there are 100 similar companies in Bath, then I would be surprised not to see an all-white company and therefore my fictional company is possible.
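One way to put numbers on the "5 similar companies versus 100 similar companies" comparison is to compute how many all-white companies we would expect among them, and the chance of seeing at least one. The sketch below assumes that the companies recruit independently of each other and that none discriminates; both function names are hypothetical and this is an illustration rather than a description of a formal test.

```python
def expected_all_white_companies(n_companies: int,
                                 p_white: float = 0.90,
                                 n_employees: int = 25) -> float:
    """Expected number of all-white companies among n_companies
    non-discriminating companies of the same size."""
    return n_companies * p_white ** n_employees


def prob_at_least_one_all_white(n_companies: int,
                                p_white: float = 0.90,
                                n_employees: int = 25) -> float:
    """Probability that at least one of n_companies is all-white,
    assuming the companies recruit independently of each other."""
    p_all_white = p_white ** n_employees
    return 1.0 - (1.0 - p_all_white) ** n_companies


for k in (5, 100):
    print(f"{k:>3} similar companies: "
          f"expected all-white = {expected_all_white_companies(k):.2f}, "
          f"P(at least one) = {prob_at_least_one_all_white(k):.3f}")
```

With 5 similar companies the expected number of all-white workforces is well below one, whereas with 100 it is about seven, which is why the conclusion depends on how many comparable employers exist.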

The paragraph above is an extremely important one that many people who are familiar with the concept of p-values overlook. The probabilities of all-white workforces I have calculated here are in fact p-values for a null hypothesis that the probability of each employee being white UK is equal to the proportion of the candidate pool that is white UK i.e. the company does not discriminate. This is known as a **Conditional Probability** and is denoted by the convention **P(outcome | conditions)**. What I am doing here is answering the question “*what is the probability of an all-white workforce (outcome) if I assume that the null hypothesis is true (one condition) for a company of 25 employees (another condition)*” which can be written as **P( 25 white UK employees | company has 25 employees & company does not discriminate on ethnicity when recruiting from its candidate pool ).**

Is this the same question that an employment tribunal would ask? I think they are much more likely to ask the following question “*On balance of probabilities, is it likely that the observed discrepancy (an all white workforce in a company of 25 employees) is the result of random chance?*” Making decisions on balance of probabilities is a core part of civil law and the statistical question that has to be answered is whether this inequality **P( company does not discriminate | company has 25 employees and all are white UK ) > 0.5** is true or false.

**Step 4b – … using Bayes Rule whereby you need to …**

Notice that I have written two conditional probabilities in the previous paragraphs and both contain the same 3 elements (number of employees, all-white workforce, company does not discriminate when recruiting from its candidate pool) but the order of the 3 elements differs. I will write both conditional probabilities again below to make it easier to see along with a 3rd conditional probability which I will explain shortly.

- P( company has all-white workforce | company has 25 employees, company does not discriminate … ) aka **Likelihood** (and P-value)
- P( company does not discriminate … | company has 25 employees, company has all-white workforce ) aka **Posterior Probability**
- P( company does not discriminate … | company has 25 employees ) aka **Prior Probability**

Via a major theorem of Statistics known as **Bayes Rule**, these are linked as follows: “*Posterior Probability is equal to the Likelihood multiplied by the Prior Probability divided by a calculated constant*”. As defined above, the prior probability is the probability that the company does not discriminate but isn’t this what we are trying to calculate in the first place?
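Written as a formula, the verbal statement of Bayes Rule looks like this. The symbols are shorthand introduced here for illustration only: ND for "company does not discriminate", D for "company discriminates", W for "company has an all-white workforce" and N for "company has 25 employees".

$$
\underbrace{P(\mathrm{ND} \mid N, W)}_{\text{Posterior Probability}}
= \frac{\overbrace{P(W \mid N, \mathrm{ND})}^{\text{Likelihood}} \times \overbrace{P(\mathrm{ND} \mid N)}^{\text{Prior Probability}}}{P(W \mid N)}
$$

In the simplest case, where a company is either a discriminator or a non-discriminator and nothing else is uncertain, the constant in the denominator is the overall probability of an all-white workforce across both kinds of company:

$$
P(W \mid N) = P(W \mid N, \mathrm{ND})\,P(\mathrm{ND} \mid N) + P(W \mid N, \mathrm{D})\,P(\mathrm{D} \mid N)
$$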

**Step 4c – … elicit the Prior Probability of a company not discriminating and …**

Not quite. Recall the scenario we are working to here is a company of 25 employees being sued for racial discrimination because its workforce is all-white. If the company had one or more non-white employees, would it be sued for racial discrimination? Possibly, but not on the grounds that it is all-white since that is factually incorrect here. What the Prior Probability measures is the known (or estimated) probability of a small employer **not** engaging in discriminatory practices **before** the facts of the case are known. If you could undertake a nationwide survey of UK employers, in what % would you not find evidence of racially discriminatory practices, either directly or indirectly? Personally, I don’t know the answer but there are some experts out there who could give evidence on this.

As a statistical expert, it would be my responsibility to pose the question to these experts (perhaps by reading their published research) and then **elicit** from their response what the prior probability is. I would almost certainly end up with a range of possible answers which is not a problem for me since Bayes Rule can work with what is known as a **Prior Distribution**. In the worst case scenario, the experts would be completely unable to agree among themselves, in which case I would conclude that the prior distribution is **uninformative** and that all possible values for the Prior Probability are equally likely i.e. anything between 0% & 100%.

**Step 4d – … divide by a calculated constant**

I am not going to explain the calculation of the constant as it can be quite complex which is why you need a statistician to do it! Instead I will illustrate the outcome using a simplified example.

Let’s suppose that in the scenario we are working with, the experts agree that the Prior Probability of a company being a racial discriminator is 20% i.e. there is an 80% chance that the company does not indulge in racially discriminatory practices. If a company does discriminate, does it ban non-white employees completely or does it simply make it harder for a non-white person to be recruited? The latter is more likely but we need to quantify how much harder it is. The simplest way to do this is to change the probability of a candidate being white. For example, if the probability of a candidate being white is 0.9 for a non-discriminating employer, we could then say the practices of a discriminator have the effect of increasing this to 0.95 i.e. if a non-discriminator interviews 20 candidates 2 will be non-white whereas for the discriminator only 1 out of 20 candidates will be non-white.

Back in step 4, I made it clear that the conclusion you draw depends on how many equivalent companies there are in the first place. Suppose we have 50 such companies, then the prior probability tells us that 10 will be discriminators using a 95% white candidate pool and 40 will be non-discriminators using a 90% white candidate pool. We can now create a matrix as shown below where the columns split discriminators from non-discriminators and the rows split all-white workforces from those that are not all-white. We use the chart from step 3 above to work out how many of the 10 discriminators and the 40 non-discriminators will have all-white workforces. It turns out that in both cases, just under 3 companies will have all-white workforces as shown in Scenario 1 below.

So in Scenario 1, is an all-white workforce for a non-discriminatory company at all plausible? To answer this, read the shaded YES row with bolded numbers: the two numbers there (2.8 vs 2.9) tell us whether, on balance of probabilities, the company is more likely to be a discriminator or a non-discriminator. The answer is clearly yes, since the company being sued is just as likely to come from the non-discriminators as it is from the discriminators. The split is effectively 50:50 and I would have to testify accordingly to the employment tribunal that the all-white workforce is plausible for a non-discriminator.
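The Scenario 1 figures can be reproduced with a short Python sketch. The function name `scenario` and its parameter names are hypothetical, and the calculation simply applies P^N to each group of companies before comparing the two expected counts of all-white workforces.

```python
def scenario(n_companies: int,
             prior_discriminator: float,
             p_white_fair: float,
             p_white_discriminator: float,
             n_employees: int = 25):
    """Expected all-white companies among non-discriminators and
    discriminators, plus the posterior probability that an all-white
    company is a non-discriminator.

    Assumes independent hires and that every company is one of exactly
    two types (a simplification used for illustration).
    """
    n_disc = n_companies * prior_discriminator
    n_fair = n_companies - n_disc
    all_white_disc = n_disc * p_white_discriminator ** n_employees
    all_white_fair = n_fair * p_white_fair ** n_employees
    posterior_fair = all_white_fair / (all_white_fair + all_white_disc)
    return all_white_fair, all_white_disc, posterior_fair


# Scenario 1 from the text: 50 companies, 20% discriminators,
# pools of 90% (non-discriminators) and 95% (discriminators) white UK.
fair, disc, posterior = scenario(50, 0.20, 0.90, 0.95)
print(f"All-white non-discriminators: {fair:.1f}")
print(f"All-white discriminators:     {disc:.1f}")
print(f"P(non-discriminator | all-white) = {posterior:.2f}")
```

The posterior comes out very close to 0.5, which is the 50:50 split described above.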

In the other 3 scenarios, I would have no hesitation in testifying to the tribunal the opposite i.e. it is more likely than not that the company being sued comes from the discriminators rather than the non-discriminators. In scenario 4 where there are 25 equivalent employers with 10 expected to be discriminators with 95% white candidate pools and 15 expected to be non-discriminators with 85% white candidate pools, the expected outcome is that only 3.1 employers out of the 25 would have all-white workforces of which 2.8 would come from discriminators and 0.3 from non-discriminators.
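For Scenario 4, the balance-of-probabilities figure can be written out as worked arithmetic. The symbols ND (non-discriminator) and W (all-white workforce) are shorthand introduced here for illustration, and the intermediate values are shown before rounding to one decimal place, so they differ slightly from the rounded 0.3 and 2.8 quoted above.

$$
P(\mathrm{ND} \mid W)
= \frac{15 \times 0.85^{25}}{10 \times 0.95^{25} + 15 \times 0.85^{25}}
\approx \frac{0.26}{2.77 + 0.26}
\approx 0.09
$$

In other words, under the Scenario 4 assumptions, only around 1 in 11 all-white companies would be expected to be non-discriminators.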

When other experts are unable to agree on the probable split of discriminators to non-discriminators and the degree of discrimination applied by the discriminators, I would have to calculate a large number of scenarios and take an appropriate weighted average. That is why I say the calculation of the constant in Bayes Rule is complicated and why you need a statistician to undertake the calculation.
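To give a flavour of what averaging over many scenarios can look like, here is a minimal Python sketch. It is an assumption-laden illustration and not necessarily the weighting that would be used in a real case: the function name `averaged_posterior_fair`, the scenario weights and the grid of prior values are all hypothetical. Each scenario contributes to both the numerator and the constant of Bayes Rule in proportion to its weight, and the two totals are divided at the end.

```python
def averaged_posterior_fair(scenarios, n_employees: int = 25) -> float:
    """Posterior probability of 'does not discriminate' given an all-white
    workforce, pooled over several weighted scenarios.

    scenarios: iterable of tuples
        (weight, prior_fair, p_white_fair, p_white_discriminator)
    where weight reflects how much credence is given to that scenario
    (hypothetical values supplied by the analyst).
    """
    numerator = 0.0   # weighted P(all-white and non-discriminator)
    evidence = 0.0    # weighted P(all-white), the "calculated constant"
    for weight, prior_fair, p_fair, p_disc in scenarios:
        like_fair = p_fair ** n_employees
        like_disc = p_disc ** n_employees
        numerator += weight * prior_fair * like_fair
        evidence += weight * (prior_fair * like_fair
                              + (1.0 - prior_fair) * like_disc)
    return numerator / evidence


# Hypothetical "uninformative" case: every prior from 0% to 100%
# (in 10% steps) is treated as equally credible.
prior_grid = [i / 10 for i in range(11)]
scenarios = [(1.0, prior_fair, 0.90, 0.95) for prior_fair in prior_grid]
print(f"Pooled posterior: {averaged_posterior_fair(scenarios):.3f}")
```

Passing a single scenario, such as `[(1.0, 0.8, 0.90, 0.95)]`, reduces this to the simple two-column matrix calculation used for Scenario 1.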

**Why you need to speak to a statistician!**

I hope you have been able to follow my explanation up to this point. As you can see, just because a company has an all-white workforce, one cannot immediately claim that this is the result of racial discrimination. There is a 4-step process that I need to steer you through so as to document the relevant figures and to arrive at a conclusion as to whether the all-white workforce is more likely than not to have come from an organisation with discriminatory practices. It is important to note that my calculations here are based solely on the known fact that a company has an all-white workforce and if there is additional evidence available, Bayes Rule is sufficiently flexible that I may be able to incorporate that evidence as well in my calculations.

Everything I have said here applies to any protected characteristic such as gender, disability, sexuality, etc. If you are involved in a claim of discrimination of any kind and would like me to estimate the probability of discrimination taking place, then please click here for information on how to contact me.