In November 2023, the Office for National Statistics (ONS) resumed publishing estimates of the UK ethnicity pay gap after a 3-year hiatus caused by the COVID-19 pandemic. I have organised this data into a user-friendly (hopefully!) Microsoft Excel spreadsheet which allows you to see how the median hourly pay has changed between 2012 & 2022 for a variety of ethnicities & other categories. This article explains how you can use this spreadsheet and why you must look at ethnicity pay gap data in a different way from how you would look at gender pay gap data.

**Click here to download my spreadsheet ONS Ethnicity Pay Gap Trends 2012 to 2022 – version 2023 v1**

Please read the HELP page and the rest of this article before you start using it. You are free to reproduce any chart or table but I would appreciate it if you could credit me for creating these.

**Layout of ONS Ethnicity Pay Gap Trends 2012 to 2022**

There are a number of worksheets (some hidden, some protected but password-free) but the most important are these –

- **HELP** – this lists the main worksheets and briefly explains what each does.
- **C_Trends** – using this, you can look at the trends in median hourly pay for any two categories (which are denoted as reference & comparator) between 2012 & 2022.
- **C_Intersectional** – here, you can look at the intersection between any two ethnicities and other categories such as sex, age and whether born in the UK or not for a specific year.
- **C_Regions** – here, you can look at how median hourly pay varies by ethnicity and region for a specific year.
- **PvtData** – a pivot table with all the data used in the above worksheets and more. You can reformat this pivot table to whatever you want.
- **ONS_Data** – this is the source data worksheet for the pivot table and the 3 output worksheets above.

There are a number of other **ONS_** named worksheets which are explained in the **HELP** worksheet and are worth familiarising yourself with.

The three **C_** worksheets (Trends, Intersection, Regions) are explained in detail in separate sections below. Before I do so, I want to explain how the ONS estimate ethnicity pay gaps using the **Annual Population Survey.**

**What is the Annual Population Survey?**

You may already be familiar with ONS estimates of the UK gender pay gap which have been published annually since 1997. These are based on a random sample of HMRC payroll data which is known as **ASHE** (which I describe in more detail here). For tax and historical reasons, HMRC always hold a record of your age and sex, which is why the ASHE dataset can be created and used for estimating gender pay gaps. However, HMRC do not hold records of your ethnicity (rightly so) which means ONS need to use a different dataset known as **APS** or **Annual Population Survey** to estimate ethnicity pay gaps.

The APS is a subset of the larger **Labour Force Survey** (LFS) which measures many of our national employment statistics. Not every respondent is asked for their salary and hours information and not everyone invited to respond actually responds. As a result, whilst the sample in theory could be of a similar size to the ASHE sample used for gender pay gap reporting, the reality in 2022 is that the current sample size of **47,000** is only a quarter of what it could be.

Part of the reason for this is the COVID-19 pandemic which made it harder to reach respondents. Prior to the pandemic, the sample size was almost twice as large. The fall in sample size is also seen in the wider Labour Force Survey and the ONS is concerned about this and is actively investigating ways of improving the situation. The question we need to consider at this point is whether the APS is still good enough for measuring ethnicity pay gaps.

In my opinion, it is still good enough because our main purpose when using this data is to compare trends between different groups. If our goal was to measure a national statistic of some kind, I would be more concerned since falling sample sizes could introduce a bias. Our goal though is comparison and even if there is an overall bias, it will only be problematic if the bias is more severe for some ethnicities than for others. I do not see evidence of such differential bias yet and therefore I still consider the **APS** to be suitable for ethnicity pay gap analysis. As will be seen in the forthcoming **C_Trends** section, the spreadsheet tracks sample weights over time and so far I am not seeing different trends in these by ethnicity.

Unlike **ASHE** which takes its pay data direct from HMRC records, **APS** is a survey that asks respondents for their ethnicity, pay and hours information on top of other variables. That creates a potential source of additional error in the way people choose to respond to this question which is a voluntary one as well. To see the full list of questions asked in the Annual Population Survey, you can download the user guide for 2022 here.

**How does ONS measure UK ethnicity pay gaps?**

For each respondent, ONS calculate the hourly pay in a similar way to that done for gender pay gap reporting. The method is not quite identical since it is not possible for the interviewer to break down the respondent pay slips into different elements. Also the survey is carried out throughout the year unlike **ASHE** which is based on the 5th April of each year.

Once calculated, ONS can then calculate the median hourly pay for each category of interest. All the raw data tables can be downloaded from the ONS website here. I have used tables 1 to 8 from this link to create the spreadsheet you’ve downloaded and you will find a description of each table in the **ONS_TABLES** worksheet of your spreadsheet.

In your spreadsheet, you will find the following **5** categories available for your analysis –

- **Ethnicity** – two options, **Big 5** and **ONS18**.
- **Sex** – Men and Women.
- **Age** – 16 to 29 years old and 30 or older.
- **Where Born** – In the UK or outside the UK.
- **Region** – Scotland, Wales and the 9 regions of England.

The **ONS18** ethnicities are the ethnicities used in the 2021 census in England & Wales. In 2021, this list consisted of 19 ethnicities but one (White-Roma) was new and is not used in the **APS**. The remaining **18** ethnicities can then be aggregated into **5** higher level ethnic groups which are known as **Big 5**. In earlier releases, ONS also offered a 3rd category **Binary** where all non-white ethnic minorities were aggregated into a single category which used to be denoted as **BAME**.

The table below shows the ethnicity profile of England and Wales in 2011 & 2021 using **ONS18** ethnicities which are then aggregated into **Big 5** and **Binary** as shown. Note that here **Big 5** is actually Big 6 but in the **APS**, White British & White Other are combined into a single White category.

The shorthand codes shown in the table here (W-B, A-I, B-A, etc) are also used in your spreadsheet.

**The importance of trends, sample sizes & confidence intervals**

If you compare the latest ONS release (November 2023) with the previous release (October 2019), you will see a number of changes, all of which I strongly welcome. They are –

- The abandonment of Binary ethnicity pay gaps.
- Including 95% confidence intervals alongside published pay gaps.
- Including sample sizes and population weight in data downloads.
- Specifying a minimum category size of 26.
- The emphasis on analysing long term trends rather than single year figures.

The most common mistake I see among employers who attempt to analyse their payroll by ethnicity is the belief that all the complexity of such data can be reduced to a single statistic, namely the binary ethnicity pay gap which measures the difference between the median hourly pay of white employees and the median hourly pay of non-white employees. This happens because employers are mistakenly trying to replicate the gender pay gap reporting process which is centred around a binary gender pay gap statistic.

The UK gender pay gap reporting system is inherently **binary** from the point of view of its metrics and reporting. It was a copy and paste of the Office for National Statistics (ONS) process used since 1997 to measure the UK’s gender pay gap. The ONS process was designed by statisticians for statisticians and was never intended to be used by HR professionals with no understanding of statistics. I consider this to be one of the reasons why mandatory gender pay gap reporting has not had an observable effect on the UK gender pay gap since 2017.

Ethnicity is not binary as evidenced by the recent 2021 census which listed up to 19 ethnicities to choose from. White and ethnic minority populations are not of equal size like men and women, they are majority and minority. The ethnic majority to ethnic minority ratio varies enormously between geographical areas in the UK unlike gender which has virtually the same ratio of men and women in all areas.

These are the fundamental reasons why I have repeatedly said a binary ethnicity pay gap statistic is meaningless and actionless, so I am delighted the ONS now agree with me and will no longer publish such pay gaps. That doesn’t make the ONS report perfect as they continue to focus on pay gaps rather than representation gaps (which I first called for in 2020) but they have also added new features which mitigate the worst features of pay gap statistics.

The most important of these to my mind is the addition of **95% Confidence Intervals** to estimates of pay gaps. The chart here shows an example of these which will appear again in the next section on **C_Trends** so I won’t explain everything now. What this shows is for every **£1** paid per hour to the reference ethnicity (happens to be ONS18 White-British), the comparator ethnicity (happens to be ONS18 Black-Other) is paid so many pence more or less as shown by the black dashes. If one had focused solely on these dashes, one might think there is a trend from 2012 when the comparator was paid **13p** more than the reference to 2022 when the comparator was paid **13p** less than the reference.

However, the APS sample size for the comparator ethnicity is low, having fallen from **100** to **50** over that time, which means there has to be a large margin of error in the estimate of the difference. This is what the 95% confidence intervals, represented by the shaded green bars either side of each black dash, show. The typical interpretation of these is that there is a **95%** chance the true difference between reference and comparator lies within the range shown e.g. for 2012, the comparator is estimated to be paid between **6p** less and **35p** more than the reference and by 2022, the comparator is estimated to be paid between **26p** less and **2p** more than the reference. In both years, the no pay gap scenario where the comparator is paid the same as the reference i.e. the difference is **0p**, is not ruled out because that scenario lies within the 95% confidence interval, as is the case for all years between 2012 & 2022.

This does not mean we can conclude there is no pay gap between the comparator and reference; the wide confidence intervals are instead telling us to be cautious and not draw strong conclusions. The low sample size for the comparator in this instance demands from us a respect for the laws of chance which tell us that significant changes from year to year can and will occur. By including such confidence intervals, the ONS force their readers to take note of this.
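For readers who prefer to see the logic written out, here is a minimal sketch (in Python rather than Excel, and not something taken from the spreadsheet) of the check described above: is the no pay gap scenario of 0p inside the 95% confidence interval? The function name is my own invention; the interval bounds are the 2012 and 2022 figures quoted above, with negative values meaning the comparator is paid less.

```python
def zero_gap_ruled_out(lower_pence, upper_pence):
    """Return True if the interval excludes 0p, i.e. 'no pay gap' is ruled out."""
    return not (lower_pence <= 0 <= upper_pence)


# Interval bounds quoted in the article, in pence per £1 paid to the reference.
# Negative = comparator paid less, positive = comparator paid more.
intervals = {2012: (-6, 35), 2022: (-26, 2)}

for year, (low, high) in intervals.items():
    verdict = "ruled out" if zero_gap_ruled_out(low, high) else "not ruled out"
    print(f"{year}: {low:+d}p to {high:+d}p -> no pay gap is {verdict}")
```

Both years report "not ruled out", which is the cautious reading the confidence intervals are asking for.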

I should point out the current APS sample size of 47,000 is typical of many large employers (The Department for Work & Pensions employ twice as many). Most employers though will have far fewer than this and, consequently, their confidence intervals will be wider. This issue is much less common in gender because the ratio of men to women is basically equal wherever you are in the UK so even small employers are likely to have sample sizes sufficient to enable reasonable conclusions. With ethnicity though, some sample sizes will be very small and by using confidence intervals, one can convey the impact of such small sample sizes.
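To give a rough feel for how much wider, a common rule of thumb is that the margin of error shrinks in proportion to the square root of the sample size. The sketch below applies that rule only; it is an approximation, it is not the ONS method, and real confidence intervals for medians also depend on how spread out pay is. The employer headcount of 470 is hypothetical.

```python
import math


def relative_ci_width(n_small, n_large):
    """Approximate how many times wider the interval is for the smaller sample,
    assuming the margin of error scales with 1 / sqrt(sample size)."""
    return math.sqrt(n_large / n_small)


aps_sample = 47_000        # APS sample size quoted in the article
employer_headcount = 470   # hypothetical employer, for illustration only

print(round(relative_ci_width(employer_headcount, aps_sample), 1))  # 10.0
print(round(relative_ci_width(50, 100), 2))                         # 1.41
```

So under this assumption an employer with a hundredth of the APS sample would see intervals roughly ten times wider, and halving a sample widens the interval by about 40%.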

In fact, the ONS have now instituted a policy of not calculating confidence intervals when the comparator sample size is **25** or less. They still publish the pay gap but it is explicitly flagged as statistically unreliable. One of the features I did like about the new government guidance for ethnicity pay gap reporting was its recommendation that a minimum category size of at least 50 be used for external reports and of at least 20 for internal reports. What the ONS have done is broadly consistent with the government recommendations, especially when we bear in mind that the ONS report is prepared and written by professional statisticians whereas employer reports are written by HR professionals with little or no statistical skills.

The other new feature of the ONS ethnicity pay gap reports is the greater emphasis on looking at long term trends rather than recent year on year changes. When one is working with small sample sizes, this is the only way to look at ethnicity pay gaps and is why I created the **C_Trends** worksheet in your spreadsheet.

**How to use C_TRENDS**

This worksheet is split into 3 parts from left to right.

- On the left, you can state which categories you wish to designate as the reference and comparator.
- In the middle, you will find the output charts and tables.
- On the right are background calculations for the charts which you should ignore and not touch unless you know what you are doing.

The reference & comparator categories are specified by changing what is shown in the yellow cells in rows 5 & 6 in columns C to G. The options you can enter here are shown below row 6 and in the dropdown lists provided. Note the orange cells in row 5 are currently formulas but once you understand why those formulas are being used, you can overwrite the formulas if you wish.

In row 4 above these user input cells, you are told which set of ethnicities can be used with the other categories. The ONS does not publish all possible combinations of data hence why I put the reminder there. In short –

- When Region = GB and Sex, Age and Born are set to ALL, you can use either ONS18 or Big5 ethnicities.
- When using other regions, you can only use Big 5 ethnicities (W, B, A, M, O)
- When using Sex and Age categories, you can only use ONS18 ethnicities (W-B, B-C, A-P, etc)
- When using Born categories, you can use either ONS18 or Big5.
- You cannot use a combination of Region, Age, Sex, Born categories, you must choose only one of these 4 options at a time.
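The rules above can be written down as a small lookup. This is only a sketch of my reading of the rules in Python, not how the spreadsheet itself enforces them; the function name is hypothetical, and I am assuming "GB" and "ALL" are the default entries, with any other entry (such as "London" or "Women", spellings assumed) counting as a chosen category.

```python
def allowed_ethnicity_sets(region="GB", sex="ALL", age="ALL", born="ALL"):
    """Return the ethnicity groupings usable with the chosen categories.

    Raises ValueError if more than one of Region, Sex, Age, Born is set,
    since only one of the four can be used at a time.
    """
    chosen = {
        "Region": region != "GB",
        "Sex": sex != "ALL",
        "Age": age != "ALL",
        "Born": born != "ALL",
    }
    active = [name for name, is_set in chosen.items() if is_set]
    if len(active) > 1:
        raise ValueError(f"Choose only one of Region, Sex, Age, Born; got {active}")
    if not active or active == ["Born"]:
        return {"ONS18", "Big5"}   # national view or Born: either grouping
    if active == ["Region"]:
        return {"Big5"}            # other regions: Big 5 only
    return {"ONS18"}               # Sex or Age: ONS18 only


print(allowed_ethnicity_sets(region="London"))
print(allowed_ethnicity_sets(sex="Women"))
```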

In order for confidence intervals to be displayed properly, the reference ethnicity must be **W** (when using Big5) or **W-B** (when using ONS18). You can use different references if you want but the confidence intervals will not be shown if you do this.

Let’s now look at the charts and tables in the middle part of the worksheet. To demonstrate this, enter **W-B** for the reference ethnicity in cell G5 and **B-O** in cell G6 for the comparator ethnicity. These are the ethnicities I used to demonstrate the confidence interval chart earlier and the full chart output looks like this.

This shows the pay gap & confidence intervals chart you saw earlier along with the actual median hourly pay for W-B & B-O on the left which is used to calculate the pay gap on the right. At the top of the graphic is a statement of what the reference and comparator ethnicities are. In the labels of the left hand chart, the average sample size over the years 2012 to 2022 is shown for the reference (**52,271**) and comparator (**82**) ethnicities.

Looking at actual median hourly pay in the left hand chart, what do you see? The immediate feature is the large year-on-year swings in the median hourly pay of B-O, whereas the median W-B pay shows fairly steady increases. Note I’ve added the national minimum hourly wage (for adults) for reference which has increased by over **50%** since 2012. The increase for W-B pay doesn’t quite match that but if one were to compare 2012 & 2022 only for B-O pay, one would see almost no change.

The above is the backdrop to the pay gap & confidence interval chart. Recall, I showed earlier that we could not rule out the no pay gap scenario in each year even though the black dashes appear to show a downward trend. When you combine the two charts, which do you think is more likely; the apparent trend is real or the apparent trend is an artifact of the laws of chance given the low sample size for B-O ethnicity? My conclusion would be the latter.

Analysing trends and determining whether they are real or not is a training course I run all the time. You can find out more details here and I will be running this via the Royal Statistical Society if you would like to book a place. One of the tools I teach when trying to identify the underlying trend in data is the concept of a **centred moving average**. This is shown in the pay gap & confidence interval chart as a dashed green line which represents the 5 year centred moving average. This is apparently showing a slow underlying flip from a small pay gap in favour of B-O to a small pay gap against B-O. In fact the trend here is not statistically significant but that is not the same as saying there is no trend, just that we can’t be certain there is a trend.
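If you have not met a centred moving average before, the sketch below shows the general idea in Python: each year's value is replaced by the average of itself and the two years either side. The pay gap figures are hypothetical, not the ONS numbers, and leaving the first and last two years blank is my assumption about how to treat the ends of the series, not necessarily what the spreadsheet does.

```python
def centred_moving_average(values, window=5):
    """Return a list the same length as values. Positions without a full
    window either side are returned as None."""
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a year")
    half = window // 2
    result = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            result.append(None)
        else:
            chunk = values[i - half : i + half + 1]
            result.append(sum(chunk) / window)
    return result


# Hypothetical pay gaps in pence per £1 for 2012 to 2022 (NOT the ONS figures).
gaps = [13, -2, 8, -5, 4, -9, 1, -12, -3, -8, -13]

for year, smoothed in zip(range(2012, 2023), centred_moving_average(gaps)):
    print(year, smoothed)
```

Averaging over 5 years damps the year-on-year swings that small samples produce, which is why the smoothed line is easier to read than the individual dashes.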

In addition to the charts, the middle part of this worksheet includes a set of tables as below. Note the reference and comparator ethnicities are W-B and B-O respectively.

This table shows the numbers displayed in the charts plus some more information. First you can see the actual sample sizes for each year and a colour coding system is used to denote small sample sizes which is red for **<20**, orange for **<50** and yellow for **<100**. This colour code is used throughout the spreadsheet.
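As a sketch, the colour coding amounts to the following thresholds. The function name is mine, and returning `None` for sample sizes of 100 or more is my assumption for "no colour".

```python
def sample_size_colour(n):
    """Map a sample size to the warning colour described in the article."""
    if n < 20:
        return "red"
    if n < 50:
        return "orange"
    if n < 100:
        return "yellow"
    return None  # 100 or more: no warning colour


for n in (15, 26, 82, 52_271):
    print(n, sample_size_colour(n))
```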

The second additional piece of information is the two **Sample Weights** in the last 2 columns on the right. The APS estimates the number of people working in the UK by ethnicity which allows ONS to design a sample that is representative of the working age population by ethnicity. For example, if **5%** of employees belong to ethnicity **A**, then one would expect **5%** of the APS sample to belong to A. If in fact **6%** of the APS sample were from A, then that ethnicity would be oversampled and the sample weight would be **0.83** (= **5%/6%**) i.e. less than **1**. If only **4%** of the APS sample were from A, then the ethnicity would be undersampled and the sample weight would be **1.25** (= **5%/4%**) i.e. greater than **1**.
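The arithmetic in that example can be sketched as follows. This follows the simplified definition used in this article (population share divided by sample share); the function name is my own and the actual ONS weighting procedure is more involved.

```python
def sample_weight(population_share, sample_share):
    """Population share divided by sample share.
    Below 1 = oversampled, above 1 = undersampled."""
    return population_share / sample_share


print(round(sample_weight(0.05, 0.06), 2))  # 0.83 (oversampled)
print(round(sample_weight(0.05, 0.04), 2))  # 1.25 (undersampled)
```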

For W-B, the sample weight is more or less **1** whilst B-O varies between **1.3** & **1.6** indicating that ethnicity is undersampled, suggesting a difficulty with recruitment. What would concern me here is less whether an ethnicity is under or over sampled but whether there is a clear trend to the sample weight away from 1 that could indicate an emerging bias in the pay gap estimate. Here I see no trend since the variation year on year is consistent with small sample sizes.

Before I move on, I want to point out that throughout the spreadsheet, I display pay gaps using the format of for every **£1** paid to the median reference employee, the median comparator employee is paid **Xp** less or more or they are paid **£x.yz.** The ONS continue to use the same format as gender pay gap reporting i.e. a pay gap of **+10%** means the median comparator employee is paid **10%** LESS than the median reference employee. I have never liked the standard reporting format as it is easy to misunderstand as I explain in the 1st point of this article.
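Converting between the two formats is simple arithmetic, sketched below. I am assuming the ONS percentage is measured relative to the reference median, as in gender pay gap reporting, so a gap of +10% means 90p per £1; the function name and the exact wording of the output are mine.

```python
def describe_gap(ons_gap_percent):
    """Turn an ONS-style pay gap (positive = comparator paid LESS)
    into the 'for every £1' wording used in the spreadsheet."""
    paid = (100 - ons_gap_percent) / 100       # £ paid to comparator per £1
    diff_pence = round(abs(ons_gap_percent))   # size of the gap in pence
    if ons_gap_percent > 0:
        direction = f"{diff_pence}p less"
    elif ons_gap_percent < 0:
        direction = f"{diff_pence}p more"
    else:
        direction = "the same"
    return (
        "For every £1 paid to the median reference employee, the median "
        f"comparator employee is paid {direction} (£{paid:.2f})"
    )


print(describe_gap(10))    # ... paid 10p less (£0.90)
print(describe_gap(-13))   # ... paid 13p more (£1.13)
```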

Finally all charts in the spreadsheet are editable which means you can change the scales and colours if you wish.

**How to use C_INTERSECTIONAL**

The intersectional tables shown in this worksheet can only be produced if you use **ONS18** ethnicities, not Big 5. This worksheet runs off the ethnicities entered in **C_Trends** and provides a breakdown by **Age, Sex** and **Where Born** for the year of your choice which you can enter in the yellow cell E3. The example below shows the breakdown for 2022 for a comparison between the W-B (White-British) and B-C (Black Caribbean) ethnicities.

Starting from the left, the first figures shown are the median hourly pay for W-B and B-C which are the same as those shown for 2022 in **C_Trends**. Below these, you can see that for every **£1** paid to the median W-B employee, the median B-C employee is paid **98p** i.e. a small pay gap that is actually not significant when you look at the confidence intervals shown in **C_Trends**. Whilst this worksheet does not show confidence intervals, you can always get an idea of what these are by using **C_Trends.**

The first intersectional table is for **Place of Birth** which is either in the UK or overseas. For every **£1** paid to the median UK born W-B employee, the median UK born B-C employee is paid slightly more at **£1.02**. However, among those born overseas, the median B-C employee is paid **76p** for every £1 paid to the median W-B employee which is a considerable gap. If one looks at the actual medians themselves, it is apparent that one reason for this is because for every **£1** paid to the median W-B employee born in the UK, the median W-B employee born overseas is paid **£1.17** which is what is shown in the Ratio column.
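The two ratios quoted above can be chained to put overseas born B-C employees on the same footing as UK born W-B employees. This is my own back-of-envelope sketch using the rounded figures from the text, so the result is approximate and is not a figure shown in the spreadsheet.

```python
# Figures quoted in the article for 2022 (rounded to the nearest penny).
bc_overseas_vs_wb_overseas = 0.76   # B-C overseas born per £1 of W-B overseas born
wb_overseas_vs_wb_uk = 1.17         # W-B overseas born per £1 of W-B UK born

# Chain the ratios: B-C overseas born per £1 of W-B UK born.
bc_overseas_vs_wb_uk = bc_overseas_vs_wb_overseas * wb_overseas_vs_wb_uk

print(f"£{bc_overseas_vs_wb_uk:.2f}")  # £0.89
```

In other words, a good part of the 24p gap among the overseas born comes from overseas born W-B employees being paid more than UK born W-B employees.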

The next table looks at **Age** which compares young employees with middle aged and older employees. This shows among those under 30, the median B-C employee is paid **£1.06** for every **£1** paid to the median W-B employee but this flips around among those over 30 where the median B-C employee is paid **94p.**

The last table looks at **Sex** with men as the reference category. I find this table the most interesting one here as this shows a clear **interaction** (the correct statistical term for intersectional) between ethnicity and sex. Among men, the median B-C employee is paid less than the median W-B employee but among women, this is the other way around. Among W-B employees, the median man is paid more than the median woman but among B-C employees, the median woman is paid more than the median man. When placed in order by median hourly pay from highest to lowest, the order is W-B man, B-C woman, B-C man, W-B woman.

In addition to these tables, further tables are given underneath which show the working age population, sample size and sample weights. These are there to help you decide if the observed intersections are likely to be the result of small sample sizes. For example, you might at first find the concept of White-British employees being born overseas somewhat unlikely and indeed, only **3%** of all W-B employees are born overseas which differs from Black-Caribbean employees where **26%** are born overseas. When you look at the working population table though, you can see that whilst it is a small population as a percentage, the overseas born White British population is in fact twice the size of the entire Black-Caribbean population.

**How to use C_REGIONS**

This time, you can only analyse the data using **Big5** ethnicities. These are already preloaded into this worksheet along with the regions and the only thing you can change is which year to display which can be controlled by changing the value in the yellow cell H2. Another feature you can control is to suppress data where the sample size is below a certain threshold. This threshold is controlled by the yellow cell in H3.

In the table shown here for 2022, all combinations of region and Big 5 ethnicity with a sample size less than **50** are greyed out. This table starts in cell B5 of the worksheet but the table is repeated lower down in cell B45 this time without any greying out of the data. The sample sizes appear on the right hand side in cell S5 and the confidence intervals appear on the left hand side in cell B27.
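The suppression rule can be sketched in a few lines of Python. The pay figures and sample sizes below are hypothetical and the function name is mine; the only thing taken from the worksheet is the rule itself, namely that anything with a sample size below the threshold is hidden.

```python
def suppress_small_samples(pay, sample_sizes, threshold=50):
    """Return a copy of pay with values set to None wherever the
    matching sample size is below the threshold."""
    return {
        key: (value if sample_sizes[key] >= threshold else None)
        for key, value in pay.items()
    }


# Hypothetical (region, Big 5 code) cells, for illustration only.
pay = {("Wales", "W"): 14.10, ("Wales", "B"): 12.80, ("London", "B"): 15.60}
sample_sizes = {("Wales", "W"): 2500, ("Wales", "B"): 18, ("London", "B"): 640}

print(suppress_small_samples(pay, sample_sizes, threshold=50))
```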

The data in the table shown above is then plotted in the chart below. You are able to edit the chart so as to display certain ethnicities only rather than all ethnicities. By suppressing small sample size estimates, the chart becomes easier to read. Note the order of the regions has been laid out based roughly on distance from London with the furthest away first before ending up in London.

A regional pattern should be apparent to you here and it is one I have seen before. Pay is highest in London followed by the commuter belt surrounding London in the East and South East regions. Further afield, little difference between regions is apparent.

This pattern motivates the London effect table shown below the chart. One of the main reasons why ethnicity pay gap analysis cannot be done in the same way as for gender is because the ethnicity profile of London is so different from the rest of the UK. If you look at the first table below, you can see only **10%** of white employees work in London compared to roughly **40%** of all ethnic minorities. That means when you look at national median hourly pay, the figures for ethnic minorities are likely to be affected by the disproportionate number working in London and thus receiving a premium in pay.

This disparity can create some apparently paradoxical results, a phenomenon known as **Simpson’s Paradox**. I have written about this before in my article “*What is the gender pay gap at Novartis?*” and such effects are much more likely to happen when analysing ethnicity. In the second table, the median hourly pay for all permutations of region and ethnicity is shown using the national median hourly pay for white employees as the reference i.e. **£1**. So for example, although the median Other employee is paid less than the median White employee in all 3 regions, the fact that nearly half of Other employees are based in London means the median Other employee nationally is paid more than the median white employee nationally.
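If this seems impossible, the toy example below may help. All the headcounts and pay rates are hypothetical, chosen only so that 10% of White employees and half of Other employees work in London; the region names are placeholders and none of this is ONS data. Everyone in a cell is given the same pay to keep the arithmetic easy to follow.

```python
from statistics import median

# Hypothetical (headcount, hourly pay in £) for each ethnicity and region.
workforce = {
    "White": {"London": (10, 20.0), "South East & East": (20, 16.0), "Rest": (70, 14.0)},
    "Other": {"London": (5, 19.0), "South East & East": (2, 15.0), "Rest": (3, 13.0)},
}


def national_median(regions):
    """Median hourly pay across all regions combined."""
    pays = []
    for headcount, pay in regions.values():
        pays.extend([pay] * headcount)
    return median(pays)


for region in workforce["White"]:
    white_pay = workforce["White"][region][1]
    other_pay = workforce["Other"][region][1]
    print(f"{region}: White £{white_pay:.2f}, Other £{other_pay:.2f}")

white_national = national_median(workforce["White"])
other_national = national_median(workforce["Other"])
print(f"National: White £{white_national:.2f}, Other £{other_national:.2f}")
```

In every region the Other figure is £1 lower than the White figure, yet the national median comes out at £17.00 for Other against £14.00 for White, purely because of where people work.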

**In conclusion**

As an instruction guide for my spreadsheet, this is definitely a long one! I had no choice but to take time to go through it in this detail because it is so important you grasp that ethnicity pay gap analysis is not the same as gender pay gap analysis. You will have to pay attention to sample sizes, think about what minimum sample sizes should be specified and become comfortable with confidence intervals if you insist on focusing on median hourly pay. Most of all, you will have to look at trends over time and learn how to interpret these. This is what the ONS has now decided to do with their ethnicity pay gap reports and I commend them for making these changes.

**— Would you like to comment on this article? —-**

Please do leave your comments on either of these **LinkedIn** or **X/Twitter** threads.

**— Subscribe to my newsletter to receive more articles like this one! —-**

If you would like to receive notifications from me of news, articles and offers relating to diversity & pay gaps, please **click here to go to my Newsletter Subscription page** and tick the Diversity category and other categories that may be of interest to you. You will be able to unsubscribe at any time.

**— Want to know more about pay gaps? —**

You will find a full list of my pay gap & diversity related articles **here which are grouped by theme**.

**— Can I help you to close your pay gap? —**

I offer the following services to my clients who want to define, measure, analyse, improve & control their pay gaps.

- **Analysis** – I can dig deep into your data to identify the key drivers of your pay gaps. I can build a model using a large number of variables such as pay band, seniority, job function, location, etc and use this to identify the priority areas for closing your gaps.
- **Training** – I run training courses in basic statistics which are designed for non-statisticians such as people working in HR. The courses will show you how to perform the relevant calculations in Microsoft Excel, how to interpret what they mean for you and how to incorporate these in an action plan to close your gaps.
- **Expert Witness** – Has your gender pay gap data uncovered an issue resulting in legal action? Need an expert independent statistician who can testify whether the data supports or contradicts a claim of discrimination? I have experience of acting as an expert witness for either plaintiff or defendant and I know how to testify and explain complex data in simple language that can be easily understood by non-statisticians.

If you would like to have a no-obligation discussion about how I can help you, **please do contact me**.