*This article was last updated using data submitted as of 4th September 2021.*

In February 2021, the EHRC confirmed that all employers with a headcount of 250 employees or more would have to submit their gender pay gaps based on the 2020 snapshot date. Due to the lateness of this confirmation, it also stated that no enforcement action would be taken before the end of September 2021. As a result, fewer employers than expected have reported 2020 data so far, so I have used two methods of imputation to estimate that the median gender pay gap among these employers narrowed by 0.2 to 0.3 pence in the pound in 2020.

Drawing conclusions when confronted with missing data is an important skill for a statistician to master. A variety of methods are available and I will use both a simple method (**Like for Like**) and a more complex method known as **Imputation** in this article. Hopefully I will end up with the same answer!

**Which employers should be included and excluded?**

My free spreadsheet of all reported gender pay data lists a total of 12,112 employers. Not all of them reported data in all years, either because they were too small, because they went bust, or because they have not yet reported for 2020 due to the 6-month period of grace. As of 4th September 2021, this table summarises the reporting situation. I have grouped employers by the years in which they submitted data and I will refer to each combination as a **Year Group**. As you can see, there are 11 year groups of interest, labelled A to K.

When estimating the trend between years, there is no point in including employers who reported for a single year only, so the single-year group of 1,785 employers will be excluded from the analysis. In addition, I exclude all data points that look erroneous to me using the methods described in my post “*Year on Year Trends – The Good, Bad and Unilever*”. This leaves 10,174 employers whose data can be used to estimate the underlying trend.

So far 4,374 employers have reported 2020 data correctly (making up the 7 year groups A to G), from which we can calculate trends. Another 5,800 employers (making up the 4 year groups H to K) have yet to report 2020 data and, until they do, I will use the two methods described here to estimate their 2020 trends. For information, these are the same methods I used to estimate the 2019 trend after pay gap reporting enforcement was suspended at the start of the COVID-19 pandemic.

**Simple – Like for Like**

For each of the 11 year groups, I start by calculating the median of the median gender pay gaps reported by the employers for each of the available years. All pay gaps are presented as the difference between the median woman’s hourly pay and the median man’s hourly pay, assuming the median man earns £1. I can then calculate the year-on-year changes in these medians for each group, e.g. for group A the change between 2019 & 2020 is +0.3p. Where a year group misses out a year, I interpolate for the missing year. For example, group C is missing data for 2019 and the change between 2018 & 2020 is -0.6p, so the changes between 2018 & 2019 and between 2019 & 2020 for that group are each assumed to be half this at -0.3p.
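The per-group calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `group_medians` and `yearly_changes` and the example pay gaps are hypothetical and are not taken from the spreadsheet. Gaps are written as negative numbers when women earn less, in pence per £1 earned by the median man.

```python
from statistics import median


def group_medians(gaps_by_year):
    """Median of the employers' reported median pay gaps, for each year."""
    return {year: median(values) for year, values in gaps_by_year.items()}


def yearly_changes(medians):
    """Year-on-year changes, spreading a change evenly over any skipped years."""
    known = sorted(medians)
    changes = {}
    for start, end in zip(known, known[1:]):
        # Linear interpolation: each year in the span gets an equal share.
        step = (medians[end] - medians[start]) / (end - start)
        for year in range(start + 1, end + 1):
            changes[year] = step
    return changes


# Hypothetical year group with data for 2018 and 2020 but nothing for 2019.
example_group = {
    2018: [-12.0, -10.0, -8.0],
    2020: [-11.0, -10.6, -9.0],
}
medians = group_medians(example_group)
changes = yearly_changes(medians)
print({year: round(change, 1) for year, change in changes.items()})
# prints {2019: -0.3, 2020: -0.3}
```

The total change of -0.6p between 2018 and 2020 is split into two equal steps of -0.3p, which mirrors the interpolation rule described above.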

The chart below shows known data for 6 of the larger of the 11 year groups (A, B, D, E, H & K) and the estimated gender pay gaps for each year (purple bars) using simple like for like estimation. In 2020, **I estimate that the median woman earned 8.9p less than the median man which is a smaller gap than seen in 2019 (9.2p less), 2018 (9.5p less) and 2017 (9.7p less).**

The key to this method is that virtually all 10,174 employers reported their data in 2018. The median of the reported median gender pay gaps of these employers is 9.5p less for women, as shown in the chart, which is an actual data point, not an estimated data point.

To estimate 2019, the actual trend is known for 9 year groups (A, B, C, D, F, G, H, I & J) from the above table accounting for 6,396 employers. I take a weighted average of these trends using the number of employers in the last column as the weight and this gives +0.3p. This is the known change between 2018 & 2019. I then make the assumption that the remaining 2 year groups (E & K) accounting for 3,778 employers who have not reported 2019 data will also experience the same trend. Hence my estimate that the 2019 gender pay gap is 9.2p less for women.

I then repeat the same process for 2020, where the actual change between 2019 & 2020 is known for 7 groups (A, B, C, D, E, F & G) accounting for 4,374 employers. Again I assume that the weighted average trend of these groups of +0.3p will be repeated for the remaining 4 year groups (H, I, J & K) accounting for 5,800 employers, which results in my estimate that the 2020 gender pay gap is 8.9p less for women. Clearly the margin of error in this estimate must be larger than for 2019 since I am making estimates for a larger group of employers. As data starts to come in though, this margin of error will start to fall.

Finally I repeat the process for 2017 using 7 groups (A, C, D, G, H, J, K) accounting for 9,518 employers to get a weighted average of the change between 2017 & 2018 of +0.2p. This time I have to subtract this from the 2018 pay gap of -9.5p to get 9.7p less for women in 2017.
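The arithmetic of the last few paragraphs can be sketched as follows. The helper names `weighted_change` and `chain_from_2018` are hypothetical, and the two groups "X" and "Y" are invented purely to show how the weighting works. The only figures taken from this article are the 2018 anchor of -9.5p and the three weighted average changes (+0.2p, +0.3p and +0.3p).

```python
def weighted_change(changes, counts):
    """Average of the group changes, weighted by number of employers per group."""
    total = sum(counts[group] for group in changes)
    return sum(changes[group] * counts[group] for group in changes) / total


def chain_from_2018(gap_2018, change_17_18, change_18_19, change_19_20):
    """Work forwards and backwards from the known 2018 median pay gap."""
    gap_2019 = gap_2018 + change_18_19
    return {
        2017: round(gap_2018 - change_17_18, 1),  # subtract to go back a year
        2018: gap_2018,
        2019: round(gap_2019, 1),
        2020: round(gap_2019 + change_19_20, 1),
    }


# Hypothetical groups, used only to illustrate the weighting.
demo_changes = {"X": 0.4, "Y": 0.1}
demo_counts = {"X": 300, "Y": 100}
demo = weighted_change(demo_changes, demo_counts)  # (0.4*300 + 0.1*100) / 400

# Figures quoted in this article: 2018 anchor and the weighted changes.
estimates = chain_from_2018(-9.5, 0.2, 0.3, 0.3)
print(estimates)
# prints {2017: -9.7, 2018: -9.5, 2019: -9.2, 2020: -8.9}
```

Because every estimate is chained from the 2018 figure, any error in an assumed change carries through to the later years, which is one way to see why the 2020 estimate is less certain than the 2019 one.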

You may have noticed from the above table and chart that 3 year groups dominate the calculations at present. Group A has 3,208 employers who have reported correct data in all 4 years and the estimated trends I have calculated largely reflect this group. The other groups H & K that need to be estimated for 2020 do show similar trends for 2017 to 2019. So whilst I am making a number of assumptions here, the fact that the 3 largest groups are basically in agreement before 2020 gives me confidence that the simple Like for Like method is not far off the truth.

This type of analysis can be repeated for any sub-population of employers. I did this last year for 7 sectors defined by **Practical Law**, an online magazine about legal matters. If you would like me to do something similar for sectors of interest to you, please contact me to obtain a quote.

**Complex – Imputation**

The simple method of estimating trends makes one very large assumption. It is that all employers in the 4 groups yet to report 2020 follow the same trends as the groups who have reported 2020. That may be unwarranted since we already know that some year groups had different pay gaps in the first place. To check whether it is valid to make such assumptions, I have built a statistical imputation model to estimate the change in the median gender pay gap in 2020 vs 2019.

Once built, this model can be used to estimate the 2020 trend for each one of the employers in year groups H, I, J & K (yet to report 2020 data) based on known characteristics. My model was built using the 3,444 employers in groups A & B only, since we already know what the change was between 2018, 2019 & 2020 for these groups. The full list of factors I explored in my statistical model was:

- **Median gender pay gap in 2019**
- **Change in median gender pay gap between 2018 & 2019**
- Mean gender pay gap in 2019
- Change in mean gender pay gap between 2018 & 2019
- Gender balance in 2019 i.e. % of employees that are women
- Gender balance in 2019 in each income quarter
- Employer size
- Date of submission in 2019
- Industry Sector

Not all are expected to be significant. In an ideal world, I would find that none of these factors is predictive of what the change in median gender pay gap would be in 2020, and therefore I would only need to use the Simple Like for Like method of estimating the trend.

After building and testing a regression model with these 9 factors, I concluded that only the first two factors from the list above were significant, both statistically and practically, which is why they are highlighted in the list. The relationship between the change in gender pay gap between 2019 & 2020 and these two factors is shown in the chart below. The R-squared for this model was only 7%, which means that 93% of the observed variance in gender pay gap trends in groups A & B in 2020 is due to other factors.

The dashed black line is the fitted relationship in the model and the solid pink line is a measure of whether or not a linear fit is appropriate (in this case it is). The nature of the fitted relationship is very interesting but is not a surprise to me given what I have said before about the role of chance in changes in gender pay gaps. As you can see, if an employer had no gender pay gap or a pay gap favouring women or the gender pay gap narrowed in 2019, then the negative correlation apparent here means that in 2020, we would expect the gender pay gap to widen in favour of men. Conversely if an employer has a large pay gap against women or had seen their pay gap widen against women in 2019, then on average their pay gap will narrow in favour of women in 2020. These results are similar to the imputation model I built last year for 2019 snapshot data.
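For readers who want to see what a two-factor model of this kind looks like in practice, here is a minimal sketch using ordinary least squares in plain Python. It is not the model fitted for this article: the function names `solve_linear` and `fit_two_factor_model` are hypothetical and the eight data points are invented for illustration. Gaps and changes are in pence per £1, negative when against women.

```python
def solve_linear(a, b):
    """Solve a small system a.x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_factor_model(gap_2019, change_2019, change_2020):
    """Least squares fit of change_2020 = b0 + b1*gap_2019 + b2*change_2019."""
    rows = [[1.0, g, c] for g, c in zip(gap_2019, change_2019)]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, change_2020)) for i in range(3)]
    coefs = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coefs, r)) for r in rows]
    mean_y = sum(change_2020) / len(change_2020)
    ss_res = sum((y - f) ** 2 for y, f in zip(change_2020, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in change_2020)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical employers, invented for illustration only.
gap_2019 = [-20.0, -10.0, 0.0, 5.0, -15.0, -5.0, -8.0, 2.0]
change_2019 = [-2.0, 1.0, 3.0, -1.0, 0.0, 2.0, -1.5, 0.5]
change_2020 = [2.5, 0.4, -1.2, 0.3, 1.0, -0.4, 1.6, -0.9]
coefs, r_squared = fit_two_factor_model(gap_2019, change_2019, change_2020)
print([round(b, 3) for b in coefs], round(r_squared, 3))
```

A negative coefficient on either factor corresponds to the pattern described above: employers with a large gap against women, or whose gap widened in 2019, are predicted to narrow in 2020. With only eight made-up points the R-squared printed here says nothing about the real data.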

This effect is entirely consistent with a well-known phenomenon in time series analysis called “*reversion to the mean*” or **Auto-Correlation**. It occurs when there is an underlying trend over time but random fluctuations around that trend occur, so that the actual trend is sometimes higher and sometimes lower than the expected trend. In the case of pay gaps, the random fluctuations mostly come from employee turnover. Suppose 10% of your employees leave the company every year and they are equally split between men & women. There is no guarantee that they will be replaced by an equal split of men and women. In some years you might recruit more men than women; in others you might recruit more women than men. If your recruitment process is genuinely non-discriminatory, then you would have to be unlucky to have a long run of years where you recruit more men than women. Instead, it is more likely to fluctuate between various scenarios. The whole process is not dissimilar to tossing a coin, whereby a long run of tosses giving the same result would be unlikely. The end result is fluctuations in your gender pay gap due to chance alone.
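The turnover argument can be checked with a small simulation. This is a sketch under simplifying assumptions that are mine rather than anything measured: 1,000 hypothetical employers of 100 staff each, 10% of staff replaced at random each year, and pay drawn from the same lognormal distribution for men and women so that there is no discrimination at all.

```python
import random
from statistics import median


def pay_gap(staff):
    """Median woman's pay minus median man's pay, in pence per £1 he earns."""
    men = [pay for sex, pay in staff if sex == "M"]
    women = [pay for sex, pay in staff if sex == "F"]
    return 100.0 * (median(women) / median(men) - 1.0)


def new_hire(rng):
    """A recruit whose sex and pay are independent of each other."""
    return (rng.choice("MF"), rng.lognormvariate(2.5, 0.4))


def simulate_employer(rng, size=100, turnover=0.10, years=1):
    """Pay gap at the start and after each year of random staff turnover."""
    staff = [("M" if i % 2 else "F", rng.lognormvariate(2.5, 0.4))
             for i in range(size)]
    gaps = [pay_gap(staff)]
    for _ in range(years):
        for idx in rng.sample(range(size), round(size * turnover)):
            staff[idx] = new_hire(rng)  # a leaver is replaced by a new hire
        gaps.append(pay_gap(staff))
    return gaps


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(2021)  # fixed seed so the run is repeatable
results = [simulate_employer(rng) for _ in range(1000)]
levels = [gaps[0] for gaps in results]
changes = [gaps[1] - gaps[0] for gaps in results]
print(round(pearson(levels, changes), 2))
```

Under these assumptions the correlation between an employer's starting gap and its change over the following year should come out negative, even though no employer in the simulation treats men and women differently. That is reversion to the mean arising from chance alone.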

Digression aside, if I apply my imputation model to all employers in groups H, I, J & K, and combine the results with the known trends in groups A to G, I arrive at **an imputed gender pay gap of 9.2 pence in the pound against women in 2019 and 9.0 pence in the pound against women in 2020**. The 95% confidence interval for these imputed values, assuming my imputation model is the correct one, is +/- 0.3p.

**Which is the better imputation method?**

The imputed gender pay gap of 9.0 pence in the pound against women for 2020 using complex imputation is almost the same as the 8.9 pence in the pound I estimated using simple like for like methods. Both methods though conclude that the gender pay gap narrowed in 2020 by 0.2 to 0.3p. Next month is the deadline for reporting 2020 data and so I expect a considerable update then.

Ultimately, I am most interested in seeing that there is a trend by which the pay gap against women is narrowing and, since both methods give much the same answer, I am not too bothered about choosing between them. The simple like for like method is easier to explain so I am happy to mostly use this estimate. However, the complex imputation method does give some insight as to how pay gaps can change year on year, so this estimate should be borne in mind as well.

