On 24th March 2020, the UK government suspended enforcement of the gender pay gap reporting deadline of 5th April. As of today, just over 50% of employers have reported their 2019 gender pay gap figures. Despite this shortfall, I have used statistical imputation methods to calculate that the median gender pay gap narrowed in 2019 from 9.6 pence in the pound to 9.0-9.2 pence in the pound.
Drawing conclusions when confronted with missing data is an important skill for a statistician to master. A variety of methods are available and I will use both a simple method (Like for Like) and a more complex method known as Imputation in this article. Hopefully I will end up with the same answer!
Which employers should be included and excluded?
My free spreadsheet of all reported gender pay data lists a total of 11,728 employers. Not all of them reported data in all 3 years, either because they were too small, because they went bust, or because they have not yet reported for 2019 due to the coronavirus outbreak. As of 5th June 2020, this was the situation:
- 4,870 employers have reported data for all 3 years 2017, 2018 & 2019
- 388 employers have reported data for 2018 & 2019 only
- 5,043 employers have reported data for 2017 & 2018 only
- 1,453 employers have reported data for only a single year
When estimating the trend between two years, there is no point in including employers who reported for a single year only, so the last group will be excluded from the analysis. Our analysis will focus on the other 3 groups of employers, which I will refer to as groups 1, 2 & 3 in the order listed above. We already know what the trend in 2019 is for groups 1 & 2, so the imputation methods discussed in this article are about imputing the trend for group 3.
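The grouping described above can be sketched in a few lines of Python. This is only an illustration: the employer names are invented, and treating any other reporting pattern (for example 2017 & 2019 only) as excluded is an assumption, since the article only describes the four categories listed.

```python
from collections import Counter

def classify(years_reported):
    """Return the analysis group for an employer's set of reporting years."""
    years = frozenset(years_reported)
    if years == {2017, 2018, 2019}:
        return "group 1"
    if years == {2018, 2019}:
        return "group 2"
    if years == {2017, 2018}:
        return "group 3"
    # Single-year reporters, plus (by assumption) any other pattern.
    return "excluded"

# Hypothetical employers, for illustration only.
employers = {
    "A Ltd": [2017, 2018, 2019],
    "B plc": [2018, 2019],
    "C LLP": [2017, 2018],
    "D Ltd": [2018],
    "E Ltd": [2017, 2019],
}

groups = {name: classify(years) for name, years in employers.items()}
counts = Counter(groups.values())
```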
When I looked at trends last year between 2017 & 2018, I made a point of excluding employers whose data looked to be incorrect. I have done the same thing again for this year using the criteria I first explained in my post “Year on Year Trends – The Good, Bad and Unilever”. This reduces the total number of employers to be included from groups 1 to 3 above from 10,301 to 9,532, a reduction of 7.5%.
Simple – Like for Like
The chart below shows both the 3 data groups used (lines with brown markers) and the estimated gender pay gaps for each year (purple bars) using simple like for like estimation. The gender pay gaps are presented as the difference between the median women’s hourly earnings and the median man’s hourly earnings assuming the median man earns £1. In 2019, I estimate that the median woman earned 9.0p less than the median man which is a smaller gap than seen in 2018 (9.6p less) and 2017 (9.9p less).
The key to this method is that all 9,532 employers used reported their data in 2018. The median of the reported median gender pay gaps of these employers was 9.6p less for women, as shown in the chart, so this is an actual data point, not an estimated data point.
For 2019, group 1 (data in all 3 years) reported their median gender pay gap had fallen from 11.1p less for women to 10.6p less for women, a reduction of 0.5p in the pound. For group 2 (data in 2018 & 2019 only), they reported their median gender pay gap had fallen from 13.9p less for women to 13.2p less for women, a reduction of 0.7p. Group 2 is a lot smaller than group 1, so when I take a weighted average of the reductions between 2018 & 2019, I find that it is 0.6p in the pound.
I then assume that group 3 (who have not yet reported 2019 data) follow the same trend and that their median gender pay gap fell by 0.6p in the pound. Since I already know the median of all 3 groups in 2018 was 9.6p less for women, subtracting 0.6p from this gives my estimated median gender pay gap for 2019 of 9.0p less for women, assuming the median man earns £1.
I then repeat the 2019 estimation process for 2017 but this time using groups 1 & 3. For group 1, the median gender pay gap was 0.4p higher in 2017 and for group 3 it was 0.3p higher. Again, a weighted average of these two figures gives 0.3p higher which when applied to the known 2018 median gender pay gap of 9.6p less for women results in a figure of 9.9p less for women in 2017.
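The like for like calculation can be sketched as follows. The gap values are invented purely to show the mechanics (they are not the reported data), and the function name `like_for_like` is hypothetical. The change for each reporting group is taken as the difference between its medians in the two years, the changes are averaged using group size as the weight, and the result is applied to the anchor-year median of all employers.

```python
from statistics import median

def like_for_like(anchor_gaps_all, reporting_groups):
    """Estimate the target-year median gap.

    anchor_gaps_all: anchor-year (2018) gaps for every employer in the analysis.
    reporting_groups: list of (anchor_year_gaps, target_year_gaps) pairs,
        one pair per group that reported in both years.
    """
    changes = [median(target) - median(anchor) for anchor, target in reporting_groups]
    weights = [len(anchor) for anchor, _ in reporting_groups]
    average_change = sum(c * w for c, w in zip(changes, weights)) / sum(weights)
    return median(anchor_gaps_all) + average_change

# Invented gaps in pence per pound (positive = women earn less).
group1_2018 = [8.0, 11.0, 14.0, 10.0, 12.0]
group1_2019 = [7.5, 10.5, 13.0, 10.0, 11.5]
group2_2018 = [13.0, 15.0, 14.0]
group2_2019 = [12.0, 14.5, 13.0]
group3_2018 = [6.0, 9.0, 7.0, 8.0]  # no 2019 data yet

all_2018 = group1_2018 + group2_2018 + group3_2018
estimate_2019 = like_for_like(
    all_2018,
    [(group1_2018, group1_2019), (group2_2018, group2_2019)],
)
```

The same function gives the 2017 estimate if groups 1 & 3 are passed in with their 2018 and 2017 gaps instead.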
By now, you should have noticed two things about the chart. First, groups 1 & 3 more or less agree with each other regarding the change between 2017 & 2018 and groups 1 & 2 similarly agree with each other on the change between 2018 & 2019. Second, the 3 groups are at different levels with group 2 having large pay gaps, group 3 having small pay gaps and group 1 in between. Why this should be is not known but this explains why taking the median of all employers in each year separately doesn’t work. If I had done that, then the medians for the 3 years would have been respectively 9.2p less, 9.6p less and 10.8p less. This incorrectly suggests a widening pay gap not a narrowing pay gap as shown by the chart.
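A toy example makes the problem with pooling each year separately concrete. In the invented data below, every single employer narrows its gap by 0.5p each year, yet the pooled median widens, simply because the low-gap group drops out of the 2019 data and the high-gap group is missing from 2017.

```python
from statistics import median

# Invented 2018 gaps in pence per pound (positive = women earn less).
gaps_2018 = {
    "group 1": [10.0, 11.0, 12.0],  # reported in all 3 years
    "group 2": [14.0, 15.0],        # large gaps, reported 2018 & 2019 only
    "group 3": [5.0, 6.0],          # small gaps, reported 2017 & 2018 only
}
reported = {
    2017: ["group 1", "group 3"],
    2018: ["group 1", "group 2", "group 3"],
    2019: ["group 1", "group 2"],
}
# Every employer's gap is 0.5p wider in 2017 and 0.5p narrower in 2019.
shift = {2017: 0.5, 2018: 0.0, 2019: -0.5}

pooled = {}
for year, names in reported.items():
    values = [gap + shift[year] for name in names for gap in gaps_2018[name]]
    pooled[year] = median(values)
```

Here `pooled` rises from 2017 to 2019 even though no employer's gap widened, which is the same composition effect that distorts the separate yearly medians quoted above.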
This analysis can be repeated for any sub-population of employers. I have already done this for 7 sectors defined by Practical Law, an online magazine about legal matters. If you would like me to do something similar for sectors of interest to you, please contact me to obtain a quote.
Complex – Imputation
The simple method of estimating trends makes one very large assumption. It is that all employers in group 3 (yet to report 2019) are similar to the employers in groups 1 & 2. That may be unwarranted since we already know that group 3 employers had smaller pay gaps in the first place. The two charts below show my estimated trends for small employers (those with fewer than 500 employees) and large employers (those with 500 or more employees) and it is clear that these are different.
Whilst both small and large employers in group 1 saw their pay gaps narrow by 0.6p in the pound, group 2 employers behave differently. At the same time, group 3 employers behaved differently between 2017 & 2018. Consequently, it is not clear if we can assume that all employers in group 3 are similar to groups 1 & 2.
Employer size is only one factor though. What about other factors such as industry sector, gender ratio, previous year’s pay gap, bonuses paid, etc? This is what motivates the more complex approach of imputation.
An imputation model attempts to estimate the change in the median gender pay gap for each one of the 4,751 employers in group 3 based on known characteristics. A statistical model is built using the 4,432 employers in group 1 only, since we already know what the change was between 2018 & 2019 for this group. The factors I explored in my statistical model were:
- Median gender pay gap in 2018
- Change in median gender pay gap between 2017 & 2018 – hence why I can’t use group 2 data.
- Mean gender pay gap in 2018
- Change in mean gender pay gap between 2017 & 2018
- Gender balance in 2018 i.e. % of employees that are women.
- Gender balance in 2018 in each income quarter.
- Employer size.
- Date of submission in 2018
- Industry Sector
Not all are expected to be significant. In an ideal world, I would find that none of these factors are predictive of what the change in median gender pay gap would be in 2019 and therefore I only need to use the Simple Like for Like method of estimating trend.
After building and testing a regression model with these 9 factors, I concluded that only the first two factors from the list above (the median gender pay gap in 2018 and the change in median gender pay gap between 2017 & 2018) were significant, both statistically and practically. The relationship between the change in gender pay gap between 2018 & 2019 and these two factors is shown in the chart below. The R-squared for this model was only 10%, which means that 90% of the observed variance in gender pay gap trends in group 1 in 2019 is due to other factors.
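As a rough illustration of this kind of model (not the model actually fitted for this article), the sketch below fits an ordinary least squares regression with two predictors and reports R-squared. The data are synthetic and the coefficients used to generate them (0.5, -0.1, -0.3) are assumptions chosen only so that both relationships are negative and the noise dominates; the function names `solve` and `fit_ols` are likewise hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Least squares fit with an intercept.

    rows: one tuple of predictor values per employer.
    Returns (coefficients with the intercept first, R-squared).
    """
    x = [[1.0, *r] for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot

# Synthetic "group 1" data: (gap in 2018, change 2017 to 2018) -> change 2018 to 2019.
random.seed(0)
rows, y = [], []
for _ in range(2000):
    gap_2018 = random.gauss(10, 8)
    prev_change = random.gauss(0, 3)
    rows.append((gap_2018, prev_change))
    y.append(0.5 - 0.1 * gap_2018 - 0.3 * prev_change + random.gauss(0, 3))

beta, r_squared = fit_ols(rows, y)
```

With data generated this way, both slope estimates come out negative and R-squared is low, which is the qualitative pattern described above: the two factors matter, but most of the year on year variation is unexplained.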
The dashed black line is the fitted relationship in the model and the solid pink line is a measure of whether or not a linear fit is appropriate (in this case it is). The nature of the fitted relationship is very interesting but is not a surprise to me given what I have said before about the role of chance in changes in gender pay gaps. As you can see, if an employer had no gender pay gap or a pay gap favouring women or the gender pay gap narrowed in 2018, then the negative correlation apparent here means that in 2019, we would expect the gender pay gap to widen in favour of men. Conversely if an employer has a large pay gap against women or had seen their pay gap widen against women in 2018, then on average their pay gap will narrow in favour of women in 2019.
This effect is entirely consistent with a well known phenomenon in time series called “reversion to the mean” or Auto-Correlation. It occurs when there is an underlying trend over time but random fluctuations around that trend occur so that the actual trend is sometimes higher and sometimes lower than the expected trend. In the case of pay gaps, the random fluctuations mostly come from employee turnover. Suppose 10% of your employees leave the company every year and they are equally split between men & women. There is no guarantee that they will be replaced by an equal split of men and women. In some years, you might recruit more men than women; in others, you might recruit more women than men. If your recruitment process is genuinely non-discriminatory, then you would have to be unlucky to have a long run of years where you recruit more men than women. Instead, it is more likely to fluctuate between various scenarios. The whole process is not dissimilar to tossing a coin, whereby a long run of tosses giving the same result would be unlikely. The end result is fluctuations in your gender pay gap due to chance alone.
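A small simulation shows how chance alone produces this negative relationship. It assumes each employer has a fixed underlying gap with independent random noise added each year; the noise size and the number of employers are arbitrary choices for illustration.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

random.seed(1)
first_change, second_change = [], []
for _ in range(5000):
    underlying = random.gauss(10, 5)  # employer's true gap, constant over time
    # Observed gap in each of 3 years = true gap + independent chance fluctuation.
    y1, y2, y3 = (underlying + random.gauss(0, 1) for _ in range(3))
    first_change.append(y2 - y1)
    second_change.append(y3 - y2)

r = correlation(first_change, second_change)
```

Under these assumptions the correlation between one year's change and the next is -0.5 in theory, because the middle year's fluctuation enters the two changes with opposite signs. An employer whose gap widened by chance one year will therefore tend to see it narrow the next, with no change in underlying behaviour.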
Digression aside, if I apply my imputation model to all employers in group 3, I find that the median gender pay gap in group 3 narrowed by 0.3 pence in the pound in favour of women. This is only half of the actual trend observed in groups 1 & 2 in 2019. When I combine the three groups, I arrive at an imputed gender pay gap of 9.2 pence in the pound against women in 2019 which is 0.4 pence smaller than in 2018. The 95% confidence interval for this imputed value, assuming my imputation model is the correct one, is +/- 0.2%.
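The imputation step can be sketched as below. The coefficients are hypothetical (the fitted values are not given in this article), the employer data are invented, and pooling the reported and imputed 2019 gaps before taking the median is an assumption about how the groups are combined.

```python
from statistics import median

# Hypothetical model coefficients, for illustration only.
INTERCEPT, B_GAP, B_PREV = 0.5, -0.1, -0.3

def impute_2019(gap_2018, change_2017_18):
    """Predict the 2018 to 2019 change and add it to the known 2018 gap."""
    predicted_change = INTERCEPT + B_GAP * gap_2018 + B_PREV * change_2017_18
    return gap_2018 + predicted_change

# Invented group 3 employers: (gap in 2018, change between 2017 and 2018).
group3 = [(6.0, -1.0), (9.0, 0.5), (2.0, 0.0)]
imputed_2019 = [impute_2019(gap, change) for gap, change in group3]

# Invented 2019 gaps actually reported by groups 1 & 2.
reported_2019 = [10.5, 13.0, 11.0, 9.5]

# Pool the reported and imputed values (assumed combination rule).
estimate_2019 = median(reported_2019 + imputed_2019)
```

Note how the negative coefficients reproduce the pattern described earlier: the employer with a small gap of 2.0 is predicted to widen slightly, while the employer with a gap of 9.0 that widened in 2018 is predicted to narrow.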
Which is the better imputation method?
The imputed gender pay gap of 9.2 pence in the pound against women for 2019 using complex imputation is higher than the 9.0 pence in the pound I imputed using simple like for like methods. Both methods though conclude that the gender pay gap really did narrow in 2019.
Ultimately, I am most interested in seeing that there is a trend by which the pay gap against women is narrowing, and since both methods point in the same direction, I am not too bothered about choosing between them. The simple like for like method is easier to explain so I am happy to mostly use this estimate. However, the complex imputation method does give some insight as to how pay gaps can change year on year, so this estimate should be borne in mind as well.