The first round of gender pay gap data is now available for all organisations employing more than 250 employees. The UK now faces the challenge of interpreting this data and using it properly, which will not be easy given “*the 7 ways to misuse gender pay gap data*” I recently wrote about. In the short term, there is an immediate need to resolve errors in the published data, and I will demonstrate that potentially 10-15% of organisations have entered incorrect data into the government’s database.

I have downloaded the gender pay gap data for all 10,504 organisations that have submitted their results. My analysis of this data has uncovered 3 distinct types of error.

1. **Incorrect data entered**
2. **Income quartiles entered the wrong way around**
3. **Claiming no gender pay gap when income quartile gender splits clearly show there is a gap**

I will explore each in turn and explain how the government should have prevented these errors through simple sanity check code on their website.

**Error 1 – Incorrect data entry**

The following two companies say their gender pay gap is greater than 100%. This is mathematically impossible.

In statement 1 of my “*7 ways to misuse gender pay gap data*” post, I explain that the median gender pay gap is calculated as the difference between the median man and the median woman in terms of their hourly earnings (including bonuses) expressed as a percentage of the median man’s figure. So if the median man in a company is paid £25 per hour and the median woman is paid £20 per hour, then the difference is £25-£20=£5 which is 20% of the median man’s figure of £25. The convention (which I disagree with) is that if men are paid more than women then the gender pay gap is positive and if women are paid more than men, it is negative.
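The calculation described above can be sketched in a few lines of Python. The function name `median_pay_gap` is my own choice for illustration and is not taken from any official tool.

```python
def median_pay_gap(median_man, median_woman):
    """Median gender pay gap as a percentage of the median man's hourly pay.

    Positive means the median man earns more; negative means the
    median woman earns more (the convention described in the text).
    """
    if median_man <= 0:
        raise ValueError("median man's hourly pay must be positive")
    return 100 * (median_man - median_woman) / median_man


# Worked example from the text: £25 per hour versus £20 per hour.
print(median_pay_gap(25, 20))  # 20.0
```

Because the denominator is always the median man's pay, the result can never exceed 100% unless the median woman's pay is negative.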

In the two examples above, gender pay gaps of 121% and 320% were recorded. Since the convention is that a positive number means that the median man earns more than the median woman, this means the median woman is earning 121% LESS than the median man in the first chart and 320% less in the second chart. But the median gender pay gap can never be greater than 100%, and it would only reach 100% if women were paid nothing, which obviously cannot happen. So what has gone wrong here?

All companies have the option of uploading their own report to the government database. Partnering Health Ltd chose not to do so but Shrewsbury Academies Trust did so and they have reported the same number. At this point in time, I cannot work out what error they made so they need to look at this again. It is possible they have made a variant on Error 3 described below.

Obviously it is a simple matter for the government to prevent such errors occurring. All that is needed is a piece of code that rejects any pay gap (whether median or mean) that exceeds 100%. One possible reason why people might make this error is that instead of expressing the difference between the median man and the median woman as a percentage of the median man, they instead express it as a percentage of the median woman. Then if the difference between the median man and the median woman is £20 per hour and the median woman earns £10 per hour, the difference is 200%.

When the median woman earns more than the median man, then it is possible for the gender pay gap to exceed 100% but in the negative direction according to the convention. An example is Randstad HR as shown. I found this company by looking at who was paying women the most relative to men and at first I was curious why Randstad should be so favourable to women. Then I noticed a variant of Error 3 below had occurred and clicked on their own analysis of their gender pay gap. If you read this you will see that they are in fact reporting a gender pay gap of 18%, not -104% as claimed by the chart. I am not sure how this error has been made, but again any claim to pay women twice as much as men should be challenged by the government’s website if someone enters such data. Randstad in fact are not the largest such claimant. That distinction goes to Bar 2010 Ltd, though in that instance the difference is understandable because the company is practically all male.
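As a rough sketch of the kind of check I have in mind (the function name, the messages and the choice of -100% as the point at which to ask for confirmation are all my own assumptions, not anything the government site does today):

```python
def check_reported_gap(gap_percent):
    """Return a list of messages for a reported mean or median pay gap (in %)."""
    messages = []
    if gap_percent > 100:
        # Impossible: women would have to be paid less than nothing.
        messages.append("Rejected: a pay gap above 100% is mathematically impossible.")
    elif gap_percent <= -100:
        # Possible, but it claims women earn at least twice what men earn.
        messages.append("Please confirm: this says women earn at least double what men earn.")
    return messages


# The wrong-denominator mistake, using the example from the text:
median_woman = 10
median_man = median_woman + 20          # the difference is £20 per hour
correct = 100 * (median_man - median_woman) / median_man    # divide by the man's pay
wrong = 100 * (median_man - median_woman) / median_woman    # divide by the woman's pay

print(round(correct, 1), check_reported_gap(correct))
print(wrong, check_reported_gap(wrong))
```

With the correct denominator the gap is about 67% and passes; with the wrong denominator it becomes 200% and is rejected.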

How many companies are making errors like these? The answer is I don’t know, but the number could be reduced with better sanity checks on the government website.

**Error 2 – Income Quartiles entered the wrong way around**

In statement 3 of my “*7 ways to misuse gender pay gap data*” post, I explain the idea of income quartiles whereby the organisation is split into 4 equal sized chunks (known as quartiles) of employees based on their hourly earnings (including bonuses) and then within each quartile, the gender split between men and women is recorded. I stated in my other post that it is the quartile data that potentially offers the greatest value to understanding why women are paid less than men overall but it also offers a straightforward sanity check on the median gender pay gap.

I will use Worcestershire County Council as my example here. Their median gender pay gap is completely in line with the national median of 9%. Yet based on the quartile data submitted, it is possible they have made a mistake and should have submitted -9%, i.e. women paid more than men. A little thought, as I will explain, shows why.

Suppose exactly 400 people work for the council. When these people are allocated to income quartiles, each quartile will have exactly 100 people. Then within the lowest income quartile, according to the chart, 66 are women and 34 are men. Conversely, in the upper quartile, 75 are women and 25 are men. If you add up the number of women and men in the council as implied by the chart, you will see that there are 293 women and 107 men working for the council.

To find the median woman, the 293 women have to stand in a line in order of their hourly earnings. The 147th woman will be in the middle of the line and becomes the median woman. We repeat with the 107 men and the 54th man in the line becomes the median man.

Which income quartiles do the median man and median woman come from? We know 138 (=72+66) women are in the bottom two income quartiles but we are looking for the 147th woman. Therefore she must be in the upper middle income quartile. For the median man, the bottom two income quartiles contain 62 (=34+28) men and we are looking for the 54th man so he must be in the lower middle income quartile. Therefore the median man is earning less than the median woman and the reported gender pay gap should be negative not positive as shown in the chart.
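The reasoning above can be written as a short Python sketch. The head counts are the ones implied by the chart for a notional council of 400 people; the function name `median_quartile` is mine.

```python
QUARTILES = ["lower", "lower middle", "upper middle", "upper"]


def median_quartile(counts):
    """Name the income quartile that contains the median person.

    counts: head counts for one gender, ordered from the lowest-paid
    quartile to the highest-paid quartile.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("there must be at least one person")
    # Position of the middle person in the line (lower of the two if total is even).
    median_position = (total + 1) // 2
    running = 0
    for name, count in zip(QUARTILES, counts):
        running += count
        if median_position <= running:
            return name


women = [66, 72, 80, 75]   # 293 women, so the median is the 147th
men = [34, 28, 20, 25]     # 107 men, so the median is the 54th

print(median_quartile(women))  # upper middle
print(median_quartile(men))    # lower middle
```

The median woman sits in a higher quartile than the median man, which cannot be squared with a positive median gender pay gap.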

Something is clearly not right here. Either the median gender pay gap is incorrect or more likely, the gender splits by income quartile have been entered in the government database the wrong way round. According to the chart, 75-80% of the higher paid roles (presumably managers and directors) are women whereas 65-70% of lower paid roles (presumably front line staff) are women. It is a truism that senior roles are more likely to be men and even though this is a public sector organisation which tends to be more aware of gender gaps than the private sector, I suspect this data has been entered the wrong way around in this instance. A week ago, when I was working with an earlier version of the database, I noticed that Arriva Plc had made this mistake since the database figures were the reverse of what was published in their own report but they spotted this themselves and the latest version of the database shows the correct figures.

Such errors can be captured by coding the thought process I describe above and prompting the person who is entering the data to check their data if this situation occurs. It is not difficult to work out that if a company is reporting a POSITIVE median (not mean) gender pay gap then this must imply that the sum of male %’s in the upper income & upper middle income quartiles should be HIGHER than the sum of the male %’s in the lower middle and lower income quartiles. So in the case of Worcs CC, since they claim a gender pay gap of +9% then the inequality 25%+20% > 28%+34% must be true but it isn’t in this case. Conversely if an organisation claims a negative gender pay gap of -13% such as Eastleigh Borough Council, then the inequality of 75%+47% < 45% + 56% should be true but it clearly isn’t in this case. Following up through Eastleigh’s own report shows that the reason for this error is that they do not understand what a quartile is and therefore their calculations are completely wrong.

**Error 3 – Claim of no pay gap conflicts with the male quartile gap**

The sanity check I explain in error 2 can be coded as follows.

- Calculate the **Male Quartile Gap** as the sum of the male %’s in the upper & upper middle income quartiles minus the sum of the male %’s in the lower and lower middle income quartiles.
- If the gender pay gap is positive then the male quartile gap must be positive.
- If the gender pay gap is negative then the male quartile gap must be negative.
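A minimal Python sketch of these rules follows. The function names are my own, and for Eastleigh I have assumed which of the two lower figures belongs to which quartile; that assumption does not affect the sum.

```python
def male_quartile_gap(lower, lower_middle, upper_middle, upper):
    """Male % in the top two quartiles minus male % in the bottom two."""
    return (upper + upper_middle) - (lower + lower_middle)


def signs_agree(median_gap, quartile_gap):
    """True if a non-zero median pay gap has the same sign as the male quartile gap."""
    if median_gap > 0:
        return quartile_gap > 0
    if median_gap < 0:
        return quartile_gap < 0
    return True  # a reported gap of zero needs a separate check


# Worcestershire CC: reported +9%, male %'s of 34, 28, 20, 25 from lowest to highest.
worcs = male_quartile_gap(34, 28, 20, 25)
print(worcs, signs_agree(9, worcs))          # -17 False

# Eastleigh BC: reported -13%; order of 56 and 45 within the lower half is assumed.
eastleigh = male_quartile_gap(56, 45, 47, 75)
print(eastleigh, signs_agree(-13, eastleigh))  # 21 False
```

A `False` result would be the trigger for prompting the person entering the data to check their figures.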

If these sanity checks are violated, the data published is wrong and an error has been made somewhere. But what about companies that claim a median gender pay gap of zero? 937 of the 10,504 organisations claim the median man is paid the same as the median woman. Based on the sanity check, it should be obvious that this can only be correct if the male quartile gap is zero, and only 48 of the 937 organisations meet this criterion.

Actually, it is mathematically possible for a small violation of this sanity check to occur and still have a legitimate median gender pay gap of zero. Chelsea football club is an example where the male quartile gap is 1% even though their median gender pay gap is 0%. Of the 889 organisations reporting zero median gender pay gaps but non-zero male quartile gaps, 325 have male quartile gaps between -5% and +5%. Some of these 325 will be wrong but some could be right. But this still leaves 564 organisations which must have made a mistake somewhere and a number of football clubs, including my own club of Newcastle Utd, appear to be making error 3 as shown below.
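One way to express this three-way split for organisations reporting a zero median gap is sketched below. The labels are my own wording, and I have assumed the band of -5% to +5% includes its end points.

```python
def check_zero_gap_claim(male_quartile_gap, tolerance=5.0):
    """Classify a claimed median pay gap of 0% using the male quartile gap (in % points)."""
    if male_quartile_gap == 0:
        return "consistent"
    if abs(male_quartile_gap) <= tolerance:
        return "could be right, needs review"
    return "must be an error somewhere"


print(check_zero_gap_claim(0))   # consistent
print(check_zero_gap_claim(1))   # could be right, needs review (the Chelsea case)
print(check_zero_gap_claim(-12)) # must be an error somewhere
```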

In addition to these three clubs, I saw the same error with Millwall, Notts County, Wigan, Rangers, Hearts and Leicester City.

Why has this happened? As we saw with Eastleigh Borough Council, organisations might simply not know how to do the calculations properly. I therefore decided to read the government guidance on how to calculate the median (and the quartiles) and I was shocked by what I saw. Here I quote the calculation of the median in italics:

*Median gender pay gap in hourly pay: how to calculate*

1. *Arrange the hourly pay rates of all male full-pay relevant employees from highest to lowest*
2. *Find the hourly pay rate that is in the middle of the range – this gives you the median hourly rate of pay for men*
3. *Arrange the hourly pay rates of all female full-pay relevant employees from highest to lowest*
4. *Find the hourly pay rate that is in the middle of the range – this gives you the median hourly rate of pay for women*
5. *Subtract the median hourly pay rate for women from the median hourly pay rate for men*
6. *Divide the result by the median hourly pay rate for men*
7. *Multiply the result by 100 – this gives you the median gender pay gap in hourly pay as a percentage of mens’ pay*

Steps 1, 3, 5, 6 & 7 are OK but steps 2 & 4 are plain wrong. A non-mathematical person will interpret this as meaning the middle of a numerical range. So if hourly earnings of women vary between £10 & £90 per hour, the middle of the range will be £50 per hour. But there is absolutely no guarantee that this will be the earnings of the median woman, who is in the middle of the LINE, not the middle of the range! I run basic statistical training courses and this misconception of the median comes up all the time, so I am certain that many organisations have read these instructions literally.
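To make the difference concrete, here is a small Python illustration. The nine hourly rates are invented for the example; only the £10 and £90 end points come from the text above.

```python
from statistics import median

# Hypothetical hourly pay for nine women, ranging from £10 to £90.
women_hourly = [10, 11, 12, 12, 13, 14, 15, 18, 90]

# Literal reading of the guidance: the middle of the numerical range.
middle_of_range = (min(women_hourly) + max(women_hourly)) / 2

# What the median really is: the middle person in the line.
middle_of_line = median(women_hourly)

print(middle_of_range)  # 50.0
print(middle_of_line)   # 13
```

One highly paid person drags the middle of the range up to £50, while the woman in the middle of the line earns £13.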

By the way, the ACAS guide is no better and makes the same mistake regarding the median.

The information on calculating quartiles starts off OK but then gets into a muddle when dealing with numbers that don’t divide evenly into 4 whole numbers. However, for the many people who think that the median is the middle of the range rather than the line, I would not be at all surprised if they then think that quartiles are just the range divided into 4 equal sized chunks e.g. £10-30, £30-50, £50-70 & £70-90, which of course they aren’t.
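The same contrast can be sketched for quartiles. The twelve hourly rates are invented, the function names are mine, and the way `rank_quartiles` handles head counts that do not divide by 4 is just one reasonable choice, not the official guidance.

```python
def rank_quartiles(rates):
    """Split people into 4 groups of (nearly) equal size by position in the line."""
    ordered = sorted(rates)
    n = len(ordered)
    return [ordered[i * n // 4:(i + 1) * n // 4] for i in range(4)]


def range_bands(rates):
    """The mistaken approach: cut the pay RANGE into 4 equal-width bands."""
    lo, hi = min(rates), max(rates)
    if hi == lo:
        raise ValueError("all rates are equal, so the range has no width")
    width = (hi - lo) / 4
    bands = [[], [], [], []]
    for rate in sorted(rates):
        index = min(int((rate - lo) / width), 3)  # the top rate goes in the last band
        bands[index].append(rate)
    return bands


rates = [10, 11, 12, 12, 13, 14, 15, 18, 20, 25, 40, 90]

print([len(group) for group in rank_quartiles(rates)])  # [3, 3, 3, 3]
print([len(group) for group in range_bands(rates)])     # [10, 1, 0, 1]
```

True quartiles always hold a quarter of the workforce each; the range bands of £10-30, £30-50, £50-70 and £70-90 here hold 10, 1, 0 and 1 people.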

**How many organisations are making errors?**

Based solely on the comparison of the reported median gender pay gap versus the calculated male quartile gap, I estimate that 9% of organisations have definitely made an error and a further 8% may have made an error as shown in the table. Note the sanity check of comparing male quartile gaps with reported gender gaps only captures errors 2 & 3. Error type 1 can still occur and not create a conflict between the reported median gender pay gap and the calculated male quartile gap so the true error rate may yet be higher. I urge the government to revamp its database and include sanity checks as described in this post as soon as possible. At the same time, their guidance on calculating the median and quartiles must be rewritten by a statistician!