With 10 games to go, the 2017/18 EPL season is now entering its final quarter, and supporters of every club are starting to wonder where their team will finish. As a Newcastle United supporter, I see my team stuck right in the middle of one of the tightest relegation battles in living memory. At the other end of the table, my wife’s team Spurs are almost certainly out of the running for the title, but Champions League qualification is definitely in their sights. To gauge what to expect, I have used a statistical method known as Poisson modelling to predict the final league table come May 13th. I will update this post after every round of games between now and then, so please bookmark this page.
Poisson modelling is a very common statistical method for predicting football matches, and nearly all forecasters use it. The basic concept is not difficult to explain, and I do so in 7 steps, using as my example the Watford V WBA game, which is being played as I write this post (Saturday 3rd March 2018). If you want to skip straight to my predicted league table, go to step 5.
Step 1: Calculate the Expected Number of Goals (eGS) for each team
The process starts with an estimate of the expected number of goals (eGS) that each team will score: I expect Watford to score 1.53 goals and WBA to score 0.8 goals. How did I arrive at these numbers? I start with the current league table and look at the number of goals each team has scored and conceded in home and away games separately.
I can then calculate GS & GC, the average number of goals scored and conceded per game by each team, at home and away. Man City are the most prolific, scoring 2.3 goals per game both home and away. Newcastle United, on the other hand, score only 0.9 goals per home game, one of the lowest figures in the league. Their home defence, though, is sound, with a GC of 1.1, which is comparable to the top teams.
The Poisson modelling process is easier to follow if I convert these into relative values, rGS & rGC. Across all 20 teams, home teams score 1.54 goals per game and away teams score 1.16. Dividing each team’s GS & GC by the matching average (for home games, GS by 1.54 and GC by 1.16; for away games, the other way round) gives the home and away rGS & rGC for each team. For Newcastle, the home rGC is 0.9, meaning their defence concedes 90% of the league average at home, i.e. it is an above-average home defence.
To arrive at each team’s eGS, all I have to do is multiply the league average goals scored (1.54 for a home team, 1.16 for an away team) by the team’s rGS and by their opponent’s rGC. In effect, this combines the effectiveness of a team’s attack with the effectiveness of their opponent’s defence. Take Watford, who are at home to WBA. Watford’s home rGS is 1.0 and WBA’s away rGC is also 1.0, so 1.54*1.0*1.0 gives Watford’s eGS of 1.53 (the difference is due to rounding), as shown in the chart in step 2. For WBA, the eGS is 1.16 times their away rGS of 0.4 times Watford’s home rGC of 1.6, which results in an eGS of 0.8 goals (again, the ratings shown here are rounded).
Nearly every forecaster follows the process I have described here. Differences arise in how each team’s GS & GC are calculated. I have chosen to use the entire season, but some forecasters prefer to give more weight to the most recent matches. Some split GS & GC into set-piece and open-play goals and predict each separately. Others try to bring in additional information, such as whether a team is missing a key player. Apparently the strength of Newcastle’s defence this season is entirely down to their captain Jamaal Lascelles: when he has been suspended or injured, the defence has been a sieve, but when he plays, it is a rock.
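For readers who want to reproduce step 1, here is a minimal Python sketch. It assumes, as the Newcastle example implies, that a home side's GS is compared with the home average (1.54) and its GC with the away average (1.16), and the reverse for away sides. The function names are illustrative. Because it is fed the rounded ratings quoted above, WBA's eGS comes out at about 0.74 rather than the 0.8 in the text, which presumably comes from unrounded ratings.

```python
# A sketch of step 1: relative ratings (rGS/rGC) and expected goals (eGS).
# League averages are the figures quoted in this post (3 March 2018).
HOME_AVG = 1.54  # goals per game scored by home teams (= conceded by away teams)
AWAY_AVG = 1.16  # goals per game scored by away teams (= conceded by home teams)


def relative(team_avg: float, league_avg: float) -> float:
    """Express a team's GS or GC as a fraction of the matching league average."""
    return team_avg / league_avg


def expected_goals(league_avg: float, attack_r: float, opp_defence_r: float) -> float:
    """eGS = league average x team's rGS x opponent's rGC."""
    return league_avg * attack_r * opp_defence_r


# Newcastle concede 1.1 goals per home game, about 95% of what away teams
# score on average, which the post rounds to an rGC of 0.9.
newcastle_home_rgc = relative(1.1, AWAY_AVG)

# Watford (home) v WBA (away) with the rounded ratings from the text.
watford_egs = expected_goals(HOME_AVG, attack_r=1.0, opp_defence_r=1.0)
wba_egs = expected_goals(AWAY_AVG, attack_r=0.4, opp_defence_r=1.6)

print(round(newcastle_home_rgc, 2), round(watford_egs, 2), round(wba_egs, 2))
# prints: 0.95 1.54 0.74
```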
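Weighting recent matches more heavily can be done in many ways. The sketch below shows one simple option, an exponentially decaying weighted average. It is an assumption for illustration, not the method of any particular forecaster, and the `decay` parameter is hypothetical. With `decay=1.0` it reduces to the plain whole-season average that I use.

```python
# Hypothetical sketch: a recency-weighted goals average, one way of giving
# "more weight to the most recent matches". The decay factor is an assumption.
def weighted_average(goals: list[int], decay: float = 0.9) -> float:
    """Weighted mean of per-match goals; goals[-1] is the most recent match.

    Each step back in time multiplies a match's weight by `decay`, so
    decay=1.0 reproduces the plain season average.
    """
    n = len(goals)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * g for w, g in zip(weights, goals)) / sum(weights)


# Illustrative (made-up) run: a team that has scored more goals lately.
recent_form = [0, 0, 3, 3]
print(weighted_average(recent_form, decay=1.0))  # 1.5, the plain average
print(weighted_average(recent_form, decay=0.5))  # 2.4, recent games count more
```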
Step 2: Calculate the Poisson Probabilities for each team
Of course, in reality no team can score a fraction of a goal; a game is decided by a whole number of goals scored by each side. eGS is the parameter needed to exploit the properties of the Poisson distribution, which gives the probability of a certain number of events occurring (here, goals scored) given their expected number. In the chart, you can see two Poisson probability distributions, a green one for Watford and a brown one for WBA. According to these, there is a 45% chance of WBA not scoring given their eGS of 0.8, whilst there is only a 22% chance of Watford going goalless since their eGS is 1.53. Both teams have roughly the same chance (1 in 3) of scoring exactly 1 goal, whilst Watford are much more likely than WBA to score 2 or more goals.
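The probabilities in the chart come from the Poisson formula P(k goals) = eGS^k * e^(-eGS) / k!. A short Python sketch, using only the standard `math` module, reproduces the percentages quoted above:

```python
import math


def poisson_pmf(k: int, egs: float) -> float:
    """P(team scores exactly k goals) = egs**k * exp(-egs) / k!"""
    return egs ** k * math.exp(-egs) / math.factorial(k)


for team, egs in [("Watford", 1.53), ("WBA", 0.8)]:
    p0, p1 = poisson_pmf(0, egs), poisson_pmf(1, egs)
    print(f"{team}: 0 goals {p0:.0%}, 1 goal {p1:.0%}, 2+ goals {1 - p0 - p1:.0%}")

# Watford: 0 goals 22%, 1 goal 33%, 2+ goals 45%
# WBA: 0 goals 45%, 1 goal 36%, 2+ goals 19%
```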
Step 3: Calculate the Scoreline Matrix
Once you have the Poisson probability distributions for each team, the next step is to create a matrix of all possible scorelines for the match and calculate the probability of each scoreline. In the table shown, the rows are the home team (Watford) and the columns are the away team (WBA). Green outcomes are wins for the home team, those in brown are wins for the away team and those in white are draws. The probabilities are calculated simply by multiplying the two respective Poisson probabilities. For example, the highest probability of 15% is for a 1-0 win for Watford. This was calculated by multiplying the probability of Watford scoring 1 goal (33%) by the probability of WBA scoring 0 goals (45%) and 0.33*0.45 is 0.15 or 15%.
By now, you may have worked out that once this matrix is calculated, it can be used to find the probability of each match outcome, i.e. a Watford win, a WBA win or a draw. By summing the green (& black) cells, I find that Watford have a 55% chance of winning the game. Summing the brown cells tells us that WBA have a 20% chance of winning, which leaves a 26% chance of a draw. In fact, I can go one step further and calculate the Expected Number of Points (ePts) for each team. A win is worth 3 points and a draw 1 point, so Watford’s ePts is 3*0.55 + 1*0.26 = 1.9 points and WBA’s ePts is 3*0.2 + 1*0.26 ≈ 0.84 points (small differences from multiplying out the figures shown, and the fact that the three percentages sum to 101%, are due to rounding).
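Steps 2 and 3 can be sketched in a few lines of Python. The sketch assumes that the two teams' goal counts are independent, which is what multiplying the two probabilities implies, and it ignores scores above 10 goals because their probability is negligible. Plugging in the rounded eGS values of 1.53 and 0.8 gives outcome probabilities that match the text to the nearest percent. The ePts values can differ from the quoted ones in the second decimal place because the eGS inputs are themselves rounded.

```python
import math


def poisson_pmf(k: int, egs: float) -> float:
    """P(team scores exactly k goals) under a Poisson model."""
    return egs ** k * math.exp(-egs) / math.factorial(k)


def scoreline_matrix(home_egs: float, away_egs: float, max_goals: int = 10) -> list[list[float]]:
    """matrix[h][a] = P(home scores h) * P(away scores a), assuming independence."""
    home = [poisson_pmf(h, home_egs) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, away_egs) for a in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]


def outcome_probs(matrix: list[list[float]]) -> tuple[float, float, float]:
    """Sum cells below, on and above the diagonal: (home win, draw, away win)."""
    home_win = draw = away_win = 0.0
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


m = scoreline_matrix(1.53, 0.8)
home_win, draw, away_win = outcome_probs(m)
home_epts = 3 * home_win + draw  # 3 points for a win, 1 for a draw
away_epts = 3 * away_win + draw
print(f"1-0: {m[1][0]:.0%}")  # 15%
print(f"Watford {home_win:.0%}, draw {draw:.0%}, WBA {away_win:.0%}")  # 55%, 26%, 20%
print(f"ePts: Watford {home_epts:.2f}, WBA {away_epts:.2f}")
```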
Step 4: Calculate Expected Number of Points (ePts) for each team
Obviously no team can win 1.9 points in a single game, but if you work out a team’s ePts for each of its remaining 10 games, you can sum them and add the total to the team’s existing points to predict how it will do over the whole season. I have done this for my team Newcastle United, with ePts shown in the blue columns and eGS in the gold columns. For each match, I have highlighted the team with the higher ePts (again green for the home team, brown for the away team). This shows that Newcastle are expected to be the stronger team in 3 matches and the weaker team in 7, which is not good news in a relegation dogfight. On this reckoning, Newcastle United, who currently have 29 points, are expected to gain a further 10.9 points and would therefore end the season with 40 points. That should be enough to avoid relegation: in my post at the start of the season, which looked at how many points were needed to achieve different outcomes, I showed that teams can expect to be safe most of the time with 36 points.
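The step 4 calculation is just a sum, but a sketch makes the bookkeeping explicit. In this hypothetical version each remaining fixture is a pair (team's ePts, opponent's ePts). The fixture values below are made up for illustration and are not Newcastle's actual numbers.

```python
# Hypothetical sketch of step 4. Each remaining fixture is given as
# (team's ePts, opponent's ePts); the numbers below are made up.
def project_season(current_points: int, fixtures: list[tuple[float, float]]) -> tuple[float, int, int]:
    """Return (projected points, games as stronger side, games as weaker side)."""
    expected = current_points + sum(own for own, _ in fixtures)
    stronger = sum(1 for own, opp in fixtures if own > opp)
    weaker = sum(1 for own, opp in fixtures if own < opp)
    return expected, stronger, weaker


fixtures = [(1.5, 1.2), (0.9, 1.8), (1.1, 1.6)]
points, stronger, weaker = project_season(29, fixtures)
print(f"{points:.1f} points; stronger in {stronger}, weaker in {weaker}")
# 32.5 points; stronger in 1, weaker in 2
```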
Step 5: Predict League Table for the end of the season based on ePts
If I repeat this process for every team, I get my prediction of the final league table, which shows Newcastle United ending the season in 14th place. The numbers in yellow are the league table as of 2nd March 2018, so Newcastle are expected to climb one place. It also shows that Swansea, Stoke & WBA, who are already in the drop zone, will still be there come the end of the season. At the top of the table, no change is expected, which means my wife’s team Spurs should qualify for the Champions League. It is worth noting that my ePts model predicts Man City will win the league with 100 points, which would be a new record for the EPL.
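Extending step 4 to every team and sorting gives the step 5 table. The sketch below uses placeholder teams and made-up numbers. Unlike a real league table, it breaks ties by input order rather than by goal difference.

```python
# Hypothetical sketch of step 5: project every team and sort the table.
# Team names and numbers are placeholders, not the post's data.
def predicted_table(current: dict[str, int], remaining_epts: dict[str, list[float]]) -> list[tuple[int, str, float]]:
    """Return [(position, team, projected points)], highest points first."""
    projected = {team: current[team] + sum(remaining_epts[team]) for team in current}
    ranked = sorted(projected.items(), key=lambda item: item[1], reverse=True)
    return [(pos, team, pts) for pos, (team, pts) in enumerate(ranked, start=1)]


current = {"Team A": 30, "Team B": 29, "Team C": 27}
remaining = {"Team A": [0.8, 0.9], "Team B": [1.6, 1.4], "Team C": [1.0, 1.2]}
for pos, team, pts in predicted_table(current, remaining):
    print(pos, team, round(pts, 1))
# 1 Team B 32.0
# 2 Team A 31.7
# 3 Team C 29.2
```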
Of course, no prediction is perfect, and one of the nice features of the Poisson model is that you can calculate the margin of error in each ePts prediction. These margins can then be combined into confidence intervals for each team’s final points total, shown in the LCI (lower) & UCI (upper) columns in light green. For Newcastle, the range is 33 to 47 points. 47 points would be enough for a top 10 finish, but 33 points would almost certainly result in relegation, so they can’t be complacent. On the other hand, Watford & the teams above them are probably safe from relegation, so the relegation battle is between 8 teams. I exclude WBA, since even their upper limit of 35 points is not enough to avoid relegation.
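The post does not spell out how the margins of error are combined. One common approach, sketched here under that assumption, treats the points from each match as a random variable taking the value 3, 1 or 0. It then sums the per-match variances, assuming matches are independent, and applies a normal approximation to get a 95% interval. The season inputs below are made up.

```python
import math


def match_mean_var(p_win: float, p_draw: float) -> tuple[float, float]:
    """Mean and variance of points from one match (3 for a win, 1 for a draw)."""
    mean = 3 * p_win + p_draw
    second_moment = 9 * p_win + 1 * p_draw  # E[points^2]
    return mean, second_moment - mean ** 2


def season_interval(current_points: float, matches: list[tuple[float, float]], z: float = 1.96) -> tuple[float, float]:
    """Approximate interval for final points, assuming independent matches."""
    mean = current_points
    variance = 0.0
    for p_win, p_draw in matches:
        m, v = match_mean_var(p_win, p_draw)
        mean += m
        variance += v
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


# Watford v WBA from step 3: roughly 55% win and 26% draw for Watford.
mean, var = match_mean_var(0.55, 0.26)
print(f"{mean:.2f} {var:.2f}")
# Made-up season: 29 points now, ten matches at 30% win / 30% draw each.
low, high = season_interval(29, [(0.3, 0.3)] * 10)
print(round(low), round(high))
```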
Step 6: Predict scorelines for each match
The next table shows ePts & eGS values for all matches in round 29, being played this weekend. The stronger team in each match is highlighted, and a natural question is whether Poisson modelling can also be used to predict the scoreline. The answer is yes, but there are many possible methods and none is obviously superior. I have used 4 different methods (ML, Med, Rdd & Int). In some cases they all predict the same result, e.g. Man City to beat Chelsea 2-1. In others they give quite different results, such as Watford V WBA, the example used in this post. How do they work?
- ML – Maximum Likelihood is the scoreline with the highest probability from the Scoreline Matrix calculated in step 3.
- Med – Median is derived from the median number of goals that each team is expected to score. In step 2, the Poisson probabilities were calculated for each possible number of goals (0, 1, 2, etc.), and the median is the smallest number of goals at which the cumulative probability reaches 50% or more. For Watford, 22% (for 0 goals) + 33% (for 1 goal) is greater than 50%, so the median is 1. For WBA, 45% (for 0 goals) + 36% (for 1 goal) is also greater than 50%, so the median is also 1, giving a 1-1 prediction.
- Rdd – Rounded is a simpler predictor that just rounds each team’s eGS to the nearest whole number. So 1.53 for Watford rounds to 2 goals and 0.8 for WBA rounds to 1 goal.
- Int – Integer is simply the integer part of eGS, which is equivalent to rounding down. Int is included because, whilst Rdd is better at predicting higher scorelines, it is very poor at predicting teams that fail to score, whereas Int is more likely to predict this.
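The four methods can be sketched in Python as follows, following the definitions above. One detail is an implementation choice rather than something from the post. Python's built-in `round()` rounds halves to the nearest even number, so Rdd uses an explicit round-half-up helper instead. Run on Watford V WBA, the sketch reproduces the four predictions discussed here.

```python
import math


def poisson_pmf(k: int, egs: float) -> float:
    return egs ** k * math.exp(-egs) / math.factorial(k)


def predict_ml(home_egs: float, away_egs: float, max_goals: int = 10) -> tuple[int, int]:
    """ML: the single most probable cell of the scoreline matrix."""
    cells = ((h, a) for h in range(max_goals + 1) for a in range(max_goals + 1))
    return max(cells, key=lambda s: poisson_pmf(s[0], home_egs) * poisson_pmf(s[1], away_egs))


def median_goals(egs: float) -> int:
    """Med: smallest goal count whose cumulative probability reaches 50%."""
    k, cumulative = 0, poisson_pmf(0, egs)
    while cumulative < 0.5:
        k += 1
        cumulative += poisson_pmf(k, egs)
    return k


def round_half_up(x: float) -> int:
    """Rdd: nearest whole number, with halves rounded up (unlike built-in round())."""
    return math.floor(x + 0.5)


def predictions(home_egs: float, away_egs: float) -> dict[str, tuple[int, int]]:
    return {
        "ML": predict_ml(home_egs, away_egs),
        "Med": (median_goals(home_egs), median_goals(away_egs)),
        "Rdd": (round_half_up(home_egs), round_half_up(away_egs)),
        "Int": (math.floor(home_egs), math.floor(away_egs)),  # round down
    }


print(predictions(1.53, 0.8))
# {'ML': (1, 0), 'Med': (1, 1), 'Rdd': (2, 1), 'Int': (1, 0)}
```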
As I publish this post, Watford have just beaten WBA 1-0, so we can chalk up a success for ML & Int, a partial success for Rdd (which predicted 2-1 but did get the Watford win) and a fail for Med.
Step 7: Predict League Table for end of season using expected scorelines
The accuracy of expected scorelines is low: at best, the exact scoreline is correctly predicted in 1 in 7 matches. They do, however, provide a basis for predicting the actual number of points each team wins in a match, rather than ePts. So my second predicted league table is an average of the 4 league tables produced by the 4 scoreline methods. At this point, I do not consider this table to be my main prediction, since I regard the ePts table from step 5 as superior. However, in the last 2 or 3 weeks of the season, ePts will in some cases be predicting impossible points totals, and at that point this expected scoreline table will be the better predictor.
The table shows the expected W/D/L breakdown for each team, the resulting final points tally and how this differs from the ePts table. In general, this table will overestimate the points tally for teams at the top and underestimate it for teams at the bottom. For now, it acts as a sense check on the ePts table. For the top 4 teams, the prediction is the same. For Newcastle Utd, it suggests a 13th-place finish rather than 14th, but with 1 point fewer. The biggest difference is in the relegation battle, where this table predicts that Crystal Palace rather than Swansea will be relegated. As I write this, Swansea are beating West Ham 4-1, whereas all 4 of my scoreline methods predicted 1-1, so Swansea are already doing better than expected.
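Step 7 can be sketched as follows: convert each method's predicted scorelines into wins, draws, losses and points, then average the final totals across the 4 methods. The results are written from the team's point of view as (goals for, goals against), so an away fixture's scoreline is flipped. The predictions used here are made up for illustration.

```python
# Hypothetical sketch of step 7: turn predicted scorelines into W/D/L and
# points, then average the final totals produced by the four methods.
def points_from_scores(results: list[tuple[int, int]]) -> dict[str, int]:
    """results holds (goals for, goals against) for each remaining game."""
    wins = sum(1 for gf, ga in results if gf > ga)
    draws = sum(1 for gf, ga in results if gf == ga)
    losses = len(results) - wins - draws
    return {"W": wins, "D": draws, "L": losses, "Pts": 3 * wins + draws}


def averaged_points(current_points: int, by_method: dict[str, list[tuple[int, int]]]) -> float:
    """Average the final points total over the scoreline methods."""
    totals = [current_points + points_from_scores(r)["Pts"] for r in by_method.values()]
    return sum(totals) / len(totals)


# Made-up predictions for two remaining games, one list per method.
by_method = {
    "ML": [(1, 0), (0, 1)],
    "Med": [(1, 1), (1, 1)],
    "Rdd": [(2, 1), (1, 1)],
    "Int": [(1, 0), (0, 1)],
}
print(points_from_scores(by_method["Rdd"]))  # {'W': 1, 'D': 1, 'L': 0, 'Pts': 4}
print(averaged_points(29, by_method))        # 32.0
```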