With only 3 or 4 games to go for each Premier League team, most of the season's excitement has dissipated. Man City have wrapped up the title, the top 7 who will be playing in Europe next season are more or less settled, and the former 10-team relegation dogfight has resolved itself, with a 4-point gap between the bottom 3 and the rest. The only real uncertainties left are probably who will take 4th place (Spurs or Chelsea) and whether Southampton will escape relegation at the expense of Swansea.
I use a statistical approach known as Poisson Regression, which I described in depth in my post for round 29. If this is the first time you have seen my predictions, I strongly encourage you to read that post to familiarise yourself with my terminology. My predictions all start with the latest form guide (shown below), which I use to calculate two numbers for each team playing this weekend:
- eGS – the expected number of goals that a team will score.
- ePts – the expected number of points that a team will receive.
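To show how the two numbers relate, here is a minimal sketch that assumes each team's goals follow an independent Poisson distribution with mean eGS. This is my simplifying assumption for illustration, not necessarily the exact model from the round 29 post, and the function names and the 10-goal cut-off are hypothetical. ePts is then 3 × P(win) + 1 × P(draw), read off the scoreline matrix.

```python
import math

MAX_GOALS = 10  # hypothetical cut-off for truncating the scoreline matrix

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals when goals are Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def scoreline_matrix(egs_a: float, egs_b: float) -> list[list[float]]:
    """matrix[i][j] = P(team A scores i and team B scores j), assuming independence."""
    return [[poisson_pmf(i, egs_a) * poisson_pmf(j, egs_b)
             for j in range(MAX_GOALS + 1)]
            for i in range(MAX_GOALS + 1)]

def expected_points(egs_a: float, egs_b: float) -> tuple[float, float]:
    """Return (ePts for A, ePts for B): 3 points for a win, 1 for a draw."""
    m = scoreline_matrix(egs_a, egs_b)
    goals = range(MAX_GOALS + 1)
    p_a_win = sum(m[i][j] for i in goals for j in goals if i > j)
    p_draw = sum(m[i][i] for i in goals)
    p_b_win = sum(m[i][j] for i in goals for j in goals if i < j)
    return 3 * p_a_win + p_draw, 3 * p_b_win + p_draw

# Chelsea (eGS 1.8) v Crystal Palace (eGS 0.4)
chelsea_epts, palace_epts = expected_points(1.8, 0.4)
print(f"Chelsea ePts = {chelsea_epts:.2f}, Crystal Palace ePts = {palace_epts:.2f}")
```

Note that the two ePts values in a match always add up to 3 minus the probability of a draw, because a draw hands out only 2 points in total.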
For round 36 this weekend, I have used my calculations of eGS & ePts (shown in the table below) to make 4 separate predictions of the scorelines for each match. The highlighted team in each match is the one with the higher eGS value but that doesn’t necessarily mean they will win.
The 4 scoreline prediction methods (ML, Med, Rdd & Int) work as follows:
- ML – Maximum Likelihood is the scoreline with the highest probability from the Scoreline Matrix as explained in step 3 of my post for round 29.
- Med – Median is derived from the median number of goals that each team is expected to score. See step 6 of my post for round 29 for a fuller explanation.
- Rdd – Rounded is a simpler predictor which just involves rounding each team's eGS to the nearest whole number. So 1.8 for Chelsea rounds to 2 goals and 0.4 for Crystal Palace rounds to 0 goals, hence the 2-0 prediction.
- Int – Integer is simply the integer part of eGS, which is equivalent to rounding down. I include Int because, whilst Rdd is better at predicting higher scorelines, it is very poor at predicting teams that fail to score, whereas Int is more likely to predict this.
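A minimal sketch of the four conversions, assuming (my assumption, not necessarily the exact round 29 procedure) that each team's goals are independent Poisson with mean eGS. Under that assumption the most likely scoreline is just each team's most likely goal count, and the median is the smallest goal count whose cumulative probability reaches 50%. All function names here are hypothetical.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)

def ml_goals(mean: float, max_goals: int = 10) -> int:
    """Most likely goal count. With independent teams, the ML scoreline is the pair of these."""
    return max(range(max_goals + 1), key=lambda k: poisson_pmf(k, mean))

def median_goals(mean: float) -> int:
    """Smallest k whose cumulative Poisson probability reaches 0.5."""
    k, cdf = 0, poisson_pmf(0, mean)
    while cdf < 0.5:
        k += 1
        cdf += poisson_pmf(k, mean)
    return k

def round_half_up(x: float) -> int:
    """Nearest whole number with .5 going up (Python's round() would send 2.5 to 2)."""
    return math.floor(x + 0.5)

def predict_scorelines(egs_a: float, egs_b: float) -> dict[str, tuple[int, int]]:
    return {
        "ML": (ml_goals(egs_a), ml_goals(egs_b)),
        "Med": (median_goals(egs_a), median_goals(egs_b)),
        "Rdd": (round_half_up(egs_a), round_half_up(egs_b)),
        "Int": (math.floor(egs_a), math.floor(egs_b)),
    }

for method, (a, b) in predict_scorelines(1.8, 0.4).items():
    print(f"{method}: {a}-{b}")
```

For Chelsea (1.8) v Crystal Palace (0.4) this gives Rdd 2-0 and Int 1-0, as described above; the ML and Med values are what this simplified model produces and may differ from the published predictions.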
One reason I publish four predictions is that they are all plausible methods of converting both teams' eGS into a scoreline. Whilst I am not yet able to say definitively which is the best option, it does appear from rounds 29 to 32 that Med is edging ahead, as shown in the table below. I have scored each match prediction in one of three ways:
- Right Score, i.e. I predicted the actual scoreline.
- Right Result, Wrong Score, i.e. I predicted the right outcome (win, draw or loss) but not the right score.
- Wrong Result, i.e. I predicted the wrong outcome.
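The three-way scoring above can be sketched as a small function. This is a hypothetical illustration of the rule as described, with scorelines written as (home goals, away goals):

```python
def outcome(home_goals: int, away_goals: int) -> str:
    """'H' for a home win, 'D' for a draw, 'A' for an away win."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"

def score_prediction(predicted: tuple[int, int], actual: tuple[int, int]) -> str:
    """Classify one prediction into the three categories used in the table."""
    if predicted == actual:
        return "Right Score"
    if outcome(*predicted) == outcome(*actual):
        return "Right Result, Wrong Score"
    return "Wrong Result"

print(score_prediction((2, 0), (2, 0)))  # Right Score
print(score_prediction((2, 0), (3, 1)))  # Right Result, Wrong Score
print(score_prediction((2, 0), (1, 1)))  # Wrong Result
```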
The table shows that Med is the most accurate so far, both for match outcomes (62% correct) and for scorelines (16% correct). Is this a good prediction model? One way to evaluate it is to compare it with a dumb model (see my post on what makes a good forecaster). Suppose 1 in 7 (14%) matches end in a 2-1 home win. If I predicted every match to be a 2-1 home win, I would be correct 14% of the time, and the difference between this and my Med model would not be that great. I should say that I don't know what the underlying distribution of scorelines is, but I would like to find out. I can say that, since round 29, 8 of the 64 matches played have ended in a 1-1 draw. If this were my dumb model, I would have 8 Right Scores, 8 Right Results and 48 Wrong Results. If my dumb model had been a 2-1 home win, I would have 7 Right Scores, 20 Right Results and 37 Wrong Results, i.e. the right outcome in 27 of 64 matches (about 42%, against 62% for Med). So the Med model does appear to be doing better than these dumb models, and the other 3 models may be too.
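The dumb-model tallies can be reproduced with a short sketch. I don't have the actual scorelines here, so the match list below is a synthetic stand-in built only from the counts implied above (16 draws, 8 of them 1-1; 27 home wins, 7 of them 2-1; leaving 21 away wins). The 0-0, 1-0 and 0-1 placeholders are hypothetical, but the tallies depend only on those counts.

```python
from collections import Counter

def outcome(home_goals: int, away_goals: int) -> str:
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"

def score_prediction(predicted: tuple[int, int], actual: tuple[int, int]) -> str:
    if predicted == actual:
        return "Right Score"
    if outcome(*predicted) == outcome(*actual):
        return "Right Result, Wrong Score"
    return "Wrong Result"

# Synthetic stand-in for the 64 matches since round 29 (placeholder scorelines are hypothetical).
results = ([(1, 1)] * 8 + [(0, 0)] * 8      # 16 draws, 8 of them 1-1
           + [(2, 1)] * 7 + [(1, 0)] * 20   # 27 home wins, 7 of them 2-1
           + [(0, 1)] * 21)                 # 21 away wins

def evaluate_constant_model(prediction: tuple[int, int], matches: list[tuple[int, int]]) -> Counter:
    """Tally the three categories if every match is predicted as the same scoreline."""
    return Counter(score_prediction(prediction, actual) for actual in matches)

for dumb in [(1, 1), (2, 1)]:
    print(dumb, dict(evaluate_constant_model(dumb, results)))
```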
To arrive at a predicted final league table, I need to repeat this process for rounds 36 to 38 and then combine the results. As I explained in my round 29 post, I am making two separate predictions of the final table, and I will demonstrate them with my team, Newcastle United, by showing the predictions for all their remaining games.
My preferred method of estimating the final table is to total up the ePts values for all remaining games. Newcastle are currently on 41 points, and if you total up their ePts you find I am expecting them to get another 4.6 points, giving 45.6, which rounds to 46 points. Repeat this for all teams and you get the final table shown here, which has Man City winning the league with a new record points tally, Newcastle in 10th, and Southampton, Stoke & WBA relegated. After many weeks of a relegation dogfight involving up to 10 teams, Newcastle are now projected to finish 13 points above Southampton, and even when you take the margin of error into account (as shown by the LCI & UCI columns) it is clear they are safe.
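The arithmetic of the first method is just current points plus the sum of ePts, rounded. In this sketch only the 41 points and the 4.6 total come from the text; the split of 4.6 across individual fixtures is hypothetical, and the function names are mine.

```python
import math

def round_half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 going up (unlike Python's round())."""
    return math.floor(x + 0.5)

def projected_total(current_points: int, epts_remaining: list[float]) -> int:
    """Current points plus the sum of ePts over all remaining fixtures, rounded."""
    return round_half_up(current_points + sum(epts_remaining))

# Hypothetical per-fixture ePts for Newcastle; only their 4.6 total is from the text.
newcastle_epts = [1.7, 1.4, 1.5]
print(projected_total(41, newcastle_epts))  # 41 + 4.6 = 45.6, which rounds to 46
```

Because ePts is an expectation rather than a result, the unrounded total (45.6 here) need not be a points tally a team can actually reach.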
However, with only 3 or 4 games to go, my preferred method can produce impossible point totals. For example, Man City have 90 points with 4 games to go, so they cannot finish on 101 points: 11 extra points cannot be made from four results worth 3, 1 or 0 points each. So it is now time to set this prediction aside and focus on my second method of estimating the final league table. For this, I use the 4 scoreline prediction methods described earlier. For each method, I work out the number of points a team would get from the predicted scores and then take an average across the 4 methods. This method tends to give more points to teams at the top and fewer points to teams at the bottom, but it does produce an explicit prediction of the W-D-L record for each team. Again we see the same 3 teams being relegated, Man City winning the title with 102 points, and Newcastle in 10th place, this time with 45 instead of 46 points. The DIFF column shows the difference between the two predicted tables.
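Both ideas in the paragraph above can be sketched briefly: enumerating which point totals are reachable from the games left, and averaging the points implied by each method's predicted scoreline. The per-match predictions in the usage example are hypothetical, and the function names are illustrative only.

```python
def achievable_extra_points(games_left: int) -> set[int]:
    """All point totals reachable from games_left matches worth 3 (W), 1 (D) or 0 (L)."""
    return {3 * wins + draws
            for wins in range(games_left + 1)
            for draws in range(games_left - wins + 1)}

def points_for(goals_for: int, goals_against: int) -> int:
    if goals_for > goals_against:
        return 3
    if goals_for == goals_against:
        return 1
    return 0

def averaged_points(predictions_per_match: list[dict[str, tuple[int, int]]]) -> float:
    """Each dict maps method name -> (team goals, opponent goals) for one remaining match.
    Points are averaged across the methods for each match, then summed over matches."""
    total = 0.0
    for preds in predictions_per_match:
        total += sum(points_for(gf, ga) for gf, ga in preds.values()) / len(preds)
    return total

# Man City on 90 points with 4 games left
print((101 - 90) in achievable_extra_points(4))  # False: 11 points cannot be made
print((102 - 90) in achievable_extra_points(4))  # True: four wins

# Hypothetical predictions for two remaining matches
example = [
    {"ML": (1, 0), "Med": (2, 0), "Rdd": (2, 0), "Int": (1, 0)},  # all four methods predict a win
    {"ML": (1, 1), "Med": (1, 1), "Rdd": (2, 1), "Int": (1, 1)},  # three draws, one win
]
print(averaged_points(example))  # 3.0 + 1.5 = 4.5
```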