Nigel Marriott's Blog

An independent statistician using data to understand our world and to predict the future

Rugby World Cup #3 – Who will win in 2019? – Model Evaluation

October 30, 2019 By Nigel

After 43 matches with 37 correctly predicted, the stage is set for an epic final between England & South Africa to settle the 2019 Rugby World Cup (Men’s).  Ahead of making a prediction for that match, I have examined my model in depth and in this post I explore whether or not the model needs to be adjusted.

If you’ve read my earlier posts, then you will know that I have used two models for predicting match outcomes:

  • HIGHRANK – whichever team has the higher rank is expected to win the match.
  • EXPWIN – a linear regression model which uses the difference in ranking points to predict the margin of victory and the probability of winning for the higher ranked team.

For 41 matches, both models have given the same prediction.  The two matches where they disagreed were South Africa's knockout games against Japan & Wales, with each model getting one right and one wrong.  I will evaluate each model separately, but since I have been using EXPWIN as my main model due to its greater flexibility, I will spend most of the time on that.

How has HIGHRANK performed?

The HIGHRANK performance is summarised in the table below and clearly shows that it has performed in line with history.  Whilst it might look like predictions for the closest games [0,3) have performed worse than history, the smaller sample size for the world cup means such differences can be expected.  Indeed if I combine [0,3) & [3,7) then history shows 32 correct predictions (or 71%) out of 45 matches compared to 12 correct predictions (or 75%) out of 16 world cup matches, which is no meaningful difference.
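To put a number on "no meaningful difference", here is a minimal sketch that compares the two success rates quoted above.  Fisher's exact test is my choice for this illustration (it is not necessarily the check used for this post), and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Counts quoted in the text: world cup 12 of 16 correct, history 32 of 45 correct.
wc_correct, wc_total = 12, 16
hist_correct, hist_total = 32, 45

p_value = fisher_exact_two_sided(
    wc_correct, wc_total - wc_correct,
    hist_correct, hist_total - hist_correct,
)
print(f"World cup {wc_correct / wc_total:.0%} vs history {hist_correct / hist_total:.0%}")
print(f"Two-sided p-value: {p_value:.2f}")
```

A large p-value here would simply say that a gap of 75% versus 71% is well within what chance alone produces with only 16 world cup matches.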

How has EXPWIN performed?

Let’s remind ourselves of the EXPWIN model formula, which was built by applying linear regression to the entire dataset shown in the chart.

Expected Score Gap = 1.75 * Ranking Points Gap – 1.58

The Residual Standard Error is 13.7 points, or roughly 2 converted tries, and the R-squared is 0.33.  There is no evidence that this model differed between the 5 datasets shown in the chart (three 6 Nations & two Autumn Internationals).  The errors are consistent with a normal distribution, which means that EXPWIN can also be used to calculate the probability of the stronger ranked team winning, or P(WIN).  This is the probability that the Score Gap is greater than zero, assuming a normal distribution where the mean is the Expected Score Gap from the above formula and the standard deviation is 13.7.
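The P(WIN) calculation described above can be sketched in a few lines.  The coefficients and the standard deviation come from the text; the function names and the example gap of 5 ranking points are hypothetical choices for illustration only.

```python
from math import erf, sqrt

SLOPE = 1.75          # points of score gap per ranking point (from the formula above)
INTERCEPT = -1.58     # from the formula above
RESIDUAL_SD = 13.7    # residual standard error quoted in the text

def expected_score_gap(ranking_points_gap):
    """Expected margin of victory for the higher ranked team."""
    return SLOPE * ranking_points_gap + INTERCEPT

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def p_win(ranking_points_gap):
    """P(Score Gap > 0) when Score Gap ~ Normal(expected gap, RESIDUAL_SD)."""
    return normal_cdf(expected_score_gap(ranking_points_gap) / RESIDUAL_SD)

# Hypothetical example: a team 5 ranking points ahead of its opponent.
example_gap = 5.0
print(f"Expected score gap: {expected_score_gap(example_gap):.2f} points")
print(f"P(WIN): {p_win(example_gap):.1%}")
```

Note that because the intercept is negative, a team needs a lead of about 0.9 ranking points before the model gives it a better than even chance.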

The next chart shows the same scatter plot for the 43 world cup matches so far (pool stages are solid circles, knockouts are hollow diamonds) along with the EXPWIN model as a thick dashed black line.  Despite the lack of evidence that there was a difference between the 6 Nations & Autumn Internationals, I decided it was still worth showing separate thinner dashed black lines for what the EXPWIN model would have been had I based it solely on 6 Nations data or Autumn International data.  This is one way of showing how much parameter error (error in the model coefficients) might be expected to occur in the world cup, since no model is ever perfect.

The solid red line is the same model formulation as EXPWIN but fitted to the world cup data alone.  Let's call this EXPWINWC.  The formula for this line is as follows:

Expected Score Gap = 1.99 * Ranking Points Gap + 0.79

The question we have to ask ourselves is whether this constitutes evidence that the EXPWIN model was wrong.  One can answer this using multiple linear regression to determine if the difference in the model coefficients is statistically significant, but I still like to look at the question visually.  What I see in this scatter plot is that the slope is more or less the same as EXPWIN, especially when we focus on close matches with ranking point gaps of less than 7 points.  This is the scenario we have for the final, where the ranking point gap is only 1.6 points in favour of England.
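As a numerical companion to the visual comparison, this small sketch evaluates both fitted lines at a few ranking point gaps in the close-match range.  The coefficients are those quoted in the two formulas; the helper name and the choice of gaps (other than the 1.6 for the final) are assumptions for illustration.

```python
# (slope, intercept) pairs taken from the two formulas quoted in this post.
EXPWIN = (1.75, -1.58)
EXPWINWC = (1.99, 0.79)

def predict(model, ranking_points_gap):
    """Expected score gap under a straight-line model."""
    slope, intercept = model
    return slope * ranking_points_gap + intercept

# 1.6 is the gap for the final; the other values are illustrative.
for gap in (0.0, 1.6, 3.0, 7.0):
    original = predict(EXPWIN, gap)
    world_cup = predict(EXPWINWC, gap)
    print(
        f"gap {gap:>4.1f}: EXPWIN {original:6.2f}, "
        f"EXPWINWC {world_cup:6.2f}, difference {world_cup - original:5.2f}"
    )
```

The difference between the two lines is 0.24 points per ranking point plus a constant 2.37, so across close matches almost all of the disagreement comes from the intercept rather than the slope.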

The issue is the model intercept.  It would appear that in this world cup, the stronger teams have been doing slightly better than expected.  One way to explore this is to plot the distribution of the errors and compare it with the expected distribution of errors, assuming a normal distribution with mean 0 and standard deviation 13.7.  This is what is shown in the chart here.  I see two things.  First, the spread of errors is consistent with expectations.  Indeed the standard deviation of the observed errors is 15.0 points, which is similar to the expected 13.7 points.  Second, the observed errors are skewed towards the right instead of being symmetrical, indicating that model errors have tended to favour the stronger team.

The 95% confidence interval for the mean error of these 43 matches is +0.4 to +9.6 points and the mean error is +5.0 points.  In other words, if we decide that this is evidence that the model is underestimating the performance of stronger teams, then England's expected margin of victory should be 5 points higher than the current expected margin of only 1 point using EXPWIN.  Of course, given that the residual standard error is 14 points, this is no great change in the grand scheme of things, but for a bookie seeking to price their odds properly, such a small change in the expected margin of victory could be quite significant.  P(WIN) changes from 53% to 67%, or in betting odds from 11/10 on to 2/1 on, which is definitely not small beer!
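The arithmetic behind these figures can be sketched as follows.  This is a reconstruction under stated assumptions rather than the exact calculation used: I assume a t-interval built from the observed standard deviation of 15.0, a hard-coded t critical value of about 2.018 for 42 degrees of freedom, and a P(WIN) based on the 13.7 residual standard error.  All names are hypothetical.

```python
from math import erf, sqrt

N_MATCHES = 43
MEAN_ERROR = 5.0      # mean model error quoted in the text
SD_ERRORS = 15.0      # observed standard deviation of the errors
RESIDUAL_SD = 13.7    # residual standard error of EXPWIN
T_CRIT = 2.018        # ASSUMED: approximate 97.5% t quantile, 42 degrees of freedom

# 95% confidence interval for the mean error.
std_err = SD_ERRORS / sqrt(N_MATCHES)
ci_low = MEAN_ERROR - T_CRIT * std_err
ci_high = MEAN_ERROR + T_CRIT * std_err

def p_win(expected_gap, sd=RESIDUAL_SD):
    """P(Score Gap > 0) for a normal score gap with the given mean and sd."""
    return 0.5 * (1.0 + erf(expected_gap / (sd * sqrt(2.0))))

def odds_on(probability):
    """Odds in favour, e.g. 2.0 means '2/1 on'."""
    return probability / (1.0 - probability)

# The final: England lead by 1.6 ranking points.
base_gap = 1.75 * 1.6 - 1.58
p_base = p_win(base_gap)
p_adjusted = p_win(base_gap + MEAN_ERROR)

print(f"95% CI for mean error: {ci_low:+.1f} to {ci_high:+.1f} points")
print(f"EXPWIN:   gap {base_gap:.2f}, P(WIN) {p_base:.1%}, odds {odds_on(p_base):.2f} to 1 on")
print(f"Adjusted: gap {base_gap + MEAN_ERROR:.2f}, P(WIN) {p_adjusted:.1%}, "
      f"odds {odds_on(p_adjusted):.2f} to 1 on")
```

Under these assumptions the interval and both probabilities agree with the figures above to within rounding, and odds of roughly 1.15 to 1 and 2.1 to 1 sit close to the quoted 11/10 on and 2/1 on.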

A natural question to ask is whether the 6 knockout matches behave differently to the pool stages.  Alternatively, we might ask whether the errors are correlated with time i.e. were the errors larger in the early games but are now averaging zero?  The best way to answer these questions is to use the principles of Statistical Process Control (SPC) and plot a Control Chart as below.

The black line is a measure of the underlying trend and I can’t see any great change over the tournament.  Errors favouring stronger teams were immediately apparent in the first round of matches and remained largely stable until the 5th round of matches (recall 3 matches were cancelled in the last round due to typhoon Hagibis).  The knockout matches also seem to be consistent with the pool stages overall.  So taking everything together, it does look like our forecast for the final should be giving more weight to England.
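For readers who want to build a similar chart, here is a minimal sketch of one common SPC construction.  The exact construction of the chart above is not spelled out in this post, so treat everything here as an assumption: control limits at the target plus or minus 3 standard deviations, and a running mean standing in for the trend line.  The function name is hypothetical.

```python
def control_chart(errors, sigma=13.7, target=0.0):
    """Summarise match errors (in chronological order) for a simple control chart.

    ASSUMPTIONS: limits are target +/- 3 * sigma and the trend is a running
    mean; the chart in this post may have been built differently.
    """
    upper = target + 3.0 * sigma
    lower = target - 3.0 * sigma

    running_mean = []
    total = 0.0
    for count, error in enumerate(errors, start=1):
        total += error
        running_mean.append(total / count)

    # Zero-based positions of matches falling outside the control limits.
    out_of_control = [
        index for index, error in enumerate(errors)
        if error > upper or error < lower
    ]
    return {
        "upper": upper,
        "lower": lower,
        "running_mean": running_mean,
        "out_of_control": out_of_control,
    }
```

Passing the 43 match errors in the order the matches were played would give the limits, the trend and any out-of-control points.  A running mean that settles at a positive value rather than drifting back to zero would match the pattern described above.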

My articles on the 2019 Rugby World Cup

  1. Who will win in 2019 – Initial predictions ahead of Pool stage
  2. Who will win in 2019 – Revised predictions ahead of Knockout stage
  3. Who will win in 2019 – Final prediction ahead of the Final
  4. How accurate were my predictions – written before the Final

Filed Under: Forecasting, Sport Tagged With: Forecasts, Rugby, RWC2019, Sport Analytics, World Cup

Copyright © 2022 ·Registered in England, Company No. 5577275, VAT No. 883304029. Registered Office – 65 Bristol Road, Keynsham, Bristol, BS31 1EG