How do you identify a good forecaster?

“I think the people in this country have had enough of experts”

Michael Gove, Sky News, 3rd June 2016

This was one of the most memorable quotes during the EU referendum in 2016 and came in response to a question as to why the forecasts of a whole list of organisations such as the IMF should be ignored. It prompted a flurry of rebuttals and articles supporting or damning him and the debate has not gone away.

Like so many quotes, it has already become distorted. I strongly recommend you listen to the full question and answer, but here is his quote in its entirety.

“I think the people in this country have had enough of experts… from organisations with acronyms saying that they know what is best and getting it consistently wrong.”

When I read this full quote I realised I am in complete agreement with Michael Gove.

February 2021 – this article has been edited to add some links to new material, especially those relating to the COVID19 pandemic.

First of all, he is not condemning all experts. He is condemning experts from certain organisations who make regular forecasts and keep getting them wrong. It is not just a matter of opinion; it is a matter of fact that the standard of forecasting by so many organisations (especially in economic fields) is lamentable. Recent interviews with key officials at the Bank of England have highlighted this issue.

Tetlock, Taleb & Silver

This issue was brilliantly explored by Philip Tetlock in his book “Superforecasting” and I cannot recommend this book highly enough. If you want an in-depth review, I can recommend one written by Michael Gove’s former Special Advisor, no less. Before Tetlock, Nate Silver wrote “The Signal and the Noise”, which covered many of the basic principles of good forecasting, and I can thoroughly recommend that book as well (I even gave a presentation about it at the 2013 Royal Statistical Society conference in Newcastle). To round off my recommendations, many of Nassim Taleb’s books are well worth reading, with “Antifragile” probably the most relevant to the debate around forecasting.

Experts & Forecasters

Let’s come back to Michael Gove’s full quote and focus on two aspects:

“… experts … saying that they know what is best …”
“… getting it consistently wrong.”

What is an expert in the first place? Can we define what makes someone an expert? I have written a separate post to answer these questions as this is a big subject and one that I have a lot of views on based on my own experiences.

Whilst an expert can be many things, when it comes to forecasting, the credentials of any so-called expert are not relevant in my opinion. What matters is their track record in forecasting. A person with no apparent credentials whose forecasts are twice as accurate as those of a person laden with credentials should be given more weight. But how do we define forecasting accuracy, and can we decide when someone is getting their forecasts “consistently wrong”?

Who is the best election forecaster?

Let’s start with forecasts of four votes made by three forecasters: myself, Matt Singh of NCPolitics and the opinion polls. All of us made forecasts for the 2015, 2017 & 2019 General Elections and the 2016 EU referendum. Of the three forecasters, who performed best?

In 2015, Matt was the most accurate and predicted the right outcome, a Conservative majority. I was next closest but my predicted outcome was a Conservative minority government.
In 2016, I was closest but I was not explicitly predicting a Leave win. I was basically saying “toss a coin”.
In 2017, the polls were closest but none of us predicted the right outcome, a Conservative minority government.
In 2019, all 3 of us predicted a Conservative majority government but I was closest and even beat Sir John Curtice’s exit poll!

There is a strong case to be made that, when evaluating forecasters, their predicted outcomes matter more than their predicted numbers. This is probably where I diverge from Tetlock to some degree. Tetlock’s evaluation is based on forecasters’ stated probabilities of certain events happening, so someone who said there was a 50% chance of Leave winning (i.e. me) is more accurate than someone who said there was a 25% chance of Leave winning (i.e. Matt Singh) or a 29% chance of Trump winning the presidency (Nate Silver in 2016). But an alien who landed on Earth in 2016 without any knowledge of human affairs could have tossed a coin and still had a 50% chance of predicting that Leave/Trump would win.
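To make the scoring point concrete, here is a minimal Python sketch of a Brier score, the squared-error measure Tetlock uses to grade probability forecasts. It uses the simple one-outcome form (0 is perfect, 1 is worst); the original form, which sums over both outcomes, doubles these numbers but ranks forecasters identically. The probabilities are the ones quoted above, and both events (Leave and Trump winning) are treated as having happened.

```python
def brier_score(prob_event: float, event_happened: bool) -> float:
    """One-outcome Brier score: squared gap between the stated probability and the 0/1 result."""
    outcome = 1.0 if event_happened else 0.0
    return (prob_event - outcome) ** 2


# Stated probability that Leave (or Trump) would win; both events did happen.
forecasts = {
    "the author, Leave at 50%": 0.50,
    "Matt Singh, Leave at 25%": 0.25,
    "Nate Silver, Trump at 29%": 0.29,
    "coin-tossing alien, 50%": 0.50,
}

for name, prob in forecasts.items():
    # Lower is better: 0.0 would be a perfect, fully confident forecast.
    print(f"{name}: {brier_score(prob, True):.4f}")
```

The coin-tossing alien scores exactly the same as my 50% forecast (0.25), which is the weakness of judging forecasters on stated probabilities alone.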

Are forecasters dumb?

My alien analogy brings me to how I prefer to evaluate forecasters, which is to ask whether a forecaster is capable of consistently beating what I call the DUMB forecaster. Let’s use a forecast of tomorrow’s weather as an example. There are two dumb methods available to any weather forecaster:

1. The weather tomorrow will be the same as today.
2. The weather tomorrow will be the average over the last X years for that date.

It is not possible to get simpler than these models. In many parts of the world, model 1 is a viable model. My wife is from Texas and constantly tells me that there is no need for a weather forecaster in a Texas summer, as you can guarantee it will be hot & sunny. Now that she lives in the UK with me, she constantly checks the weather forecasts. In the UK, we would place more emphasis on model 2 as a viable dumb model, since all Brits know that today’s weather tells us nothing about tomorrow’s. A third dumb model would be to take a straight average of these two models, which is actually one of the most basic time series models available, from a class of models known as ARMA (my post on whether UK pollsters have forecasting skill uses ARMA as the dumb or baseline forecast).
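A minimal Python sketch of the three dumb models, assuming daily maximum temperature is the quantity being forecast. The function names and the numbers are made up for illustration. The averaged model is written as a one-step AR(1) forecast, mean + phi × (today − mean), which with phi = 0.5 is exactly the straight average of models 1 and 2.

```python
from statistics import mean


def persistence_forecast(today: float) -> float:
    """Dumb model 1: tomorrow will be the same as today."""
    return today


def climatology_forecast(same_date_history: list[float]) -> float:
    """Dumb model 2: tomorrow will be the average for that date over the last X years."""
    return mean(same_date_history)


def blended_forecast(today: float, same_date_history: list[float], phi: float = 0.5) -> float:
    """Dumb model 3: AR(1)-style forecast, mean + phi * (today - mean).

    With phi = 0.5 this is the straight average of models 1 and 2.
    """
    clim = climatology_forecast(same_date_history)
    return clim + phi * (today - clim)


# Hypothetical numbers: today's maximum was 24C; the same date over 5 past years averaged 18C.
today = 24.0
history = [17.0, 19.0, 18.0, 16.0, 20.0]
print(persistence_forecast(today))       # 24.0
print(climatology_forecast(history))     # 18.0
print(blended_forecast(today, history))  # 21.0
```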

We therefore have two or three dumb models (sometimes called baseline models) against which any weather forecaster can be evaluated. Nate Silver demonstrated in his book (concentrating mostly on the US) that weather forecasters are capable of beating the dumb models for up to a week ahead and that weather forecasting has improved consistently over the last few decades. This is in stark contrast to economists, who have shown no improvement over time and indeed struggle to show they are better than dumb models (see this striking chart of long-range predictions of interest rates!).
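One common way to put a number on “beating the dumb model” is a skill score: 1 minus the ratio of the forecaster’s mean squared error to the baseline’s. Above zero means the forecaster beats the baseline, zero means no better, below zero means worse. A minimal sketch, with a hypothetical week of temperatures and persistence (“same as yesterday”) as the baseline:

```python
def mean_squared_error(forecasts: list[float], actuals: list[float]) -> float:
    """Average squared gap between forecasts and what actually happened."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty lists")
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def skill_score(forecasts: list[float], baseline: list[float], actuals: list[float]) -> float:
    """1 - MSE(forecaster) / MSE(dumb baseline): > 0 beats the baseline, < 0 is worse."""
    return 1.0 - mean_squared_error(forecasts, actuals) / mean_squared_error(baseline, actuals)


# Hypothetical week of maximum temperatures (C).
actuals = [18.0, 21.0, 17.0, 16.0, 22.0, 20.0, 19.0]
forecaster = [19.0, 20.0, 18.0, 16.0, 21.0, 20.0, 18.0]
persistence = [17.0, 18.0, 21.0, 17.0, 16.0, 22.0, 20.0]  # yesterday's actual each day

print(round(skill_score(forecaster, persistence, actuals), 3))  # 0.926
```

Swapping in a climatology or averaged baseline only changes the `baseline` list, so the same function can test a forecaster against each dumb model in turn.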

Same Again or Major Change?

However, I think it is important to distinguish between two types of forecast. The first can be characterised as “same again”. In other words, your forecast of tomorrow’s weather is that it will be broadly the same as today. So my wife can rest easy in a Texas summer, knowing that tomorrow she can dress in a T-shirt and shorts just as she did today. Notice that I am not saying that the forecast is that tomorrow will be exactly the same as today. A “same again” forecast is one that does not lead to a different OUTCOME tomorrow. In this case, the outcome is deciding what to wear, and if the forecast does not lead you to change your planned attire for tomorrow, then you are reacting to a “same again” forecast. The same can be said for economic forecasts. If GDP grows by 2% this year, the forecast for next year is growth of 1.5%, and this does not lead you to change (say) your investment plans, then to all intents and purposes a same-again forecast has been made.
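One way to make this definition operational is to classify a forecast by whether it changes your decision, not by how far the numbers move. The sketch below assumes a hypothetical decision rule (wear shorts unless the maximum is below 20C or the chance of rain is at least 60%); the class, function names and thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    """Observed or forecast weather for one day."""
    max_temp_c: float
    rain_chance: float  # probability of rain, between 0 and 1


def choose_outfit(day: Conditions) -> str:
    """Hypothetical decision rule; the thresholds are illustrative only."""
    if day.max_temp_c < 20.0 or day.rain_chance >= 0.6:
        return "coat and umbrella"
    return "T-shirt and shorts"


def forecast_type(today: Conditions, tomorrow_forecast: Conditions) -> str:
    """'same again' if the forecast leaves the decision unchanged, otherwise 'major change'."""
    if choose_outfit(today) == choose_outfit(tomorrow_forecast):
        return "same again"
    return "major change"


texas_today = Conditions(max_temp_c=36.0, rain_chance=0.05)
print(forecast_type(texas_today, Conditions(34.0, 0.10)))  # same again: a bit cooler, same outfit
print(forecast_type(texas_today, Conditions(27.0, 0.90)))  # major change: torrential rain forecast
```

Tomorrow’s numbers differ from today’s in both cases; only the second forecast changes what gets worn.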

The second type of forecast is one that calls for major change in outcome or behaviour. So a weather forecaster in Texas saying that tomorrow there will be torrential rain and flash floods will clearly cause a change of behaviour from my wife. Similarly, an economist that forecasts that a vote for Leave will cause a recession is clearly stating that Brexit will lead to major economic impact.

In practice, of course, the two types of forecast I describe are really the two ends of a full spectrum of forecasts, but in my opinion the simplification is a valid one. After all, Matt Singh’s forecast for the 2015 general election was for “major change”, from coalition to Conservative majority government, whereas I forecast “same again”. For the EU referendum, Matt’s forecast was for “same again”, since Remain equated to the status quo, whereas I was on the boundary between “same again” and “major change” and unable to decide between the two.

Let’s look again at our two dumb forecasting models for weather. The first, that tomorrow will be the same as today, is clearly a “same again” model; by definition it will be a good predictor of “same again” events and rubbish at “major change” events. The second, that tomorrow will be the X-year average, will be good at “same again” and poor at “major change” if today is close to the X-year average. On the other hand, if today is very different from the X-year average, it will be poor at “same again” events and good at “major change” events. The latter point arises from a well-known phenomenon called “reversion to the mean”, which in my experience is often overlooked by analysts in many industries, who seem to prefer “same again” forecasting models.
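To see why the X-year average does well when today is unusual, assume (purely for illustration) that the deviation from the long-run average follows an AR(1) process: tomorrow’s deviation is phi times today’s deviation d, plus noise with variance sigma². The expected squared error is then (1 − phi)²d² + sigma² for “same as today” and phi²d² + sigma² for “the long-run average”. A minimal sketch:

```python
def expected_squared_errors(deviation: float, phi: float, noise_var: float) -> dict[str, float]:
    """Expected squared error of the two dumb models, given today's deviation from the average.

    Assumes tomorrow's deviation = phi * today's deviation + noise (an AR(1) process).
    """
    persistence = (1.0 - phi) ** 2 * deviation ** 2 + noise_var  # forecast: same as today
    climatology = phi ** 2 * deviation ** 2 + noise_var          # forecast: the long-run average
    return {"persistence": persistence, "climatology": climatology}


# Hypothetical setting: strong reversion to the mean (phi = 0.3), unit noise variance.
for deviation in (0.0, 1.0, 5.0):
    errors = expected_squared_errors(deviation, phi=0.3, noise_var=1.0)
    print(deviation, {name: round(err, 2) for name, err in errors.items()})
```

Under these assumptions the two models tie when today is exactly average, and the long-run average pulls further ahead the more unusual today is. With phi above 0.5 (little reversion to the mean), the ranking flips and “same as today” wins.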

False Positive & False Negative Rates

So when you are evaluating a forecaster, I think there is nothing wrong with summarising their forecasts in a simple table that sets what they forecast (“same again” or “major change”) against what actually happened.

I’ve borrowed two terms from medicine to distinguish between the two types of error a forecaster can make. Suppose you are having a test for cancer and the test comes back negative; that can be construed as a “same again” forecast that you do not have cancer. A positive test result is a “major change” forecast that you do have cancer. An incorrect positive test result is called a “false positive” and an incorrect negative test result a “false negative”. So a forecaster can be evaluated in two ways: their false positive rate, which is the % of their “major change” forecasts that are wrong, and their false negative rate, which is the % of their “same again” forecasts that are wrong.
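A minimal sketch of the two rates as defined here, computed from a tally of a forecaster’s past calls. These definitions condition on the forecast (the share of “major change” calls that were wrong, and the share of “same again” calls that were wrong); in medical statistics these particular ratios are usually called the false discovery rate and the false omission rate. The track record below is hypothetical.

```python
from collections import Counter

# Each past forecast is (what was forecast, what actually happened); hypothetical track record.
track_record = (
    [("major change", "major change")] * 8
    + [("major change", "same again")] * 2
    + [("same again", "same again")] * 27
    + [("same again", "major change")] * 3
)


def error_rates(record: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) as defined in the text.

    False positive rate: share of "major change" forecasts that were wrong.
    False negative rate: share of "same again" forecasts that were wrong.
    Assumes the record contains at least one forecast of each type.
    """
    counts = Counter(record)
    major_calls = counts[("major change", "major change")] + counts[("major change", "same again")]
    same_calls = counts[("same again", "same again")] + counts[("same again", "major change")]
    fp_rate = counts[("major change", "same again")] / major_calls
    fn_rate = counts[("same again", "major change")] / same_calls
    return fp_rate, fn_rate


fp, fn = error_rates(track_record)
print(f"false positive rate {fp:.0%}, false negative rate {fn:.0%}")  # 20% and 10%
```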

Let’s imagine you are comparing two forecasters and trying to decide whether to believe forecaster A or forecaster B. Forecaster A has a 5% false positive rate and a 40% false negative rate, and forecaster B has a 20% false positive rate and a 20% false negative rate. A is less likely than B to “cry wolf”, but B is more likely to spot the wolf than A. Who is better? The story of the boy who cried wolf is a very apt one for forecasters. Those who constantly predict doom and gloom will be ignored as the doom fails to materialise, but they only need to be correct once to be able to say “I told you so!” On the other hand, the poor track record of doom & gloom merchants makes it hard to trust them, and the constant false positives can still have physical and emotional consequences for those who believe and react unnecessarily to the forecasts. The COVID19 pandemic is in many ways the ultimate test of the boy who cried wolf story, since we need to predict not only the direct impact of COVID19 in deaths and infections but also the direct and indirect impacts of the actions and non-actions taken in response.
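Which forecaster is “better” depends on what each kind of error costs you. One crude way to see this is to score each forecaster as a weighted sum of their two rates, using hypothetical cost weights. This is an illustrative heuristic, not a standard metric, and it ignores how often each forecaster makes each type of call, so treat it as a way of framing the question rather than a verdict.

```python
def weighted_error(fp_rate: float, fn_rate: float,
                   false_alarm_cost: float, missed_wolf_cost: float) -> float:
    """Crude illustrative score (lower is better); the cost weights are hypothetical."""
    return false_alarm_cost * fp_rate + missed_wolf_cost * fn_rate


forecaster_a = (0.05, 0.40)  # rarely cries wolf, often misses the wolf
forecaster_b = (0.20, 0.20)  # cries wolf more often, misses the wolf less often

for label, false_alarm_cost, missed_wolf_cost in [
    ("false alarms and misses equally costly", 1.0, 1.0),
    ("false alarms three times as costly", 3.0, 1.0),
]:
    a = weighted_error(*forecaster_a, false_alarm_cost, missed_wolf_cost)
    b = weighted_error(*forecaster_b, false_alarm_cost, missed_wolf_cost)
    better = "A" if a < b else "B"
    print(f"{label}: A={a:.2f}, B={b:.2f} -> prefer {better}")
```

With equal weights B comes out ahead; make false alarms three times as costly and A does.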

Turkeys, Wolves & Coins

Taleb prefers to use turkeys instead of wolves. A flock of turkeys will come to believe that the farmer who feeds and looks after them so well is a wonderful person, until Thanksgiving or Christmas arrives and their illusions are shattered. The point of the turkey story is that the turkeys have no data at all that would allow them to assign a probability to their existence coming to an abrupt end, and they are therefore incapable of forecasting their imminent demise. This is one of the reasons why Taleb believes that forecasts are not important in the first place. What matters is your resilience and/or ability to thrive in spite of forecasting errors. His book “Antifragile” covers this theme in great depth and offers a very thought-provoking philosophy to consider.

In fact, Taleb’s book is much more about risk management than forecasting. Rather than trying to predict events, one instead manages one’s organisation so that it can resist, or even benefit from, risks. Risk management is a dirty word in Taleb’s world given his own experiences, but I am using it as the counterpoint to forecasting. For me, forecasting and risk management are two sides of the same coin, and depending on what clients are trying to achieve, the first part of any consultation I do is to decide which side of the coin is more important to the client. From that, I can decide what the best modelling approach is.

By now, I hope I have convinced you that any organisation making a forecast should be evaluated on its forecasting track record rather than its status or credentials. In an ideal world, organisations making forecasts would publish their track record so that their false positive and false negative rates could be evaluated. In particular, the track record should include evidence that each forecast was made ahead of time, so that we know we have a true forecast rather than a hindcast. In my opinion, it was thoughts along these lines that prompted Michael Gove’s now famous comment, and they are the reasons why I endorse it.

Forecasts come in 3 flavours

I have been assuming that you understand what a forecast is, but I want to finish off by defining this in more detail and illustrating ways forecasters can get it wrong. I have already talked about the distinction between numerical forecasts (where we express our predictions as a single number or numerical range) and outcome forecasts (where we express our predictions as a probability of a specific outcome or outcomes occurring). I have also introduced the concept of a dumb forecast as a benchmark for evaluating forecasts. Together, these highlight a useful distinction in forecasting, namely between technical & fundamental forecasting.

A TECHNICAL forecast is one that is based on observed trends and patterns in the quantity I am trying to forecast. For example, I could look at the chart of UK GDP growth and try to discern a pattern that can be extrapolated into the future. So I might observe that the moving average for GDP growth is currently +0.5% and the moving standard deviation is 0.2%, and use those figures to predict that GDP growth in 2017 Q1 will be +0.5% (numerical forecast) or that there is a 2 in 3 chance of it being within +/-0.2% of this figure (numerical range forecast). Alternatively, I might predict that there is only a 2% chance of negative or no growth, which would be an outcome forecast.
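The arithmetic in this example can be sketched with Python’s standard `statistics` module. The three-quarter window of growth figures is hypothetical, chosen so that its mean is 0.5% and its standard deviation 0.2%, and the normal distribution is an assumption. Under that assumption the chance of zero or negative growth comes out nearer 0.6% than 2%, a reminder that outcome probabilities depend heavily on the distribution you assume.

```python
from statistics import NormalDist, mean, stdev
from typing import NamedTuple


class TechnicalForecast(NamedTuple):
    point: float        # numerical forecast
    low: float          # numerical range forecast: lower end
    high: float         # numerical range forecast: upper end
    range_prob: float   # chance of landing inside the range
    p_no_growth: float  # outcome forecast: chance of zero or negative growth


def technical_forecast(recent_growth: list[float]) -> TechnicalForecast:
    """Extrapolate a window of recent quarterly GDP growth rates (%).

    Assumption: next quarter's growth is normally distributed around the moving average.
    """
    mu = mean(recent_growth)      # moving average
    sigma = stdev(recent_growth)  # moving (sample) standard deviation
    dist = NormalDist(mu, sigma)
    return TechnicalForecast(
        point=mu,
        low=mu - sigma,
        high=mu + sigma,
        range_prob=dist.cdf(mu + sigma) - dist.cdf(mu - sigma),
        p_no_growth=dist.cdf(0.0),
    )


window = [0.3, 0.5, 0.7]  # hypothetical recent quarterly growth rates, %
f = technical_forecast(window)
print(f"point forecast: {f.point:+.1f}%")
print(f"range forecast: {f.low:+.1f}% to {f.high:+.1f}% (probability {f.range_prob:.0%})")
print(f"chance of zero or negative growth: {f.p_no_growth:.1%}")
```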

More complicated pattern extraction methods could be used instead and might result in a different technical forecast. But in the end, whether complex or simple, technical forecasts are still dumb forecasts, as they assume that the trend or pattern identified will be repeated in the future. If you are reading a forecast in the news or a report, the clue that it is a technical forecast is that the only data presented is the quantity being forecast, along with a description of its pattern or trend.

A FUNDAMENTAL forecast is one where a model is used to correlate the quantity we are trying to forecast (e.g. GDP) with some other variable (e.g. the number of job adverts in the UK). The forecast is then based on known values of that other variable: with that knowledge, we apply our model to derive an estimate of what the quantity will be in the future. So a 10% increase in job adverts might equate to 1% growth in GDP, and if we see such growth in the number of adverts, we can make a forecast of GDP. Again, if you are reading a report or news article, the clue that you are reading about a fundamental forecast is that you see references to two or more variables and a description of the correlation between them.

Our 3rd flavour is a forecast based on SCENARIOS. In this type of forecast, a range of scenarios is put forward for an input variable, e.g. CO2 emissions over the next 20 years. The model is applied to every scenario, generating a range of future climates, and a forecast (point forecast, range or outcome probability) is then made from the data generated. Such forecasts should include a probability for each scenario coming to pass, e.g. scenario A might be deemed likely, scenario B unlikely, etc., so that one ends up with a weighted average of scenarios.
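A minimal sketch of a scenario-based forecast: run the same model under each scenario, then weight the results by each scenario’s probability. For brevity it uses a hypothetical GDP-from-job-adverts relationship rather than a climate model; the scenarios, their probabilities and the model coefficient are all invented for illustration.

```python
import math


def model(advert_growth: float) -> float:
    """Hypothetical stand-in model: GDP growth (%) from growth in job adverts (%)."""
    return 0.1 * advert_growth


# Hypothetical scenarios: (next year's job-advert growth %, probability of the scenario).
scenarios = {
    "boom": (15.0, 0.2),
    "steady": (5.0, 0.5),
    "slump": (-10.0, 0.3),
}

if not math.isclose(sum(prob for _, prob in scenarios.values()), 1.0):
    raise ValueError("scenario probabilities should sum to 1")

# Apply the model to every scenario...
outcomes = {name: model(value) for name, (value, _) in scenarios.items()}
# ...then combine the results using the scenario probabilities.
weighted = sum(outcomes[name] * prob for name, (_, prob) in scenarios.items())
p_negative = sum(prob for name, (_, prob) in scenarios.items() if outcomes[name] < 0)

for name, gdp in outcomes.items():
    print(f"{name}: GDP growth {gdp:+.1f}%")
print(f"probability-weighted point forecast: {weighted:+.2f}%")
print(f"outcome forecast, chance of negative growth: {p_negative:.0%}")
```

Reporting only one of these rows, say the slump, would be a single scenario rather than a forecast.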

Detecting bad forecasters

So if you are reading about a recent forecast and you don’t have any information as to the track record of the person making the forecast, how can you sort the wheat from the chaff? The full list of clues is endless and I intend to use my blog to highlight these in future but here are some that I have seen over the years.

1. Cherry picking your history – In the GDP chart, I think you can see that GDP growth was much more volatile until around 1990. Since then, growth has been more consistent. Too often, forecasters will cut off their history at some date and only use the data since then. Taleb is especially critical of this thought process and believes the financial crisis of 2008 was partly brought on by this kind of thinking. That doesn’t mean that all history has to be weighted equally; you can give less weight to some periods of time than others, but it is wrong to give zero weight to data that is available.
2. Failure of imagination with scenarios – A similar error can occur with scenario-based forecasts. If the forecast is based on a limited range of scenarios, or rules out certain scenarios altogether, it is guilty of cherry picking as in point 1. Again, it is reasonable to give more weight to some scenarios than others provided the reasoning is explained, but no scenario should be ruled out as impossible. Lack of imagination is a common problem with scenario forecasts, as I found to my cost with my forecast of the Stoke Central by-election!
3. Fundamental forecasts based on a scenario – A problem with fundamental models is that in order to forecast the future, you need to forecast the input variable as well. So in my GDP example earlier, to forecast GDP in a year’s time, I have to predict how many jobs will be advertised next year. This adds uncertainty to the forecast, so what some people do instead is imagine one specific level of job adverts in a year’s time and then use the model to produce the GDP forecast. Except that this is not a forecast! It is a single scenario only. This is a surprisingly common error and sometimes hard to catch. Sir Patrick Vallance, the UK Government Chief Scientific Adviser, found this out in September 2020 when he presented a scenario that the government wanted to avoid (the outcome was that the scenario was avoided within the stated timescale but came to pass six weeks later).
4. Post-hoc explanation of history – When making technical forecasts, all one should be doing is extrapolating the pattern or trend. Being human, we often want to explain why we see the history that we do. As noted in point 1, UK GDP growth was steadier after 1990, so it is natural to ask why. Three things that come to mind are the collapse of communism, the start of the EU single market and Britain being kicked out of the ERM. Unless you can build proper statistical models of these effects, such apparent explanations may be no more than coincidences of timing and have to be treated as hypotheses. Unfortunately, articles do get written which use such coincidences to explain why something will or will not happen in the future. A recent example was explored in this BBC article on whether baby boxes will help reduce infant mortality.
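The first clue can be illustrated with a weighted standard deviation. Cutting off history is the same as giving older data zero weight; the alternative is to give it a smaller but non-zero weight. The sketch below uses a hypothetical growth series with a volatile early period and a calm recent one, and an exponential decay factor of 0.8, which is an arbitrary choice.

```python
from math import sqrt


def weighted_std(values: list[float], weights: list[float]) -> float:
    """Weighted (population-style) standard deviation; a weight of 0 ignores a value entirely."""
    total = sum(weights)
    avg = sum(w * v for w, v in zip(weights, values)) / total
    return sqrt(sum(w * (v - avg) ** 2 for w, v in zip(weights, values)) / total)


# Hypothetical growth series (%): volatile early years, calm recent years.
growth = [2.0, -1.5, 1.8, -0.9, 0.6, 0.4, 0.5, 0.6, 0.4, 0.5]
n = len(growth)

equal = [1.0] * n                                 # all history, equal weight
truncated = [0.0] * 4 + [1.0] * (n - 4)           # cherry-picked: older data given zero weight
decayed = [0.8 ** (n - 1 - i) for i in range(n)]  # older data given less, but non-zero, weight

for label, weights in [("equal", equal), ("truncated", truncated), ("decayed", decayed)]:
    print(f"{label:>9}: volatility {weighted_std(growth, weights):.2f}")
```

The truncated series makes growth look far more predictable than the full record suggests, while down-weighting still lets the volatile years count.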

So there you have it, my guide on how to identify a good forecaster. Please do look out for future posts where I illustrate the good, bad and ugly of forecasting.


— Want to read more articles about Forecasting? —

Please click on the Forecasting tab at the top of the screen to see a list of my forecasting posts in reverse chronological order. Alternatively, click on this link to see a list of my most relevant posts sorted by theme. The latter link forms part of my training course “Identifying Trends & Making Forecasts”.