Can you differentiate the signal and the noise?

“Prediction is very difficult, especially about the future”: so goes a quote often attributed to Danish physicist Niels Bohr.

Indeed, predictions are very difficult, as can be seen by looking at the dismal track records of experts’ predictions in diverse fields, such as meteorology, sports betting and politics. Even worse, experts tend to be fairly confident about the quality of their predictions despite historical data showing the opposite.

In his book The Signal and the Noise, Nate Silver outlines the difficulties in predicting economic development and in locating the few pieces of key information – i.e., the signal – in the vast mass of data available – i.e., the noise.

Will you walk to work today or take the bus? Will you take an umbrella or not?

In our everyday lives, we make decisions based on predictions of what will happen in the future, like whether it will rain or shine.

But predictions are also common in the public realm: stock market analysts, meteorologists and sports commentators all make a living out of them.

One area where one might expect particularly good predictions is the economy. After all, it’s of crucial importance for individuals, companies and even nations, and there’s a wealth of data available: some companies track as many as four million economic indicators.

But despite these factors, economists have an atrocious track record in forecasting.

Consider the commonly predicted economic indicator, the gross domestic product (GDP).

The first problem with GDP predictions is that economists often make predictions like “Next year, GDP will increase by 2.7 percent.” In fact, they’ve derived this figure from a broad prediction interval that says something like “It is 90 percent likely that GDP growth will fall somewhere between 1.3 and 4.2 percent.” So an exact number as a prediction is misleading, as it gives a false sense of precision and security.

What’s worse, economists aren’t actually very good at coming up with prediction intervals either. If their 90 percent prediction intervals were roughly accurate, one would expect the actual GDP to fall outside the prediction interval only one time in ten. However, a poll of professional forecasters shows that, since 1968, they’ve been wrong roughly half the time. It seems, therefore, that economists are not only poor predictors but also gravely overestimate the certainty of their predictions.
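One way to check a claim like this is to count how often actual outcomes land inside the stated intervals. Here is a minimal sketch in Python; the function name `interval_coverage` and the sample figures are made up for illustration and are not the survey data mentioned above.

```python
def interval_coverage(intervals, actuals):
    """Return the fraction of actual values that fall inside their forecast interval."""
    if not actuals or len(intervals) != len(actuals):
        raise ValueError("need at least one actual value and one interval per value")
    hits = sum(low <= actual <= high
               for (low, high), actual in zip(intervals, actuals))
    return hits / len(actuals)


# Hypothetical 90 percent intervals for GDP growth (in percent), one per year,
# and the growth that hypothetically followed.
forecast_intervals = [(1.3, 4.2), (0.5, 3.0), (2.0, 4.5), (1.0, 3.5)]
actual_growth = [2.7, -0.3, 4.9, 1.8]

coverage = interval_coverage(forecast_intervals, actual_growth)
print(f"Coverage: {coverage:.0%} (well-calibrated 90% intervals should be near 90%)")
```

If forecasters were well calibrated, the coverage over many years would sit close to 90 percent; a figure near 50 percent is the kind of overconfidence described above.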

Besides GDP predictions, economists are also spectacularly bad at forecasting recessions. Consider that, in the 1990s, economists managed to predict, one year ahead of time, only two of the sixty recessions that occurred worldwide.

To put it kindly, economic predictions should be taken with at least a grain of salt.

Why is predicting the economy so hard?

Quite simply, because a staggering number of interwoven factors can influence it: a tsunami striking Taiwan can affect whether someone gets a job in Oklahoma.

What’s more, figuring out the causal relationship between different economic factors can be a headache. For example, unemployment rates are generally considered to be affected by the underlying health of the economy because businesses tend to hire more in healthy economic climates. But unemployment rates also influence how much money consumers have to spend, which impacts consumer demand and thus the overall health of the economy.

The above example illustrates another complicating factor: feedback loops. When businesses have increased sales, it spurs them to hire more workers. This then gives those workers more disposable income, which increases consumption and further boosts sales.

There are also external factors that can distort the meaning of many economic indicators. For example, rising house prices are normally a positive indicator, but not if they’re being artificially inflated by government policies.

Ironically, economic predictions themselves can affect the economy, as people and businesses adjust their behavior in response to them.

And it’s not only the state of the economy that is affected by a myriad of factors: the very basics of forecasting are also constantly in flux.

First of all, the global economy is constantly evolving, so even tried-and-true theories can rapidly become obsolete. But since it’s impossible to predict exactly when this will happen, existing rules of thumb are relied upon until they are found to be broken.

Second, the data sources that economists work with to understand the past and present are very unreliable and subject to constant revision. For example, US government data on the last quarter of 2008 initially indicated a mere 3.8 percent decline in GDP, but the data was later revised to show a decline of nearly 9 percent.

No wonder accurate predictions are hard to come by.

The economy is such a complex web of interrelated factors that causality is hard to define. This has led many economists to try a purely statistical approach: instead of trying to understand which causes have which effects, they just look at huge swathes of data hoping to spot patterns.

Unfortunately, this approach is fraught with potential for errors because it is inevitable that some patterns will emerge due to coincidence alone.

For example, consider that from 1967 to 1997, the winner of the Super Bowl seemed strongly correlated with economic development: in twenty-eight of those thirty years, a winner from the National Football League meant stock market gains for the rest of the year, whereas a winner originally from the American Football League predicted stock market losses. Statistics indicate that the likelihood of this relationship being a coincidence is one in 4,700,000. Clearly, economists should start watching more football, right?

Wrong. In fact, this correlation is due to chance alone, and since 1998, the trend has actually been reversed.

With over four million economic indicators being tracked, it is clear that some coincidental correlations like this one will arise. And relying on them to make predictions will eventually backfire, for the coincidence will come to an end some day.
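To get a feel for why flukes like this are almost guaranteed, here is a rough sketch in Python. It rests on a deliberately crude assumption that is mine, not the book’s: each indicator is treated as an independent coin flip that “matches” the market in any given year with probability one half. This is not how the one-in-4,700,000 figure above was derived, so the numbers differ; the function name `chance_of_streak` is made up for illustration.

```python
from math import comb


def chance_of_streak(years=30, min_matches=28):
    """Probability that a coin-flip indicator matches the market in at least
    `min_matches` of `years` years purely by luck."""
    favourable = sum(comb(years, k) for k in range(min_matches, years + 1))
    return favourable / 2 ** years


p = chance_of_streak()            # one indicator, 28 or more matches out of 30
n_indicators = 4_000_000          # "over four million" indicators, per the text
expected_flukes = n_indicators * p

print(f"Chance for a single indicator: about 1 in {1 / p:,.0f}")
print(f"Expected lucky indicators among {n_indicators:,}: {expected_flukes:.1f}")
```

Under this simplified model, any single indicator is a long shot, yet across four million of them you would still expect one or two to line up with the market by luck alone.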

It is therefore crucial that, even if we use technology to wade through huge masses of data, there is still a human there to do the analysis and consider whether there is plausible causality.

But many people don’t realize this. Instead, they try to gather more and more information and economic variables to draw predictions from, believing that this will make their predictions more accurate. In fact, all it does is increase the amount of useless information – the noise – which in turn makes it harder to spot the useful information – the signal – hidden within.

Let’s turn our gaze now to four forecasting failures in the run-up to the financial crisis of 2008, starting with those related to the housing bubble.

The first was the overly optimistic belief of homeowners, lenders, brokers and rating agencies that the meteoric rise of US house prices would continue indefinitely. They held this belief despite the fact that, historically, a meteoric rise in housing prices combined with record-low savings had always led to a crash.

So how could everyone have missed that?

One contributing factor was probably that everybody was making too much money in the booming market to begin questioning whether a recession might be around the corner.

The second failure was committed by rating agencies regarding the riskiness of financial instruments called collateralized debt obligations (CDOs), which consisted of a bundle of mortgage debts. The idea was that, as homeowners made payments on their mortgages, investors who held CDOs would earn profits.

Since CDOs were a completely new kind of financial instrument, agencies had to rely solely on statistical models based on the risk of individual mortgages defaulting. Unfortunately, this neglected the possibility of a large-scale housing crash which could bring down prices across the board.

The result was, of course, disastrous. The rating agency Standard & Poor’s had claimed, for example, that the CDOs it gave an AAA-rating only had a 0.12 percent chance of defaulting but, in fact, some 28 percent of them wound up defaulting.
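To see why ignoring a market-wide crash matters so much, consider a toy calculation in Python. The numbers are hypothetical and chosen only for illustration (a pool of five mortgages, each with a 5 percent chance of defaulting); they are not the agencies’ actual model. The sketch compares two extremes: defaults that are completely independent, and defaults that all move together, as they would in a housing crash.

```python
def all_default_independent(p_default, n_loans):
    """Chance that every loan defaults if each default is an independent event."""
    return p_default ** n_loans


def all_default_fully_correlated(p_default, n_loans):
    """Chance that every loan defaults if they all stand or fall together."""
    return p_default  # one shared event decides the fate of the whole pool


p_default, n_loans = 0.05, 5      # hypothetical pool, for illustration only
p_indep = all_default_independent(p_default, n_loans)
p_corr = all_default_fully_correlated(p_default, n_loans)

print(f"Independent defaults:      about 1 in {1 / p_indep:,.0f}")
print(f"Fully correlated defaults: about 1 in {1 / p_corr:,.0f}")
```

The gap between the two answers is enormous, which is why a model built only on individual mortgage risk can look extremely safe on paper and still fail badly when prices fall across the board.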

The third predictive failure happened at American financial institutions, which were so eager to chase after profits in the booming market that they leveraged themselves excessively with debt to make more investments.

Take the investment bank Lehman Brothers, which had leveraged itself so that it only had $1 of its own capital for every $33 worth of financial positions it held. In other words, if the value of its portfolio had declined by even 4 percent, it would have faced bankruptcy. Other major US banks also had similarly high levels of leverage. It was as if the industry collectively believed that a recession was impossible.
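The arithmetic behind that claim is simple enough to sketch in Python. The $1 and $33 figures come from the paragraph above; the function name `equity_after_decline` is made up for illustration, and the model ignores everything except the bank’s own capital and the value of its positions.

```python
def equity_after_decline(own_capital, positions, decline):
    """Own capital left after the positions lose `decline` (a fraction) of their value."""
    return own_capital - positions * decline


own_capital, positions = 1.0, 33.0               # $1 of capital per $33 of positions
break_even_decline = own_capital / positions     # loss that wipes out all capital

print(f"Capital is wiped out by a decline of {break_even_decline:.2%}")
print(f"Capital left after a 4% decline: {equity_after_decline(own_capital, positions, 0.04):+.2f}")
```

In this simplified view, a drop of roughly 3 percent is already enough to erase the bank’s own capital, so a 4 percent decline leaves it in the red.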

Of course, this leverage was helping them make huge profits at the time, so no one was too interested in seriously considering the likelihood of a recession.

The fourth failure in prediction was committed by the US government after the recession had struck. As the government’s economic team was crafting the stimulus package in 2009, they thought they were dealing with a regular recession where employment figures would bounce back in one to two years. But history shows that recessions caused by a financial crash usually make unemployment rates stay high for four to six years – and since this recession was caused by such a crash, they should have known better. This made their stimulus package woefully inadequate.

As we’ve seen, forecasting is fraught with difficulties.

One key way of overcoming them when estimating probabilities is to adopt the so-called Bayesian approach, which rests on a theorem derived from the work of eighteenth-century English minister Thomas Bayes. This approach provides a mathematical framework for updating one’s beliefs in a rational way as new information comes in.

As an example of how beliefs should be updated, let’s consider this scenario: you’re a woman in your forties worried about breast cancer, so you want to predict how likely you are to have it.

To start with, you see that studies indicate that around 1.4 percent of women develop breast cancer in their forties. This is known as the prior probability: the probability you assume before you get any new information.

Then you decide to get a mammogram, as that procedure can detect breast cancer. To your horror, the result is positive.

What does that mean?

Probably less than you think.

Mammograms are by no means foolproof. On the one hand, if a woman does have breast cancer, a mammogram will only discover it about 75 percent of the time. On the other, even if a woman doesn’t have breast cancer, a mammogram will still indicate that she does about 10 percent of the time.

So knowing this and Bayes’ theorem, how likely is it that you have breast cancer after the positive mammogram?

You may be astonished to hear the likelihood is only about 10 percent. What’s more, clinical data confirms this.
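For anyone who wants to check the figure, here is a minimal Python sketch of Bayes’ theorem using the three numbers given above: a 1.4 percent prior, a 75 percent detection rate and a 10 percent false-positive rate. The function and parameter names are my own.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem: probability of having the condition given a positive test."""
    true_positives = prior * true_positive_rate             # has it, test says yes
    false_positives = (1 - prior) * false_positive_rate     # doesn't have it, test says yes
    return true_positives / (true_positives + false_positives)


posterior = posterior_probability(
    prior=0.014,               # share of women developing breast cancer in their forties
    true_positive_rate=0.75,   # mammogram detects an existing cancer
    false_positive_rate=0.10,  # mammogram wrongly signals cancer
)
print(f"Chance of cancer after a positive mammogram: {posterior:.1%}")
```

The result comes out a little under 10 percent, because positive results from the many healthy women far outnumber those from the few who actually have cancer.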

This surprise highlights that we don’t have a very good intuitive understanding of how new information, like the mammogram’s result, interacts with old information.

Specifically, we tend to focus too much on what’s new, overemphasizing the mammogram’s result and forgetting that, in fact, the incidence of breast cancer is so low that the false positives far outnumber the true positives.

Using Bayes’ theorem helps us avoid our inherent biases, like our preference for recent information. So if you catch yourself falling for this bias, stop and take a breath. Then reconsider.

Check out my related post: How to be far sighted?

