Cointegration vs Spurious Correlation: Understand the Difference for Accurate Analysis | by Egor Howell | Jul, 2023

Why correlation does not equal causation for time series

Egor Howell
Towards Data Science
Photo by Wance Paleri on Unsplash

In time series analysis, it is valuable to understand whether one series influences another. For example, it is useful for commodity traders to know whether an increase in commodity A leads to an increase in commodity B. Originally, this was measured using linear regression. However, in 1974 Clive Granger and Paul Newbold showed that this yields misleading results, particularly for non-stationary time series. Granger went on to develop the concept of cointegration, work that earned him a share of the 2003 Nobel prize in economics. In this post, I want to discuss the need for and application of cointegration, and why it is an important concept Data Scientists should understand.
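For reference, the most common case of cointegration (two series, each integrated of order one) can be written compactly. Here I(d) denotes a series that must be differenced d times to become stationary. The symbols x_t, y_t, β and u_t are notation chosen for this sketch rather than the author's own.

```latex
x_t \sim I(1), \quad y_t \sim I(1), \qquad
\exists\, \beta \neq 0 \;:\; u_t = y_t - \beta x_t \sim I(0)
```

In words, each series wanders on its own, but some fixed linear combination of them is stationary. As a result, the two series cannot drift arbitrarily far apart.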

Overview

Before we discuss cointegration, let’s discuss why it is needed. Historically, statisticians and economists used linear regression to determine the relationship between different time series. However, Granger and Newbold showed that, for non-stationary series, this approach can be misleading. It leads to something called spurious correlation (also known as spurious regression).
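Granger and Newbold's point can be reproduced with a small simulation. The sketch below assumes only NumPy. It regresses one series on a second series that was generated completely independently, then counts how often the slope looks "significant" at the nominal 5% level. The threshold |t| > 1.96 is a normal approximation. The helper names `ols_slope_tstat` and `rejection_rate` are hypothetical, and the sample size and number of trials are arbitrary illustrative choices.

```python
import numpy as np


def ols_slope_tstat(y, x):
    """Regress y on a constant and x by OLS; return the slope's t-statistic."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)        # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])


def rejection_rate(make_series, n_obs=100, n_trials=1000, seed=0):
    """Fraction of trials where |t| > 1.96, i.e. a 'significant' slope at ~5%."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        # y and x are generated independently, so the true slope is zero.
        y = make_series(rng, n_obs)
        x = make_series(rng, n_obs)
        if abs(ols_slope_tstat(y, x)) > 1.96:
            hits += 1
    return hits / n_trials


def white_noise(rng, n):
    """Stationary series: i.i.d. standard normal draws."""
    return rng.standard_normal(n)


def random_walk(rng, n):
    """Non-stationary series: cumulative sum of i.i.d. shocks."""
    return np.cumsum(rng.standard_normal(n))


print("white noise:", rejection_rate(white_noise))
print("random walk:", rejection_rate(random_walk))
```

For independent white noise, the rejection rate should land near 5%, as the test intends. For independent random walks it is typically far higher, so the regression "finds" a relationship that does not exist. That is the spurious regression problem.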

A spurious correlation occurs when two time series look correlated but actually lack a causal relationship. It is the classic ‘correlation does not mean causation’ statement. It is dangerous because even statistical tests may well suggest that there is a causal relationship.
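One way to see how such a correlation can arise is to compare the correlation of two series' levels with the correlation of their period-to-period changes. In this minimal sketch, which assumes NumPy, the two hypothetical series are driven by independent shocks, so neither can cause the other. They only share an upward drift. The drift value and sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
drift = 0.3  # illustrative upward drift shared by both series

# Two hypothetical random walks with drift, built from independent shocks.
a = np.cumsum(drift + rng.standard_normal(n))
b = np.cumsum(drift + rng.standard_normal(n))

# Correlating the levels picks up the common upward drift.
level_corr = np.corrcoef(a, b)[0, 1]
# Correlating the changes removes the trend and leaves only independent shocks.
change_corr = np.corrcoef(np.diff(a), np.diff(b))[0, 1]

print(f"correlation of levels:  {level_corr:.2f}")
print(f"correlation of changes: {change_corr:.2f}")
```

The levels typically look strongly correlated, while the changes are close to uncorrelated. A test run on the levels alone would therefore point to a relationship that the underlying shocks do not support.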

Example

An example of a spurious relationship is shown in the plots below:

Plot generated by author.

Here we have two time series, A(t) and B(t), plotted as a function of time (left) and against each other (right). The plot on the right shows some correlation between the series, as indicated by the regression line. However, the left plot reveals that this correlation is spurious: B(t) increases consistently while A(t) fluctuates erratically. Furthermore, the average distance between the two time series is also increasing…
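The data behind the plots is not available, so the following is a hypothetical reconstruction of the same pattern, assuming NumPy. B(t) is modelled as a steady linear trend plus noise, and A(t) as an erratic random walk with a weak drift. The trend, drift, noise scales and seed are all illustrative assumptions. The sketch fits the regression line from the right-hand panel. It then compares the average gap |B(t) - A(t)| in the first and second halves of the sample, mirroring the widening distance visible in the left-hand panel.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(200)

# Hypothetical stand-ins for the plotted series.
b = 1.0 * t + rng.normal(scale=2.0, size=t.size)          # B(t): steadily increasing
a = np.cumsum(0.25 + rng.normal(scale=1.0, size=t.size))  # A(t): erratic random walk

# Right-hand panel: regression line A = alpha + beta * B.
# np.polyfit returns coefficients highest power first: [slope, intercept].
beta, alpha = np.polyfit(b, a, deg=1)
r = np.corrcoef(a, b)[0, 1]

# Left-hand panel: average distance between the series in each half.
gap = np.abs(b - a)
half = t.size // 2
first_half, second_half = gap[:half].mean(), gap[half:].mean()

print(f"slope={beta:.2f}, r={r:.2f}")
print(f"mean |B - A|: first half {first_half:.1f}, second half {second_half:.1f}")
```

A positive slope and correlation appear even though A(t) is not tied to B(t). The growing gap between the halves shows the two series drifting apart, the opposite of what cointegrated series would do.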
