Correlation analysis of time series goes side by side with seasonal analysis. The goal of correlation analysis is to provide insights on the relationship between the series and its lags. This information can then be used to define and tune the parameters of forecasting models such as ARIMA, TSLM, etc.
Correlation analysis is mainly based on data visualization and statistical tools.
library(TSstudio)
library(plotly)
library(feasts)
library(tsibble)
In the following examples, we will use the AirPassengers
series again:
data("AirPassengers")
ts_plot(AirPassengers,
slider = TRUE)
The Auto Correlation Function, or ACF, is the main tool for quantifying the level of correlation between a series and its lags. This method is fairly similar (both mathematically and logically) to the Pearson correlation coefficient but has time awareness:
\[r_{k} = \frac{\sum_{t = k+1}^{n-k}(x_{t-k} - \overline{x})(x_t-\overline{x})}{\sum_{t = 1}^{n}(x_t - \overline{x})^2}, ~ where\]
In R, the acf
function from the stats package is the main tool for calculating the series AC (Auto Correlation). It supports only ts
objects. We later see the ACF
function from the feasts package, a wrapper of the acf
function that supports tsibble
objects. Let’s use the acf
function to calculate the AC of the AirPassengers
series:
acf(AirPassengers)
The Partial Auto Correlation Function (PACF) is a conditional correlation between a series and lag k, given the impact of the lags 1 to k-1 on the series. Likewise the pcf
function, the pacf
function is the corresponding function for calculating the PAC (Partial Auto Correlation) of a series with its lags:
pacf(AirPassengers)
The key applications of the ACF and PACF functions are:
The ts_cor
function from provides an interactibe wrapper for the acf
and pacf
functions, plotting both correlations:
ts_cor(AirPassengers, lag.max = 72)
By default, the function marked the seasonal lags in red. You can use the seasonal_lags
argument to mark additional seasonal lags (in case exists):
ts_cor(AirPassengers, lag.max = 72, seasonal_lags = 3)
The lag plot is a common method for visualizing the correlation between a series and its lags. The ts_lags
function from the TSstudio package provides this functionality:
ts_lags(AirPassengers)
As the relationship between a series and its past lag looks more linear, the higher the correlation between the series and the lag. In the case of the AirPassengers
series, you can see that the series has a strong linear relationship with the first and seasonal (lag 12) lags, as observed before with the acf
function.
The lags
arguments enable us to plot the series against specific lags. For example, we could plot the relationship of the series with its past three seasonal lags (e.g., 12, 24, and 36):
ts_lags(AirPassengers, lags = c(12,24, 36))
So far, the tools we saw above supports ts
objects. As we saw before, the ** feasts ** provides wrappers for the stats main functions for time series analysis for tsibble
objects. First, let’s convert the AirPassengers
to a tsibble
object:
ap_tsibble <- AirPassengers %>% as_tsibble()
The ACF
and PACF
functions, as their names impay, provides a warappers for the acf
and pacf
functions. By default, unlike the origin functions it won’t plot the output and we will have to add the autoplot
function to plot the results:
ap_tsibble %>% ACF(value, lag_max = 48) %>% autoplot()
ap_tsibble %>% PACF(value, lag_max = 48) %>% autoplot()
The gg_lag
provides lag plots. The nice feature of this function that it colored the observations by their frequency units. By default, it used geom
line, which I found a bit confusing, so we will set the geom
to point
:
p <- ap_tsibble %>% gg_lag(value, geom="point")
p
We can make this plot interactive by using the ggplotly
function from the plotly package:
ggplotly(p)