Correlation analysis, along with seasonal analysis, is one of the main elements of the descriptive analysis of time series data. The patterns and insights revealed in the descriptive analysis play a pivotal role in building a forecasting model. A common example is setting the orders of the AR and MA processes of an ARIMA model, which can be done with the autocorrelation and partial autocorrelation functions. Typically, time series data will have some degree of correlation with its past lags. This can be seen in hourly temperatures, as the current temperature is most likely close to the temperature during the previous hour. Correlation can also reveal seasonal patterns, as the series will most likely have a high correlation with its seasonal lags. The TSstudio package provides several tools for correlation analysis, such as ts_cor, ts_lags, and ccf_plot.
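As a rough illustration of correlation with close and seasonal lags, the sketch below uses only base R. The `nottem` dataset (monthly temperatures at Nottingham, shipped with R) is an assumption standing in for the hourly temperatures mentioned above; it is not part of the original example.

```r
# Illustration only: nottem (monthly temperatures, frequency = 12) stands in
# for the hourly temperature series mentioned in the text
r <- acf(nottem, lag.max = 24, plot = FALSE)

# r$acf[1] is lag 0, so lag k sits at position k + 1
lag_1  <- r$acf[2]    # previous month
lag_6  <- r$acf[7]    # half a cycle away (opposite season)
lag_12 <- r$acf[13]   # first seasonal lag (same month, previous year)

round(c(lag_1 = lag_1, lag_6 = lag_6, lag_12 = lag_12), 2)
```

The close lag and the seasonal lag should both come out strongly positive, while the half-cycle lag should be negative, which is the kind of seasonal signature a correlation analysis is meant to expose.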

The ACF and PACF functions

Traditionally, the acf (autocorrelation) and pacf (partial autocorrelation) functions from the stats package are used to calculate and plot the correlation between a series and its lags. The ts_cor function provides an interactive and more flexible wrapper for those functions. Let’s load the UKgrid series from the UKgrid package. This series represents the demand for electricity in the UK. We will use the extract_grid function to aggregate the series to daily frequency:
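The loading step might look like the sketch below. The argument values `type = "ts"` and `aggregate = "daily"` are assumptions based on the description above (a daily series with frequency 365); check the UKgrid documentation for the exact options.

```r
library(UKgrid)

# Assumed arguments: type = "ts" requests a ts object,
# aggregate = "daily" aggregates the series to daily frequency
UKgrid_daily <- extract_grid(type = "ts", aggregate = "daily")
```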

Before plotting the ACF and PACF of the series, let’s review the main characteristics of the series:
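A minimal sketch of this review step, assuming the ts_info and ts_plot functions from TSstudio are used to summarize and plot the series. The plot title is illustrative, and the series is re-created here only so that the snippet runs on its own.

```r
library(TSstudio)
library(UKgrid)

# Same daily series as in the loading step (argument values are assumptions)
UKgrid_daily <- extract_grid(type = "ts", aggregate = "daily")

# Print the class, frequency, and start/end points of the series
ts_info(UKgrid_daily)

# Interactive plot of the series; the title is illustrative
p <- ts_plot(UKgrid_daily,
             title = "Daily Demand for Electricity in the UK")
p
```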

It is easy to observe, just by looking at the plot above, that the series has a strong seasonal component across the day of the year. If you zoom in, you can also observe that the series has a strong seasonal component across the day of the week (e.g., high consumption during the normal working days of the week and relatively low consumption during the weekend). Those relationships can be quantified by measuring the level of correlation between the series and its different seasonal lags. One of the downsides of the acf and pacf functions is that it is hard to isolate a specific seasonal lag and compare that relationship over time. For example, the plots below represent the ACF and PACF of the demand for electricity with its past 730 lags (or two seasonal cycles):

par(mfrow=c(2,1))

acf(UKgrid_daily, 
    lag.max = 365 * 2)

pacf(UKgrid_daily, 
     lag.max = 365 * 2)

You can observe from the ACF plot above that the series is highly correlated with both its default seasonal lags (i.e., frequency 365) and its closest lags. Yet it is hard to compare the two due to the high density of the plot. The ts_cor function solves this issue by providing a more readable, interactive plot of the ACF and PACF:
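A sketch of the corresponding call, assuming ts_cor accepts a `lag.max` argument in the same way as acf and pacf. The series is re-created only to keep the snippet self-contained.

```r
library(TSstudio)
library(UKgrid)

# Same daily series as in the loading step (argument values are assumptions)
UKgrid_daily <- extract_grid(type = "ts", aggregate = "daily")

# Interactive ACF and PACF plots over two seasonal cycles (730 lags)
p <- ts_cor(UKgrid_daily, lag.max = 365 * 2)
p
```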

Note that the X-axis of the ts_cor plot represents the actual lag numbers (i.e., 1 for the first lag, 2 for the second, etc.), as opposed to the acf plot, which is scaled in seasonal lags (i.e., 1/frequency for the first lag, or 1 for the first seasonal lag). In addition, the interactivity of the plot lets you view a specific range of lags in more detail by zooming in on it. For example, if you zoom in on the seasonal lag in the plot above, you can see that lag 364 is actually more correlated with the series than the seasonal lag 365. This indicates that the series has a high correlation with the day of the week (lags 7, 14, 21, etc.): lag 364 falls on the same day of the week as the current observation, since the remainder of 364 divided by 7 is 0.
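The difference between the two lag scales, and the weekday alignment of lag 364, can be checked with base R. The simulated series below is an assumption used only to show the indexing; its values are random and carry no seasonal structure.

```r
# Simulated daily series with frequency 365; only the lag indexing matters here
set.seed(1)
x <- ts(rnorm(365 * 3), frequency = 365)

a <- acf(x, lag.max = 365, plot = FALSE)

# acf() reports lags in units of seasonal cycles: 0, 1/365, 2/365, ..., 1
range(a$lag)

# Multiplying by the frequency recovers the plain lag numbers 0, 1, ..., 365
lag_number <- round(a$lag[, 1, 1] * frequency(x))

# Lag 364 is a multiple of 7 (same weekday); lag 365 is off by one day
c(364 %% 7, 365 %% 7)
```

The last line returns 0 and 1, which is why lag 364 lines up with the weekly pattern while lag 365 does not.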

The seasonal argument, when set to TRUE, allows you to distinguish the seasonal lags from the non-seasonal lags. In addition, you can hide the non-seasonal lags by clicking on the non-seasonal entry in the legend. This is very useful for tuning the seasonal components of a SARIMA model. The seasonal_lags argument allows you to mark and distinguish other seasonal lags besides the default one. For instance, you can add the weekly lags to the plot by setting seasonal_lags to 7:
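A sketch of this call, using the `seasonal` and `seasonal_lags` arguments described above. The `lag.max` value is an assumption carried over from the earlier acf example.

```r
library(TSstudio)
library(UKgrid)

# Same daily series as in the loading step (argument values are assumptions)
UKgrid_daily <- extract_grid(type = "ts", aggregate = "daily")

# Mark the default seasonal lags (365, 730) and, in addition, the weekly lags
p <- ts_cor(UKgrid_daily,
            lag.max = 365 * 2,
            seasonal = TRUE,
            seasonal_lags = 7)
p
```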

Lags plot

A lags plot is an alternative approach for visualizing the relationship between the series and its lags. The more linear the relationship between the series and a given lag, the higher the correlation between the two. For example, let’s plot the relationship of the demand for electricity with its first 7 lags:

ts_lags(UKgrid_daily, lags = c(1:7))
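Each panel of a lags plot is a scatter of the series against one of its lags, so the strength of the linear relationship can be summarized with a Pearson correlation. The helper `lag_cor` below is hypothetical (it is not part of TSstudio), and `nottem` is again used as a stand-in so the sketch runs with base R alone; the same function could be applied to `as.numeric(UKgrid_daily)`.

```r
# Hypothetical helper: Pearson correlation between a series and its k-th lag,
# i.e. the correlation of the points shown in the lag-k panel of a lags plot
lag_cor <- function(x, k) {
  n <- length(x)
  cor(x[(k + 1):n], x[1:(n - k)])
}

# Stand-in series: monthly temperatures from base R (not the UKgrid data)
x <- as.numeric(nottem)

# Correlation with the previous month, the half-cycle lag, and the seasonal lag
sapply(c(lag_1 = 1, lag_6 = 6, lag_12 = 12), function(k) lag_cor(x, k))
```

These values are close to, but not identical to, the estimates returned by acf, which uses the overall mean and variance of the full series rather than those of each lagged pair.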