I used the Thanksgiving break to push a new update of the TSstudio package to CRAN (version 0.1.3). The new version includes an update for the ts_backtesting function along with two new function - ts_to_prophet for converting time series objects to a prophet input format (i.e., ds and y columns), and ccf_plot for lags plot between two time series. The package can be installed from either CRAN or Github:

# CRAN
install.packages("TSstudio")

# Github
# install.packages("devtools")
devtools::install_github("RamiKrispin/TSstudio")
library(TSstudio)
packageVersion("TSstudio")
## [1] '0.1.5'

Converting time series object to a prophet format

The ts_to_prophet function converting ts, xts and zoo objects into prophet input format (i.e., data frame with two columns - ds for date and y for the series values). For instance, convertig the USgas series to a prophet object:

data("USgas")

ts_info(USgas)
##  The USgas series is a ts object with 1 variable and 235 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2019 7
USgas_prophet <- ts_to_prophet(USgas)

head(USgas)
## [1] 2510.5 2330.7 2050.6 1783.3 1632.9 1513.1
head(USgas_prophet)
##           ds      y
## 1 2000-01-01 2510.5
## 2 2000-02-01 2330.7
## 3 2000-03-01 2050.6
## 4 2000-04-01 1783.3
## 5 2000-05-01 1632.9
## 6 2000-06-01 1513.1

In the case of a ts object, where the index is not a date object, the function extracts the time component from the first observation and use it along with the frequency of the series to estimate the date column of the prophet data frame. For instance, in the case of a monthly series, where the time object provides only the year and the month, by default the day component of the date object will be set to 1. Alternatively, if known, you can set the date of the first observation with the start argument. For example, if the USgas series is being captured during the mid of the month (or every 15th of the month):

USgas_prophet <- ts_to_prophet(USgas, start = as.Date("2000-01-15"))

head(USgas_prophet)
##           ds      y
## 1 2000-01-15 2510.5
## 2 2000-02-15 2330.7
## 3 2000-03-15 2050.6
## 4 2000-04-15 1783.3
## 5 2000-05-15 1632.9
## 6 2000-06-15 1513.1

Similarly, the function can handle xts and zoo objects:

data("EURO_Brent")
ts_info(EURO_Brent)
##  The EURO_Brent series is a zoo object with 1 variable and 389 observations
##  Frequency: monthly 
##  Start time: May 1987 
##  End time: Sep 2019
head(EURO_Brent)
## May 1987 Jun 1987 Jul 1987 Aug 1987 Sep 1987 Oct 1987 
##    18.58    18.86    19.86    18.98    18.31    18.76
ts_to_prophet(EURO_Brent) %>% head()
##           ds     y
## 1 1987-05-01 18.58
## 2 1987-06-01 18.86
## 3 1987-07-01 19.86
## 4 1987-08-01 18.98
## 5 1987-09-01 18.31
## 6 1987-10-01 18.76

Lags plots of two series

The second function, ccf_plot, provides an interactive and intuitive visualization of the cross-correlation between two time series, by plotting a series against another series (and its lags) and calculating the correlation between the two with the ccf function. For instance, let’s use the function to plot the relationship between the unemployment rate and the total vehicle sales in the US:

data("USUnRate")

ts_info(USUnRate)
##  The USUnRate series is a ts object with 1 variable and 861 observations
##  Frequency: 12 
##  Start time: 1948 1 
##  End time: 2019 9
data("USVSales")

ts_info(USVSales)
##  The USVSales series is a ts object with 1 variable and 525 observations
##  Frequency: 12 
##  Start time: 1976 1 
##  End time: 2019 9
ccf_plot(x = USVSales, y = USUnRate)

The function automatically aligned and used only the overlapping observations of the two series before calculating the cross-correlation values between the series and the lags of the second series (where the 0 lag represents the series itself, and negative lags represent the leading lags). The title of each plot specifies the lag number and the cross-correlation value. The lags argument of the function defines the number of lags in the plot, where the use of negative lags defines the leading indicators. For example, setting the lags argument to -6:6 will plot the first 6 lags, the series itself and the first 6 leading lags of the series:

ccf_plot(x = USVSales, y = USUnRate, lags = -6:6)

Forecasting with backtesting and xreg

The ts_backtesting function for training and testing multiple models (e.g., auto.arima, HoltWinters, nnetar, etc.) with backtesting approach, is now supporting the xreg component of the auto.arima, nnetar ( forecast package)and their embedment in the hybridModel model ( forecastHybrid package). The use of the xreg component is straightforward and required two components:

  • The predictors - or the regressors component in a vector or matric format will be used as an input to the model xreg argument. The length of this input must be aligned with the length of the input series
  • The future values of the predictors - a vector or matrix must correspond to the inputs which used as predictors, where the length of this component must be aligned to the forecast horizon (or the h argument of the function). This setting of this component is done with the xreg.h argument

For instance, let’s forecast the monthly consumption of natural gas in US in the next 5 years (or 60 months) by regressing the USgas series with its Fourier terms, using auto.arima, nnetar and hybridModel models. We will use the fourier function from the forecast package to generate both the inputs for the regression model (x_reg) and future values for the forecast itself (x_reg.forecast):

# Setting the forecast horizon
h <- 60

library(forecast)
# Creating the xreg component for the regression
x_reg <- fourier(USgas, K = 5)

# Creating the xreg component for the forecast 
x_reg.forecast <- forecast::fourier(USgas, K = 5, h = h)

Note that the ts_backtesting function automatically split and aligned the xreg component according to the expanding window movement of the function. We will set the function to run backtesting using 6 periods/splits to train auto.arima, nnetar and hybridModel models, in order to examine the performance of the models over time:

md <- ts_backtesting(ts.obj = USgas,
                     error = "MAPE",
                     models = "anh",
                     periods = 6,
                     h = h,
                     xreg.h = x_reg.forecast,
                     a.arg = list(xreg = x_reg),
                     h.arg = list(models = "aetsfz", 
                                  a.args = list(xreg = x_reg), 
                                  verbose = FALSE),
                     n.arg = list(xreg = x_reg),
                     plot = FALSE)
##   Model_Name   avgMAPE     sdMAPE   avgRMSE    sdRMSE
## 1     hybrid 3.2883333 0.95547719 100.35500 33.593355
## 2 auto.arima 4.2500000 1.81685442 123.41333 34.677851
## 3     nnetar 4.5200000 1.65343279 133.14333 50.975960

We can now review the performance of each model using the summary plot:

md$summary_plot

The summary plot provides the error distribution of each model and the plot forecasting model which performed best on the backtesting. The output contains the models’ performance on the backtesting (i.e., summary plot and leaderboard). In this case, since we set the error argument to MAPE, the function selected the auto.arima final forecast. Yet, you can see in the plot that the error rate of the hybrid model is more stable compared to the auto.arima and it might be a better choice (the hybrid contains both the auto.arima and other models, which potentially helps to hedge the error). All the models’ information available on the Forecast_Final folder. For example, you can pull the auto.arima model and check its residuals:

check_res(md$Forecast_Final$auto.arima)

The plan for future releases is to expend the functionality of the ts_backtesting function, by adding additional models (e.g., tslm, prophet, etc.) and expend the window setting of the backtesting (adding sliding window option).