TSstudio 0.1.3
I used the Thanksgiving break to push a new update of the TSstudio package to CRAN (version 0.1.3). The new version includes an update for the ts_backtesting
function along with two new function - ts_to_prophet
for converting time series objects to a prophet input format (i.e., ds
and y
columns), and ccf_plot
for lags plot between two time series. The package can be installed from either CRAN or Github:
# CRAN
install.packages("TSstudio")
# Github
# install.packages("devtools")
devtools::install_github("RamiKrispin/TSstudio")
library(TSstudio)
packageVersion("TSstudio")
## [1] '0.1.5'
Converting time series object to a prophet format
The ts_to_prophet
function converting ts
, xts
and zoo
objects into prophet input format (i.e., data frame with two columns - ds for date and y for the series values). For instance, convertig the USgas
series to a prophet object:
data("USgas")
ts_info(USgas)
## The USgas series is a ts object with 1 variable and 235 observations
## Frequency: 12
## Start time: 2000 1
## End time: 2019 7
USgas_prophet <- ts_to_prophet(USgas)
head(USgas)
## [1] 2510.5 2330.7 2050.6 1783.3 1632.9 1513.1
head(USgas_prophet)
## ds y
## 1 2000-01-01 2510.5
## 2 2000-02-01 2330.7
## 3 2000-03-01 2050.6
## 4 2000-04-01 1783.3
## 5 2000-05-01 1632.9
## 6 2000-06-01 1513.1
In the case of a ts
object, where the index is not a date object, the function extracts the time component from the first observation and use it along with the frequency of the series to estimate the date column of the prophet data frame. For instance, in the case of a monthly series, where the time object provides only the year and the month, by default the day component of the date object will be set to 1. Alternatively, if known, you can set the date of the first observation with the start
argument. For example, if the USgas
series is being captured during the mid of the month (or every 15th of the month):
USgas_prophet <- ts_to_prophet(USgas, start = as.Date("2000-01-15"))
head(USgas_prophet)
## ds y
## 1 2000-01-15 2510.5
## 2 2000-02-15 2330.7
## 3 2000-03-15 2050.6
## 4 2000-04-15 1783.3
## 5 2000-05-15 1632.9
## 6 2000-06-15 1513.1
Similarly, the function can handle xts
and zoo
objects:
data("EURO_Brent")
ts_info(EURO_Brent)
## The EURO_Brent series is a zoo object with 1 variable and 389 observations
## Frequency: monthly
## Start time: May 1987
## End time: Sep 2019
head(EURO_Brent)
## May 1987 Jun 1987 Jul 1987 Aug 1987 Sep 1987 Oct 1987
## 18.58 18.86 19.86 18.98 18.31 18.76
ts_to_prophet(EURO_Brent) %>% head()
## ds y
## 1 1987-05-01 18.58
## 2 1987-06-01 18.86
## 3 1987-07-01 19.86
## 4 1987-08-01 18.98
## 5 1987-09-01 18.31
## 6 1987-10-01 18.76
Lags plots of two series
The second function, ccf_plot
, provides an interactive and intuitive visualization of the cross-correlation between two time series, by plotting a series against another series (and its lags) and calculating the correlation between the two with the ccf
function. For instance, let’s use the function to plot the relationship between the unemployment rate and the total vehicle sales in the US:
data("USUnRate")
ts_info(USUnRate)
## The USUnRate series is a ts object with 1 variable and 861 observations
## Frequency: 12
## Start time: 1948 1
## End time: 2019 9
data("USVSales")
ts_info(USVSales)
## The USVSales series is a ts object with 1 variable and 525 observations
## Frequency: 12
## Start time: 1976 1
## End time: 2019 9
ccf_plot(x = USVSales, y = USUnRate)
The function automatically aligned and used only the overlapping observations of the two series before calculating the cross-correlation values between the series and the lags of the second series (where the 0 lag represents the series itself, and negative lags represent the leading lags). The title of each plot specifies the lag number and the cross-correlation value. The lags
argument of the function defines the number of lags in the plot, where the use of negative lags defines the leading indicators. For example, setting the lags
argument to -6:6 will plot the first 6 lags, the series itself and the first 6 leading lags of the series:
ccf_plot(x = USVSales, y = USUnRate, lags = -6:6)
Forecasting with backtesting and xreg
The ts_backtesting
function for training and testing multiple models (e.g., auto.arima
, HoltWinters
, nnetar
, etc.) with backtesting approach, is now supporting the xreg
component of the auto.arima
, nnetar
( forecast package)and their embedment in the hybridModel
model ( forecastHybrid package). The use of the xreg
component is straightforward and required two components:
- The predictors - or the regressors component in a vector or matric format will be used as an input to the model
xreg
argument. The length of this input must be aligned with the length of the input series - The future values of the predictors - a vector or matrix must correspond to the inputs which used as predictors, where the length of this component must be aligned to the forecast horizon (or the
h
argument of the function). This setting of this component is done with thexreg.h
argument
For instance, let’s forecast the monthly consumption of natural gas in US in the next 5 years (or 60 months) by regressing the USgas
series with its Fourier terms, using auto.arima
, nnetar
and hybridModel
models. We will use the fourier
function from the forecast package to generate both the inputs for the regression model (x_reg
) and future values for the forecast itself (x_reg.forecast
):
# Setting the forecast horizon
h <- 60
library(forecast)
# Creating the xreg component for the regression
x_reg <- fourier(USgas, K = 5)
# Creating the xreg component for the forecast
x_reg.forecast <- forecast::fourier(USgas, K = 5, h = h)
Note that the ts_backtesting
function automatically split and aligned the xreg
component according to the expanding window movement of the function. We will set the function to run backtesting using 6 periods/splits to train auto.arima
, nnetar
and hybridModel
models, in order to examine the performance of the models over time:
md <- ts_backtesting(ts.obj = USgas,
error = "MAPE",
models = "anh",
periods = 6,
h = h,
xreg.h = x_reg.forecast,
a.arg = list(xreg = x_reg),
h.arg = list(models = "aetsfz",
a.args = list(xreg = x_reg),
verbose = FALSE),
n.arg = list(xreg = x_reg),
plot = FALSE)
## Model_Name avgMAPE sdMAPE avgRMSE sdRMSE
## 1 hybrid 3.2883333 0.95547719 100.35500 33.593355
## 2 auto.arima 4.2500000 1.81685442 123.41333 34.677851
## 3 nnetar 4.5200000 1.65343279 133.14333 50.975960
We can now review the performance of each model using the summary plot:
md$summary_plot
The summary plot provides the error distribution of each model and the plot forecasting model which performed best on the backtesting. The output contains the models’ performance on the backtesting (i.e., summary plot and leaderboard). In this case, since we set the error
argument to MAPE
, the function selected the auto.arima
final forecast. Yet, you can see in the plot that the error rate of the hybrid
model is more stable compared to the auto.arima
and it might be a better choice (the hybrid
contains both the auto.arima
and other models, which potentially helps to hedge the error). All the models’ information available on the Forecast_Final
folder. For example, you can pull the auto.arima
model and check its residuals:
check_res(md$Forecast_Final$auto.arima)
The plan for future releases is to expend the functionality of the ts_backtesting
function, by adding additional models (e.g., tslm
, prophet
, etc.) and expend the window setting of the backtesting (adding sliding window option).