The TSstudio package provides a set of functions for training, testing, and evaluating time series forecasting models. This vignette covers the main ones.
The ts_split function splits time series data into training (sample-in) and testing (sample-out) partitions, keeping the chronological order of the series. The sample.out argument defines the length of the testing partition. In the following example, we will use the function to split the USgas series (the monthly demand for natural gas in the US) into training and testing partitions, leaving the last 12 months as the testing partition:
library(TSstudio)
data(USgas)
ts_info(USgas)
#> The USgas series is a ts object with 1 variable and 238 observations
#> Frequency: 12
#> Start time: 2000 1
#> End time: 2019 10
USgas_par <- ts_split(USgas, sample.out = 12)
train <- USgas_par$train
test <- USgas_par$test
ts_info(train)
#> The train series is a ts object with 1 variable and 226 observations
#> Frequency: 12
#> Start time: 2000 1
#> End time: 2018 10
ts_info(test)
#> The test series is a ts object with 1 variable and 12 observations
#> Frequency: 12
#> Start time: 2018 11
#> End time: 2019 10
The test_forecast function visualizes the performance of a forecasting model on the training and testing partitions by plotting the fitted and forecasted values against the actuals. The function supports forecast models from the forecast and stats packages, such as auto.arima, tslm, ets, HoltWinters, etc.
For example, let’s train a tslm model on the train partition and forecast the corresponding observations of the test partition:
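The code chunk for this step does not appear above. A minimal sketch of it, assuming the forecast package’s tslm and forecast functions (the object name fit is our choice; fc matches the forecast object used below):

```r
library(forecast)

# Train a time series linear regression model with trend and
# seasonal components on the training partition
fit <- tslm(train ~ season + trend)

# Forecast the next 12 months, matching the length of the
# testing partition
fc <- forecast(fit, h = 12)
```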
Using the actual series (USgas), the forecast object (fc), and the corresponding testing partition (test), the function will plot the fitted and forecasted values against the actuals:
test_forecast(actual = USgas,
forecast.obj = fc,
test = test)
The MAPE and RMSE scores of both the fitted and forecasted values can be seen by hovering over the plot.
The train_model function replaces the ts_backtesting function, which will be deprecated in the next version. The function provides a flexible framework for forecasting. It is used to train, test, and evaluate multiple models (or different versions of those models) with a backtesting approach. The backtesting approach enables you to evaluate the performance of the models over time (as opposed to a single pair of training and testing partitions) and select the model that performed best according to some error criterion (e.g., MAPE or RMSE).
The main arguments of the train_model function are:
methods - defines the type of models and their settings to be used in the training process. Currently, it supports the following models:
- arima - model from the stats package
- auto.arima - model from the forecast package
- ets - model from the forecast package
- HoltWinters - model from the stats package
- nnetar - model from the forecast package
- tslm - model from the forecast package (note that the tslm model must have the formula argument in the method_arg argument)
The methods argument is defined as a list with the following structure:
list(model_1_id = list(method = "[type of model]",
method_arg = list("[model 1 arguments]"),
notes = "[set notes for model 1]"),
.
.
.
model_n_id = list(method = "[type of model]",
method_arg = list("[model n arguments]"),
notes = "[set notes for model n]"))
Each element of the list defines a specific model and its settings. The name of the element represents the model ID, and it has three arguments:
- method - the type of model to use (one of the models listed above)
- method_arg - a list (optional), enables you to modify the default settings of the model’s arguments
- notes - a character (optional), allows you to set notes about the model. This is useful when testing multiple versions of the same model
train_method - a list, sets the window structure of the backtesting process, where:
- partitions - sets the number of partitions to use in the backtesting
- sample.out - sets the length of the testing partitions
- space - defines the expansion length of the backtesting window

Note: by default, the backtesting algorithm uses the expanding window approach. Alternatively, it is possible to use a sliding window approach by setting the sample.in argument.
The following example demonstrates the use of the train_model function to forecast the demand for natural gas over the next 12 months. Let’s first define the forecasting methods:
methods <- list(ets1 = list(method = "ets",
method_arg = list(opt.crit = "lik"),
notes = "ETS model with opt.crit = lik"),
ets2 = list(method = "ets",
method_arg = list(opt.crit = "amse"),
notes = "ETS model with opt.crit = amse"),
arima1 = list(method = "arima",
method_arg = list(order = c(2,1,0)),
notes = "ARIMA(2,1,0)"),
arima2 = list(method = "arima",
method_arg = list(order = c(2,1,2),
seasonal = list(order = c(1,1,1))),
notes = "SARIMA(2,1,2)(1,1,1)"),
hw = list(method = "HoltWinters",
method_arg = NULL,
notes = "HoltWinters Model"),
tslm = list(method = "tslm",
method_arg = list(formula = input ~ trend + season),
notes = "tslm model with trend and seasonal components"))
Similarly, we will define the window structure of the backtesting by using:
train_method = list(partitions = 6,
sample.out = 12,
space = 3)
We will now set the train_model function, using the methods and train_method lists to set the corresponding arguments of the function. The horizon argument defines the length of the forecast, and the error argument defines the error metric used to evaluate the performance of the models:
md <- train_model(input = USgas,
methods = methods,
train_method = train_method,
horizon = 12,
error = "MAPE")
#> # A tibble: 6 x 7
#> model_id model notes avg_mape avg_rmse `avg_coverage_8… `avg_coverage_9…
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 hw HoltW… HoltWinte… 0.0482 144. 0.792 0.931
#> 2 ets1 ets ETS model… 0.0526 156. 0.833 0.972
#> 3 arima2 arima SARIMA(2,… 0.0546 163. 0.583 0.819
#> 4 ets2 ets ETS model… 0.0650 185. 0.5 0.792
#> 5 tslm tslm tslm mode… 0.0854 242. 0.319 0.611
#> 6 arima1 arima ARIMA(2,1… 0.163 539. 0.861 0.958
The function returns a leaderboard table that sorts the models by the selected error metric (in this case, MAPE). In addition to the average MAPE and RMSE errors of the models on the testing partitions, the function returns the average coverage rate of the prediction intervals. The coverage rate represents the percentage of actual observations that were bounded by the prediction intervals. The closer the coverage rate is to the level of the prediction intervals (e.g., 95%), the more reliable the prediction intervals are. For instance, the prediction intervals of the arima2 model had an average coverage of 58% and 82% for the corresponding 80% and 95% prediction intervals.
The plot_error function returns a more detailed view of the models’ error rates on the different testing partitions:
plot_error(md)
The plot_model function provides an animated visualization of the tested models’ performance, animating the forecasted values by testing partition:
plot_model(md)
Press the Play button to view the animated plot.
The create_model function, together with a set of helper functions, enables you to add, modify, and remove the different components of the train_model function. Let’s repeat the previous example, forecasting the demand for natural gas in the US, this time using the functional approach.
The following two examples demonstrate the usage of the create_model function:
- The first example demonstrates a step-by-step approach using the create_model functions, and
- The second demonstrates a more concise approach using pipes (%>%).
Step by step approach
The first step is to create the train_model object using the create_model function:
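The code chunk itself is missing here; based on the piped example later in this vignette, it is a call to create_model with no arguments:

```r
# Create an empty train_model skeleton object
md1 <- create_model()
```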
The add_input function allows you to define the input series for the model:
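The corresponding code chunk is not shown; following the argument names used by the add_methods calls below, it should be along the lines of:

```r
# Attach the USgas series as the input of the model object
md1 <- add_input(model.obj = md1, input = USgas)
```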
Next, we will take the same methods we used previously, but this time we will spread them into three separate lists:
ets_methods <- list(ets1 = list(method = "ets",
method_arg = list(opt.crit = "lik"),
notes = "ETS model with opt.crit = lik"),
ets2 = list(method = "ets",
method_arg = list(opt.crit = "amse"),
notes = "ETS model with opt.crit = amse"))
arima_methods <- list(arima1 = list(method = "arima",
method_arg = list(order = c(2,1,0)),
notes = "ARIMA(2,1,0)"),
arima2 = list(method = "arima",
method_arg = list(order = c(2,1,2),
seasonal = list(order = c(1,1,1))),
notes = "SARIMA(2,1,2)(1,1,1)"))
other_methods <- list(hw = list(method = "HoltWinters",
method_arg = NULL,
notes = "HoltWinters Model"),
tslm = list(method = "tslm",
method_arg = list(formula = input ~ trend + season),
notes = "tslm model with trend and seasonal components"))
Next, we will use the add_methods function to add those methods to the train_model object:
md1 <- add_methods(model.obj = md1, methods = ets_methods)
md1 <- add_methods(model.obj = md1, methods = arima_methods)
md1 <- add_methods(model.obj = md1, methods = other_methods)
The remove_methods function, as the name implies, enables you to remove methods from the object. As we saw before, the performance of the arima1 model was poor, so let’s remove it from the object:
md1 <- remove_methods(model.obj = md1, method_ids = "arima1")
The add_train_method function defines the train_method argument of the train_model function:
md1 <- add_train_method(model.obj = md1, train_method = list(partitions = 6,
sample.out = 12,
space = 3))
Last but not least, we will set the forecast horizon and the prediction intervals level using the add_horizon and add_level functions, respectively:
md1 <- add_horizon(model.obj = md1, horizon = 12)
md1 <- add_level(model.obj = md1, level = c(90, 95))
Once all the main parameters are defined, the model can be built using the build_model function:
fc1 <- build_model(model.obj = md1)
#> # A tibble: 5 x 7
#> model_id model notes avg_mape avg_rmse `avg_coverage_9… `avg_coverage_9…
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 hw HoltW… HoltWinte… 0.0482 144. 0.875 0.931
#> 2 ets1 ets ETS model… 0.0526 156. 0.917 0.972
#> 3 arima2 arima SARIMA(2,… 0.0546 163. 0.736 0.819
#> 4 ets2 ets ETS model… 0.0650 185. 0.722 0.792
#> 5 tslm tslm tslm mode… 0.0854 242. 0.431 0.611
Using the %>% operator
A more concise approach to using the create_model functionality is to pipe the process with the %>% operator:
md2 <- create_model() %>%
add_input(input = USgas) %>%
add_methods(methods = ets_methods) %>%
add_methods(methods = arima_methods) %>%
add_methods(methods = other_methods) %>%
remove_methods(method_ids = "arima1") %>%
add_train_method(train_method = list(partitions = 6,
sample.out = 12,
space = 3)) %>%
add_horizon(horizon = 12) %>%
add_level(level = c(90, 95))
Likewise, we will pipe the md2 object to the build_model function to run the model:
fc2 <- md2 %>% build_model()
#> # A tibble: 5 x 7
#> model_id model notes avg_mape avg_rmse `avg_coverage_9… `avg_coverage_9…
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 hw HoltW… HoltWinte… 0.0482 144. 0.875 0.931
#> 2 ets1 ets ETS model… 0.0526 156. 0.917 0.972
#> 3 arima2 arima SARIMA(2,… 0.0546 163. 0.736 0.819
#> 4 ets2 ets ETS model… 0.0650 185. 0.722 0.792
#> 5 tslm tslm tslm mode… 0.0854 242. 0.431 0.611