Forecasting Applications

The TSstudio package provides a set of functions for train, test, and evaluate time series forecasting models, this vignette covers the main ones.

Creating testing and training partitons

The ts_split function split time series data into training (sample-in) and testing (sample-out) partitions, keeping the chronological order of the series. The sample.out argument defines the length of the testing partition. In the following example, we will use the function to split the USgas series (the monthly demand for natural gas in the US) into training and testing partitions, leaving the last 12 months as testing partition:

library(TSstudio)

data(USgas)

ts_info(USgas)
#>  The USgas series is a ts object with 1 variable and 238 observations
#>  Frequency: 12 
#>  Start time: 2000 1 
#>  End time: 2019 10

USgas_par <- ts_split(USgas, sample.out = 12)

train <- USgas_par$train

test <- USgas_par$test

ts_info(train)
#>  The train series is a ts object with 1 variable and 226 observations
#>  Frequency: 12 
#>  Start time: 2000 1 
#>  End time: 2018 10
ts_info(test)
#>  The test series is a ts object with 1 variable and 12 observations
#>  Frequency: 12 
#>  Start time: 2018 11 
#>  End time: 2019 10

Visualize forecast performance

The test_forecast function visualizes the performance of a forecasting model on the training and testing partitions by plotting the fitted and forecasted values against the actuals. The function supports forecast models from the forecast and stats packages such as auto.arima, tslm, ets, HoltWinters, etc.

For example, let’s train a tslm model on the train partition and forecast the corresponding observations of the test partition:

library(forecast)

md <- tslm(train ~ season + trend)

fc <- forecast(md, h = 12)

Using the actual series (USgas), forecast object (fc), and the corresponding testing partition (test), the function will plot the fitted and forecasted values against the actuals:

test_forecast(actual = USgas,
              forecast.obj = fc,
              test = test)

The MAPE and RMSE score of both the fitted and forecasted values can be seen when hovering on the plot.

Training multiple time series model with backtesting

The train_model function replaces the ts_backtesting function, which will deprecate on the next version. The function provides a flexible framework for forecasting. The use of the function is for train, test, and evaluate multiple models (or different versions of those models) with the use of a backtesting approach. The backtesting approach enables you to evaluate the performance of the model over time (as opposed to single training and testing partitions) and select a model that performed best using some error criteria (e.g., MAPE or RMSE).

The train_model main arguments

The main arguments of this train_model function are:

methods - defines the type of models and their setting to be used on the training process. Currently, it supports the following models:
arima - model from the stats package
auto.arima - model from the forecast package
ets - model from the forecast package
HoltWinters - model from the stats package
nnetar - model from the forecast package
tslm - model from the forecast package (note that the ‘tslm’ model must have the formula argument in the ‘method_arg’ argument)

The method argument defines as a list and using the following structure:


list(model_1_id = list(method = "[type of model]",
                       method_arg = list("[model 1 arguments]"),
                       notes = "[set notes for model 1]"),
     .
     .
     .
     model_n_id = list(method = "[type of model]",
                       method_arg = list("[model n arguments]"),
                       notes = "[set notes for model n]"))

Where each element of the list defines the specific model and its setting. The name of the element represents the model ID, and it has three arguments: method - the type of model to use (one of the listed models above) method_arg - a list (optional), enables you to modify the default setting of the model argument notes - a character (optional), allows you to set notes about the model. This useful when testing multiple versions of the same models

train_method - a list, set the window structure of the backtesting process, where:
partitions - set the number of partitions to use on the backtesting
sample.out - set the length of the testing partitions
space - defines the expansion length of the backtesting window

Note: by default, the backtesting algorithm is using the expending window approach. Alternatively, it is possible to use a sliding window approach by setting the sample.in argument.

The following example demonstrated the use of the train_model function to forecast the demand for natural gas in the next 12 months. Let’s first defines the forecasting methods:

methods <- list(ets1 = list(method = "ets",
                            method_arg = list(opt.crit = "lik"),
                            notes = "ETS model with opt.crit = lik"),
                ets2 = list(method = "ets",
                            method_arg = list(opt.crit = "amse"),
                            notes = "ETS model with opt.crit = amse"),
                arima1 = list(method = "arima",
                              method_arg = list(order = c(2,1,0)),
                              notes = "ARIMA(2,1,0)"),
                arima2 = list(method = "arima",
                              method_arg = list(order = c(2,1,2),
                                                seasonal = list(order = c(1,1,1))),
                              notes = "SARIMA(2,1,2)(1,1,1)"),
                hw = list(method = "HoltWinters",
                          method_arg = NULL,
                          notes = "HoltWinters Model"),
                tslm = list(method = "tslm",
                            method_arg = list(formula = input ~ trend + season),
                            notes = "tslm model with trend and seasonal components"))

Similarly, we will define the window structure of the backtesting by using:

Six testing partitions
Each partition with a length of 12 observations, and
Expending rate of the backtesting window of 3 observations

train_method = list(partitions = 6,
                    sample.out = 12,
                    space = 3)

We will now set the train_model function, using the methods and train_method lists to set the corresponding arguments of the function. The horizon argument defines the length of the forecast, and the error argument defines the error metric to use to evaluate the performance of the models:

md <- train_model(input = USgas,
                  methods = methods,
                  train_method = train_method,
                  horizon = 12,
                  error = "MAPE")
#> # A tibble: 6 x 7
#>   model_id model  notes      avg_mape avg_rmse `avg_coverage_8… `avg_coverage_9…
#>   <chr>    <chr>  <chr>         <dbl>    <dbl>            <dbl>            <dbl>
#> 1 hw       HoltW… HoltWinte…   0.0482     144.            0.792            0.931
#> 2 ets1     ets    ETS model…   0.0526     156.            0.833            0.972
#> 3 arima2   arima  SARIMA(2,…   0.0546     163.            0.583            0.819
#> 4 ets2     ets    ETS model…   0.0650     185.            0.5              0.792
#> 5 tslm     tslm   tslm mode…   0.0854     242.            0.319            0.611
#> 6 arima1   arima  ARIMA(2,1…   0.163      539.            0.861            0.958

The function returns a leaderboard table sorting the models by the selected error metric (which in this case was MAPE). In addition to the average MAPE and RMSE error of the models on the testing partitions, the function returns the average coverage rate of the prediction intervals. The coverage rate represents the percentage of actual observations that were bounded between the prediction intervals. As closer the coverage rate to the level rate of the prediction intervals (e.g., 95%), the more reliable the prediction intervals are. For instance, the prediction intervals of the best model, arima2, had average coverage of 58% and 80% for the corresponding 80% and 95% prediction intervals.

Visualizing the models’ performance

The plot_error function returns a more detailed view of the models’ error rate on the different testing partition:

plot_error(md)

Where the left plot represents the models’ performance by testing partitions, and the right side represents the error spread of each model.

The plot_model provides an animated visualization of the tested models’ performance by animating the forecasted values by testing partition:

plot_model(md)

Press the Play button to run the view the animated plot.

A functional approach for building the train_mode components

The create_model function represents a set of functions that enable you to add, modify, and remove the different components of the train_model function. Let’s repeat the previous example, forecasting the demand for natural gas in the US, this time using the functional approach.

The following two examples demonstrate the usage of the create_model function: * The first example demonstrate a step-by-step approach of the create_model functions, and * The second one will demonstrate a more concise approach using pipes (%>%).

Step by step approach

First step, create the train_model object using the create_model function:

md1 <- create_model()

class(md1)
#> [1] "train_model"

The add_input function allows you to define the input series for the model:

md1 <- add_input(model.obj = md1, input = USgas)

Next, we will take the same methods we used previously, but this time we will spread them into three separate lists:

ets_methods <- list(ets1 = list(method = "ets",
                                method_arg = list(opt.crit = "lik"),
                                notes = "ETS model with opt.crit = lik"),
                    ets2 = list(method = "ets",
                                method_arg = list(opt.crit = "amse"),
                                notes = "ETS model with opt.crit = amse"))
arima_methods <- list(arima1 = list(method = "arima",
                                    method_arg = list(order = c(2,1,0)),
                                    notes = "ARIMA(2,1,0)"),
                      arima2 = list(method = "arima",
                                    method_arg = list(order = c(2,1,2),
                                                      seasonal = list(order = c(1,1,1))),
                                    notes = "SARIMA(2,1,2)(1,1,1)"))

other_methods <- list(hw = list(method = "HoltWinters",
                                method_arg = NULL,
                                notes = "HoltWinters Model"),
                      tslm = list(method = "tslm",
                                  method_arg = list(formula = input ~ trend + season),
                                  notes = "tslm model with trend and seasonal components"))

Next we will use the add_method function to add those methods to the train_model object:

md1 <- add_methods(model.obj = md1, methods = ets_methods)
md1 <- add_methods(model.obj = md1, methods = arima_methods)
md1 <- add_methods(model.obj = md1, methods = other_methods)

The remove_methods, as the name implies, enables you the remove methods from the object. As we saw before, the performance of the arima1 model were bad, so let’s remove it from the object:

md1 <- remove_methods(model.obj = md1, method_ids = "arima1")

The add_train_method defines the train_method argument of the train_model function:

md1 <- add_train_method(model.obj = md1, train_method = list(partitions = 6, 
                                                           sample.out = 12, 
                                                           space = 3))

Last but not least, we will set the forecast horizon and the prediction intervals level using the add_horizon and add_level functions, respectively:

md1 <- add_horizon(model.obj = md1, horizon = 12)

md1 <- add_level(model.obj = md1, level = c(90, 95))

Once all the main parameters are defined, the model can be build by using the build_model function:

fc1 <- build_model(model.obj = md1)
#> # A tibble: 5 x 7
#>   model_id model  notes      avg_mape avg_rmse `avg_coverage_9… `avg_coverage_9…
#>   <chr>    <chr>  <chr>         <dbl>    <dbl>            <dbl>            <dbl>
#> 1 hw       HoltW… HoltWinte…   0.0482     144.            0.875            0.931
#> 2 ets1     ets    ETS model…   0.0526     156.            0.917            0.972
#> 3 arima2   arima  SARIMA(2,…   0.0546     163.            0.736            0.819
#> 4 ets2     ets    ETS model…   0.0650     185.            0.722            0.792
#> 5 tslm     tslm   tslm mode…   0.0854     242.            0.431            0.611

Using the %>% operator

A more concise approach for using the create_model functionality is by piping the process with the %>% operatore:

md2 <- create_model() %>%
  add_input(input = USgas) %>%
  add_methods(methods = ets_methods) %>%
  add_methods(methods = arima_methods) %>%
  add_methods(methods = other_methods) %>%
  remove_methods(method_ids = "arima1") %>%
  add_train_method(train_method = list(partitions = 6, 
                                       sample.out = 12, 
                                       space = 3)) %>%
  add_horizon(horizon = 12) %>%
  add_level(level = c(90, 95))

Likewise, we will pipe the md2 object to the build_model function to run the model: