Happy to announce the release of TSstudio 0.1.2 to CRAN. The TSstudio package provides tools for descriptive and predictive analysis of time series data, utilizing the visualization enegin of the plotly package and forecasting models from the forecast, forecastHybrid and bsts packages.

Installation

Install the stable version from CRAN:

install.packages("TSstudio")

or install the development version from Github:

# install.packages("devtools")
devtools::install_github("RamiKrispin/TSstudio")

New features

The new release includes new set of functions for forecasting automation with the use of backtesting and ‘horse race’ approach, forecast visualization, quantile plot of time series data, and new datasets.

The new release includes set of new functions for data visualization and as well for forecasting. In addition, there is major improvment in some of the existing functions with the ability to use multiple inputs (ts, xts, zoo, data.frame, data.table, and tbl) and new color palettes.

Backtesting

The ts_backtesting function provides you the ability to train, test and evaluate multiple models with the use of backtesting approach. This allows automating the forecasting process by running a ‘horse race’ between different models or approaches while testing them over multiple periods to evaluate their performance over time. The example below demonstrated the use of the function to forecast the monthly consumption of natural gas in the US for the next five years (or 60 months). By default, the function is testing seven different models, using expended window over six periods:

library(TSstudio)
data("USgas")

ts_info(USgas)
##  The USgas series is a ts object with 1 variable and 235 observations
##  Frequency: 12 
##  Start time: 2000 1 
##  End time: 2019 7
usgas_backtesting <- ts_backtesting(USgas,
                                 periods = 6, # Set the number of periods in the backtesting
                                 window_size = 12, # Set the length of the testing set
                                 h = 60, # Set the horizon of the final forecast
                                 plot = FALSE,
                                 error = "MAPE" # Set the error matrix
                                 )
##    Model_Name   avgMAPE     sdMAPE   avgRMSE    sdRMSE
## 1        bsts 3.7033333 0.29425612 112.54000  9.980495
## 2 HoltWinters 3.9700000 0.38120860 122.53667 12.704470
## 3         ets 5.3083333 2.33877247 153.57667 56.164439
## 4      hybrid 5.4116667 1.74705943 155.81000 41.732575
## 5  auto.arima 5.8383333 0.66360882 172.85667 11.438426
## 6       tbats 6.5283333 1.45424092 186.52667 30.798712
## 7      nnetar 7.9483333 1.99028055 232.59333 55.420409

By default, the model which performed the best in the testing sets, according to the error criteria (RMSE or MAPE), will be selected by the function to forecast the series. In the case of the USgas series, as you can see in the leaderboard above, the auto.arima model achieved the best results and therefore, the function will select this model to forecast the future values:

# Plotting the results
usgas_backtesting$summary_plot

In addition, the output of this function includes all the output from the trained models and their forecasts. For instance, you can check the residuals of the selected model:

check_res(usgas_backtesting$Models_Final$auto.arima)

Or pull the forecast of the hybrid model:

plot_forecast(usgas_backtesting$Forecast_Final$hybrid)

A short video of this function is available here

The ts_seasonal function

The ts_seasonal function is now supported data frame objects (data.frame, data.table, and tbl), in addition to the time series objects (ts, xts, and zoo). The function has three modes, which can define with the type argument:

  • normal - subsetting and plotting the series by its full cycle (or year), this allows identifying if there is a repeated pattern in the series from year to year
ts_seasonal(USgas, type = "normal")
  • cycle - plotting each one of the cycle units over time.
ts_seasonal(USgas, type = "cycle")
  • box - box plot of each cycle unit
ts_seasonal(USgas, type = "box")

Alternatively, you can set the type = "all" to plot all the three options together

ts_seasonal(USgas, type = "all")

In addition, it is possible to modify the palettes of the plot with any of the RColorBrewer and viridis packages palettes. For example, in the plot below, the colors of the normal mode (first plot) is set inferno palette with the palette_normal argument, and the colors of the second and third plots are set to Accent palette with the palette argument:

ts_seasonal(USgas, type = "all", palette_normal = "inferno", palette = "Accent")

Note that the colors in the first plot scaled according to the order of the year, and the colors of the months in the second and third plots aligned to each other.

Quantile plot for time series data

Another new feature is the ts_quantile function for plotting quantile plot of time series data, using different aggregations methods. This function, for now, support only objects with time or date object as index (e.g., xts, zoo, data.frame, data.table, tbl) with a frequency of half-hour and above. In the example below, demonstrate the use of the function with a the UKgrid dataset, which represents the UK national electricity transmission system dataset:

library(UKgrid)

# Exracting the net demand data
nd <- extract_grid(type = "tbl", columns = "ND",start = 2011, end = 2017) 

head(nd)
## # A tibble: 6 x 2
##   TIMESTAMP              ND
##   <dttm>              <int>
## 1 2011-01-01 00:00:00 34606
## 2 2011-01-01 00:30:00 35092
## 3 2011-01-01 01:00:00 34725
## 4 2011-01-01 01:30:00 33649
## 5 2011-01-01 02:00:00 32644
## 6 2011-01-01 02:30:00 32092
ts_quantile(nd)

By default, the function plot the quantile of the series according to the frequency units of the series. In the case of the UKgrid dataset above, since the series has a frequency of half-hour (or 48), the plot represents 48 quantiles, one for each half-hour of the day. The period argument provides a subset view of the series using an upper-level frequency (e.g., for half-hourly frequency a view by day of the week, month, quarter or year). This could be very useful when multi seasonality patterns exist in the data. For example, you can view the quantile of the series by the day of the week:

ts_quantile(nd, 
            period = "weekdays", 
            title = "UK Natioanl Grid Net Demand by Weekdays",
            n = 2 # Set the number of rows in the plot
            )

Heatmap

The ts_heatmap funcation is now supports multiple time series classes, including data frame objects(as long as there is a time or data column in the data), supporting data with daily frequency and lower (weekly, monthly, etc.):

UKgrid_daily <- extract_grid(type = "data.frame", 
                             columns = "ND",
                             aggregate = "daily")

head(UKgrid_daily)
##    TIMESTAMP      ND
## 1 2005-04-01 1920069
## 2 2005-04-02 1674699
## 3 2005-04-03 1631352
## 4 2005-04-04 1916693
## 5 2005-04-05 1952082
## 6 2005-04-06 1964584
ts_heatmap(UKgrid_daily)

Similarly to the ts_seasonl function, the ts_heatmap supprorting color palettes from the RColorBrewer and viridis packages:

ts_heatmap(USgas, color = "Reds")

The ts_lags function

The ts_lags function, is now allowing to select a set of lags (either a sequence of specific lags). By default the function is plotting the first 12 lags:

ts_lags(USgas)

Alternatively, you can select a specific set of lags, such as the seasonal lags:

ts_lags(USgas, lags = c(12, 24, 36, 48))

Road map

The scope of the current release was on forecasting automation, extending the support of the package functions for additional data inputs (such as data.frame, data.table, and tbl).

The focus of the next release is mainly on:

  • Automation of the forecasting process - extending the functionality of the ts_backtesting function
  • Extending the support of the existing function (when applicable) for the Facebook prophet model and the new time series object tsibble