Time series data describes a sequence of observations captured over time, where each data point is associated with a unique timestamp. Typically, the term time series (or series) refers to regular time series data, where the observations are captured at equally spaced time intervals (for example, hourly, daily, or monthly).

In this workshop, we will focus only on regular time series data.
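As a rough illustration of what "regular" means, the sketch below checks whether a vector of timestamps is equally spaced. The helper name equally_spaced is hypothetical (it is not part of the workshop or of any package), and the check only suits fixed-length intervals such as hours or days; calendar intervals such as months vary in length and need a calendar-aware index class.

```r
# Hypothetical helper: TRUE when all gaps between consecutive timestamps are equal
equally_spaced <- function(timestamps) {
  gaps <- diff(as.numeric(timestamps))
  length(unique(gaps)) == 1
}

# Made-up timestamps for illustration
daily <- seq(as.Date("2020-01-01"), by = "day", length.out = 5)
scattered <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-05", "2020-01-06"))

equally_spaced(daily)      # equal one-day gaps
equally_spaced(scattered)  # gaps of 1, 3, and 1 days
```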

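R provides a variety of time series classes and objects. Common classes include ts (from the stats package, which ships with base R), zoo, xts, and the tbl_ts class of the tsibble package.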

In this workshop, we will focus on the tsibble class.

The ts class

data("AirPassengers")

class(AirPassengers)
## [1] "ts"
AirPassengers
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
start(AirPassengers)
## [1] 1949    1
end(AirPassengers)
## [1] 1960   12
time(AirPassengers)
##          Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
## 1949 1949.00 1949.08 1949.17 1949.25 1949.33 1949.42 1949.50 1949.58 1949.67 1949.75 1949.83 1949.92
## 1950 1950.00 1950.08 1950.17 1950.25 1950.33 1950.42 1950.50 1950.58 1950.67 1950.75 1950.83 1950.92
## 1951 1951.00 1951.08 1951.17 1951.25 1951.33 1951.42 1951.50 1951.58 1951.67 1951.75 1951.83 1951.92
## 1952 1952.00 1952.08 1952.17 1952.25 1952.33 1952.42 1952.50 1952.58 1952.67 1952.75 1952.83 1952.92
## 1953 1953.00 1953.08 1953.17 1953.25 1953.33 1953.42 1953.50 1953.58 1953.67 1953.75 1953.83 1953.92
## 1954 1954.00 1954.08 1954.17 1954.25 1954.33 1954.42 1954.50 1954.58 1954.67 1954.75 1954.83 1954.92
## 1955 1955.00 1955.08 1955.17 1955.25 1955.33 1955.42 1955.50 1955.58 1955.67 1955.75 1955.83 1955.92
## 1956 1956.00 1956.08 1956.17 1956.25 1956.33 1956.42 1956.50 1956.58 1956.67 1956.75 1956.83 1956.92
## 1957 1957.00 1957.08 1957.17 1957.25 1957.33 1957.42 1957.50 1957.58 1957.67 1957.75 1957.83 1957.92
## 1958 1958.00 1958.08 1958.17 1958.25 1958.33 1958.42 1958.50 1958.58 1958.67 1958.75 1958.83 1958.92
## 1959 1959.00 1959.08 1959.17 1959.25 1959.33 1959.42 1959.50 1959.58 1959.67 1959.75 1959.83 1959.92
## 1960 1960.00 1960.08 1960.17 1960.25 1960.33 1960.42 1960.50 1960.58 1960.67 1960.75 1960.83 1960.92
deltat(AirPassengers)
## [1] 0.0833333
frequency(AirPassengers)
## [1] 12
cycle(AirPassengers)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949   1   2   3   4   5   6   7   8   9  10  11  12
## 1950   1   2   3   4   5   6   7   8   9  10  11  12
## 1951   1   2   3   4   5   6   7   8   9  10  11  12
## 1952   1   2   3   4   5   6   7   8   9  10  11  12
## 1953   1   2   3   4   5   6   7   8   9  10  11  12
## 1954   1   2   3   4   5   6   7   8   9  10  11  12
## 1955   1   2   3   4   5   6   7   8   9  10  11  12
## 1956   1   2   3   4   5   6   7   8   9  10  11  12
## 1957   1   2   3   4   5   6   7   8   9  10  11  12
## 1958   1   2   3   4   5   6   7   8   9  10  11  12
## 1959   1   2   3   4   5   6   7   8   9  10  11  12
## 1960   1   2   3   4   5   6   7   8   9  10  11  12
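The functions above also work on a ts object you build yourself. The following is a minimal sketch with made-up quarterly values (the object name sales and its numbers are assumptions for illustration only), showing how start and frequency define the time attributes and how window() subsets by time:

```r
# Eight quarterly observations (made-up values), starting in Q1 2020
sales <- ts(c(10, 12, 15, 11, 13, 14, 18, 12),
            start = c(2020, 1),
            frequency = 4)

frequency(sales)  # number of observations per cycle (here, per year)
start(sales)      # first cycle and position within the cycle
end(sales)        # last cycle and position within the cycle

# Subset by time: keep the observations from Q1 2021 onward
sales_2021 <- window(sales, start = c(2021, 1))
sales_2021
```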

The tsibble class

The tsibble package provides a tidy data structure and tools for handling regular and irregular time series data. The tsibble class stores single or multiple time series in a data.frame / tbl format with time-awareness attributes.

The tsibble attributes

The tsibble class has a table structure with time awareness. Its main attributes are:

  • As the tsibble class is based on tbl / data.frame classes, it inherits their attributes and functionalities. Hence, dplyr or other utility tools that can be used with tbl and data.frame objects can be used with tsibble objects

  • index - a tsibble object has an index column that defines the timestamps or intervals of the series. Depending on the interval of the series, the index can be one of the following date/time classes:

Interval     Class
Annual       integer/double
Quarterly    yearquarter
Monthly      yearmonth
Weekly       yearweek
Daily        Date/difftime
Subdaily     POSIXt/difftime/hms

Table source: the tsibble package README file

  • A tsibble object can represent multiple time series in a wide or long format:

    • Wide format - each column represents a series, and all rows share the same index
    • Long format - a single value column, with one or more key columns identifying each series
  • key - optional; one or more columns that identify each series, which makes it possible to represent multiple time series in a long format
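To make the index and key attributes more concrete, here is a small sketch that builds a wide tsibble with a yearmonth index and reshapes it into the long format with a key column. The data and the column names (month, series_a, series_b, series, value) are made up for illustration, and the reshaping relies on tidyr::pivot_longer, which is an assumption of this sketch rather than something the workshop uses:

```r
library(tsibble)
library(dplyr)
library(tidyr)

# Wide format: one column per series, all sharing a monthly index (made-up values)
wide <- tsibble(
  month = yearmonth(seq(as.Date("2020-01-01"), by = "month", length.out = 3)),
  series_a = c(1, 2, 3),
  series_b = c(10, 20, 30),
  index = month
)

# Long format: a single value column plus a key column naming the series.
# The object is dropped to a plain tibble before reshaping, then rebuilt
# as a tsibble with an explicit key.
long <- wide %>%
  as_tibble() %>%
  pivot_longer(c(series_a, series_b), names_to = "series", values_to = "value") %>%
  as_tsibble(index = month, key = series)

key_vars(long)  # the key column that separates the two series
long
```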

Creating a tsibble object

The as_tsibble function converts data.frame or ts objects into a tsibble object. In the following example, we will load the Natural Gas Consumption series. This series represents the monthly consumption of natural gas in the US since January 2000:

library(tsibble)
library(dplyr)

naturalgas_path <- paste(rprojroot::find_rstudio_root_file(), "data", "NATURALGAS.csv", sep = "/")

us_gas <- read.csv(naturalgas_path, stringsAsFactors = FALSE) %>%
  setNames(c("date", "y")) 

head(us_gas)
##         date      y
## 1 2000-01-01 2510.5
## 2 2000-02-01 2330.7
## 3 2000-03-01 2050.6
## 4 2000-04-01 1783.3
## 5 2000-05-01 1632.9
## 6 2000-06-01 1513.1
str(us_gas)
## 'data.frame':    248 obs. of  2 variables:
##  $ date: chr  "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" ...
##  $ y   : num  2510 2331 2051 1783 1633 ...

This series has two columns:

  • date - the timestamp of the observations
  • y - the series values

We will convert the date column from character to Date, and then to the yearmonth class so that it reflects the monthly interval of the series. We will use this column as the series index:

us_gas$date <- as.Date(us_gas$date) %>% yearmonth()

us_gas_tsibble <- us_gas %>% 
  as_tsibble(index = "date")

class(us_gas_tsibble)
## [1] "tbl_ts"     "tbl_df"     "tbl"        "data.frame"
head(us_gas_tsibble)
## # A tsibble: 6 x 2 [1M]
##       date     y
##      <mth> <dbl>
## 1 2000 Jan 2510.
## 2 2000 Feb 2331.
## 3 2000 Mar 2051.
## 4 2000 Apr 1783.
## 5 2000 May 1633.
## 6 2000 Jun 1513.

As you can see, the tsibble class has a tbl-like print method, with additional information about the series interval, which in this case is monthly ([1M]). Let’s explore the attributes of the new object:

interval(us_gas_tsibble)
## <interval[1]>
## [1] 1M
index(us_gas_tsibble)
## date
key(us_gas_tsibble)
## list()

Since we did not define a key, the object does not have a key.
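Beyond interval, index, and key, the tsibble package offers helpers for checking the time structure of an object. The sketch below uses a made-up monthly series with one month deliberately left out (the object names x and x_full and the values are assumptions for illustration) to show index_var, is_regular, has_gaps, and fill_gaps:

```r
library(tsibble)

# Made-up monthly series with March 2020 missing
x <- tsibble(
  month = yearmonth(as.Date(c("2020-01-01", "2020-02-01", "2020-04-01"))),
  y = c(1, 2, 4),
  index = month
)

index_var(x)   # name of the index column, as a string
is_regular(x)  # whether the series has a regular interval
has_gaps(x)    # whether there are implicit missing timestamps

# Turn the implicit gap into an explicit row with a missing value
x_full <- fill_gaps(x)
x_full
```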

Converting a ts object into a tsibble

The as_tsibble function can also convert a ts object into a tsibble. We will use it to convert the AirPassengers series from a ts object to a tsibble object:

ap_tsibble <- AirPassengers %>% as_tsibble()

class(ap_tsibble)
## [1] "tbl_ts"     "tbl_df"     "tbl"        "data.frame"
head(ap_tsibble)
## # A tsibble: 6 x 2 [1M]
##      index value
##      <mth> <dbl>
## 1 1949 Jan   112
## 2 1949 Feb   118
## 3 1949 Mar   132
## 4 1949 Apr   129
## 5 1949 May   121
## 6 1949 Jun   135
interval(ap_tsibble)
## <interval[1]>
## [1] 1M

The function converts the time vector of the ts object into an index column.

Note: the time vector of a ts object can represent only two components of the timestamp (e.g., year and month, or year and day). Converting it into an index with three or four components (such as Date or another date-time class) can therefore be ambiguous in some cases.
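One way to see where the ambiguity comes from is to look at the time vector of a ts object that is meant to hold daily data. This is only a sketch with made-up values; the choice of frequency = 365 is an assumption commonly used for daily series, and the object names are hypothetical:

```r
# A "daily" ts object: 365 observations per cycle, starting at position 1 of 2020
daily_ts <- ts(1:5, start = c(2020, 1), frequency = 365)

# The time vector stores the year plus a fraction of the cycle, not a calendar date
as.numeric(time(daily_ts))

# Nothing in the object says how a fraction maps to a month and a day
# (2020, for instance, has 366 days), so a conversion has to guess.
# An explicit Date vector carries the full timestamp and needs no interpretation:
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 5)
dates
```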

Multiple time series object

The tourism dataset, from the tsibble package, is an example of a multiple time series object in a long format. It describes the number of overnight trips per quarter across Australia by region, state, and purpose:

library(tsibble)

data("tourism")

class(tourism)
## [1] "tbl_ts"     "tbl_df"     "tbl"        "data.frame"
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region   State           Purpose  Trips
##     <qtr> <chr>    <chr>           <chr>    <dbl>
## 1 1998 Q1 Adelaide South Australia Business  135.
## 2 1998 Q2 Adelaide South Australia Business  110.
## 3 1998 Q3 Adelaide South Australia Business  166.
## 4 1998 Q4 Adelaide South Australia Business  127.
## 5 1999 Q1 Adelaide South Australia Business  137.
## 6 1999 Q2 Adelaide South Australia Business  200.

In this case, the Quarter column defines the series index, and the Region, State, and Purpose columns define the series key:

index(tourism)
## Quarter
key(tourism)
## [[1]]
## Region
## 
## [[2]]
## State
## 
## [[3]]
## Purpose
key_vars(tourism)
## [1] "Region"  "State"   "Purpose"
key_data(tourism) %>% head()
## # A tibble: 6 x 4
##   Region         State           Purpose        .rows
##   <chr>          <chr>           <chr>    <list<int>>
## 1 Adelaide       South Australia Business        [80]
## 2 Adelaide       South Australia Holiday         [80]
## 3 Adelaide       South Australia Other           [80]
## 4 Adelaide       South Australia Visiting        [80]
## 5 Adelaide Hills South Australia Business        [80]
## 6 Adelaide Hills South Australia Holiday         [80]

This is one of the cases where the interaction between dplyr and tsibble is especially useful. For example, we can use filter to extract a single series from the dataset:

sydney <- tourism %>% filter(State == "New South Wales",
                             Region == "Sydney",
                             Purpose == "Holiday")

head(sydney)
## # A tsibble: 6 x 5 [1Q]
## # Key:       Region, State, Purpose [1]
##   Quarter Region State           Purpose Trips
##     <qtr> <chr>  <chr>           <chr>   <dbl>
## 1 1998 Q1 Sydney New South Wales Holiday  828.
## 2 1998 Q2 Sydney New South Wales Holiday  531.
## 3 1998 Q3 Sydney New South Wales Holiday  503.
## 4 1998 Q4 Sydney New South Wales Holiday  580.
## 5 1999 Q1 Sydney New South Wales Holiday  465.
## 6 1999 Q2 Sydney New South Wales Holiday  534.
key_data(sydney)
## # A tibble: 1 x 4
##   Region State           Purpose       .rows
## * <chr>  <chr>           <chr>   <list<int>>
## 1 Sydney New South Wales Holiday        [80]
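Filtering is not the only dplyr verb that is aware of the tsibble structure. As a further sketch (the object name state_trips is an assumption for illustration), group_by followed by summarise aggregates across the remaining key columns while keeping the index, so the result is still a tsibble with one series per state:

```r
library(tsibble)
library(dplyr)

data("tourism")

# Total trips per state and quarter: summarise() on a tsibble
# aggregates within each index value, so Quarter is preserved
state_trips <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

key_vars(state_trips)   # the grouping column becomes the key
index_var(state_trips)  # the index column is unchanged
head(state_trips)
```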

Exercises

Let’s practice what we learned so far:

us_abs_path <- paste(rprojroot::find_rstudio_root_file(), "data", "Alcoholic Beverages Sales.csv", sep = "/")

us_abs <- read.csv(us_abs_path, stringsAsFactors = FALSE) %>%
  setNames(c("date", "y"))

head(us_abs)
ca_elec_path <- paste(rprojroot::find_rstudio_root_file(), "data", "ca_elec.rda", sep = "/")

load(ca_elec_path)

head(ca_elec)
##             date_time                 operator     y
## 1 2018-07-01 08:00:00 Pacific Gas and Electric 12522
## 2 2018-07-01 09:00:00 Pacific Gas and Electric 11745
## 3 2018-07-01 10:00:00 Pacific Gas and Electric 11200
## 4 2018-07-01 11:00:00 Pacific Gas and Electric 10822
## 5 2018-07-01 12:00:00 Pacific Gas and Electric 10644
## 6 2018-07-01 13:00:00 Pacific Gas and Electric 10559