Time series data describes a sequence of observations captured over time. Each data point in the series is associated with a unique timestamp. Typically, the term time series (or series) refers to regular time series data, where the observations are captured at equally spaced time intervals.
In this workshop, we will focus only on regular time series data.
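As a rough illustration of what "regular" means in practice, the sketch below checks whether a vector of timestamps is equally spaced. The helper `is_equally_spaced` is a hypothetical name introduced here for illustration; it is not part of any package.

```r
# Hypothetical helper: TRUE when all consecutive timestamps are equally spaced
is_equally_spaced <- function(timestamps) {
  gaps <- diff(as.numeric(timestamps))
  length(unique(gaps)) == 1
}

# Six hourly timestamps, equally spaced
hourly <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
              by = "hour", length.out = 6)

# The same timestamps with the fourth observation missing
with_gap <- hourly[-4]

regular_check <- is_equally_spaced(hourly)
irregular_check <- is_equally_spaced(with_gap)
```

This naive check works on elapsed seconds, so it would flag monthly Date stamps as unequal even though a monthly series is regular in calendar units.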
R provides a variety of time series classes and objects. The most common classes are:
ts - R built-in class for time series objects. It is part of the stats package's time series analysis and forecasting ecosystem. The structure of this class is a vector, or a matrix for mts objects (multiple time series)
xts/zoo - flexible formats for handling both regular and irregular time series data in a table format. Mainly used for finance applications
tsibble - a tidy data format for time series data, fairly similar to the xts/zoo packages
In this workshop, we will focus on the tsibble class.
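To make the vector versus matrix distinction concrete, here is a minimal sketch that builds a univariate ts object and a multivariate mts object with the base `ts()` function. The values and the column names `a` and `b` are made up for illustration.

```r
# Univariate series: 8 quarterly observations starting in Q1 2020
single <- ts(c(5, 7, 6, 9, 8, 10, 9, 12), start = c(2020, 1), frequency = 4)

# Multivariate series: each matrix column is a separate series
multi <- ts(
  matrix(1:16, ncol = 2, dimnames = list(NULL, c("a", "b"))),
  start = c(2020, 1),
  frequency = 4
)

class(single)            # "ts"
inherits(multi, "mts")   # TRUE
is.matrix(multi)         # TRUE
```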
data("AirPassengers")
class(AirPassengers)
## [1] "ts"
AirPassengers
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
start(AirPassengers)
## [1] 1949 1
end(AirPassengers)
## [1] 1960 12
time(AirPassengers)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 1949.00 1949.08 1949.17 1949.25 1949.33 1949.42 1949.50 1949.58 1949.67 1949.75 1949.83 1949.92
## 1950 1950.00 1950.08 1950.17 1950.25 1950.33 1950.42 1950.50 1950.58 1950.67 1950.75 1950.83 1950.92
## 1951 1951.00 1951.08 1951.17 1951.25 1951.33 1951.42 1951.50 1951.58 1951.67 1951.75 1951.83 1951.92
## 1952 1952.00 1952.08 1952.17 1952.25 1952.33 1952.42 1952.50 1952.58 1952.67 1952.75 1952.83 1952.92
## 1953 1953.00 1953.08 1953.17 1953.25 1953.33 1953.42 1953.50 1953.58 1953.67 1953.75 1953.83 1953.92
## 1954 1954.00 1954.08 1954.17 1954.25 1954.33 1954.42 1954.50 1954.58 1954.67 1954.75 1954.83 1954.92
## 1955 1955.00 1955.08 1955.17 1955.25 1955.33 1955.42 1955.50 1955.58 1955.67 1955.75 1955.83 1955.92
## 1956 1956.00 1956.08 1956.17 1956.25 1956.33 1956.42 1956.50 1956.58 1956.67 1956.75 1956.83 1956.92
## 1957 1957.00 1957.08 1957.17 1957.25 1957.33 1957.42 1957.50 1957.58 1957.67 1957.75 1957.83 1957.92
## 1958 1958.00 1958.08 1958.17 1958.25 1958.33 1958.42 1958.50 1958.58 1958.67 1958.75 1958.83 1958.92
## 1959 1959.00 1959.08 1959.17 1959.25 1959.33 1959.42 1959.50 1959.58 1959.67 1959.75 1959.83 1959.92
## 1960 1960.00 1960.08 1960.17 1960.25 1960.33 1960.42 1960.50 1960.58 1960.67 1960.75 1960.83 1960.92
deltat(AirPassengers)
## [1] 0.0833333
frequency(AirPassengers)
## [1] 12
cycle(AirPassengers)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 1 2 3 4 5 6 7 8 9 10 11 12
## 1950 1 2 3 4 5 6 7 8 9 10 11 12
## 1951 1 2 3 4 5 6 7 8 9 10 11 12
## 1952 1 2 3 4 5 6 7 8 9 10 11 12
## 1953 1 2 3 4 5 6 7 8 9 10 11 12
## 1954 1 2 3 4 5 6 7 8 9 10 11 12
## 1955 1 2 3 4 5 6 7 8 9 10 11 12
## 1956 1 2 3 4 5 6 7 8 9 10 11 12
## 1957 1 2 3 4 5 6 7 8 9 10 11 12
## 1958 1 2 3 4 5 6 7 8 9 10 11 12
## 1959 1 2 3 4 5 6 7 8 9 10 11 12
## 1960 1 2 3 4 5 6 7 8 9 10 11 12
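The attributes printed above are related to each other. The following sketch, using only base R, shows that `deltat` is the reciprocal of `frequency`, and that `window()` can subset a ts object by time while keeping those attributes.

```r
data("AirPassengers")

# tsp() returns the start, end, and frequency in one numeric vector
ap_tsp <- tsp(AirPassengers)

# deltat is the reciprocal of frequency (1/12 for a monthly series)
step_matches <- isTRUE(all.equal(deltat(AirPassengers),
                                 1 / frequency(AirPassengers)))

# window() subsets by time; here we keep only the 12 months of 1960
ap_1960 <- window(AirPassengers, start = c(1960, 1), end = c(1960, 12))
```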
The tsibble package provides a tidy data structure and tools for handling regular and irregular time series data. The tsibble class can store single and multiple time series objects in a data.frame/tbl format with time awareness attributes.
The tsibble class has a table structure with time awareness. As the tsibble class is based on the tbl/data.frame classes, it inherits their attributes and functionalities. Hence, dplyr or other utility tools that can be used with tbl and data.frame objects can be used with tsibble objects.
Its main attributes are:
index - a tsibble object has an index column that defines the timestamp or intervals of the series. The index could have the following date/time classes:
| Interval | Class |
|---|---|
| Annual | integer/double |
| Quarterly | yearquarter |
| Monthly | yearmonth |
| Weekly | yearweek |
| Daily | Date/difftime |
| Subdaily | POSIXt/difftime/hms |
Table source: the tsibble package README file
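As a small sketch of the index classes in the table, the tsibble constructors `yearmonth()`, `yearquarter()`, and `yearweek()` can be applied to the same Date value. The date used here is arbitrary.

```r
library(tsibble)

d <- as.Date("2020-05-15")

ym <- yearmonth(d)    # monthly index
yq <- yearquarter(d)  # quarterly index
yw <- yearweek(d)     # weekly index

# Converting back to Date returns the first day of the period
ym_start <- as.Date(ym)
yq_start <- as.Date(yq)
```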
A tsibble object can represent multiple time series in a wide or long format:
key - optional, allows representing multiple time series in a long format by using one or more columns as the series key
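The difference between the two layouts can be sketched with a tiny made-up data set. The column names `month`, `series`, `a`, and `b` and all the values are assumptions for illustration only.

```r
library(tsibble)

dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 3)

# Long format: one row per series per timestamp, with a key column
long_df <- data.frame(
  month = rep(dates, times = 2),
  series = rep(c("a", "b"), each = 3),
  value = c(10, 12, 11, 20, 19, 23),
  stringsAsFactors = FALSE
)
long_df$month <- yearmonth(long_df$month)
long_tsbl <- as_tsibble(long_df, key = series, index = month)

# Wide format: one column per series, no key needed
wide_df <- data.frame(
  month = yearmonth(dates),
  a = c(10, 12, 11),
  b = c(20, 19, 23)
)
wide_tsbl <- as_tsibble(wide_df, index = month)

key_vars(long_tsbl)
n_keys(long_tsbl)
key_vars(wide_tsbl)
```

In the long layout the key tells tsibble which rows belong to which series, so the same timestamp may appear once per key value.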
The as_tsibble function converts data.frame or ts objects into a tsibble object. In the following example, we will load the Natural Gas Consumption series. This series represents the monthly consumption of natural gas in the US since January 2000:
library(tsibble)
library(dplyr)
naturalgas_path <- paste(rprojroot::find_rstudio_root_file(), "data", "NATURALGAS.csv", sep = "/")
us_gas <- read.csv(naturalgas_path, stringsAsFactors = FALSE) %>%
setNames(c("date", "y"))
head(us_gas)
## date y
## 1 2000-01-01 2510.5
## 2 2000-02-01 2330.7
## 3 2000-03-01 2050.6
## 4 2000-04-01 1783.3
## 5 2000-05-01 1632.9
## 6 2000-06-01 1513.1
str(us_gas)
## 'data.frame': 248 obs. of 2 variables:
## $ date: chr "2000-01-01" "2000-02-01" "2000-03-01" "2000-04-01" ...
## $ y : num 2510 2331 2051 1783 1633 ...
This series has two columns:
date - the timestamp of the observations
y - the series values
We will reformat the date column from character into Date, and then into yearmonth, and use this column as the series index:
us_gas$date <- as.Date(us_gas$date) %>% yearmonth()
us_gas_tsibble <- us_gas %>%
as_tsibble(index = "date")
class(us_gas_tsibble)
## [1] "tbl_ts" "tbl_df" "tbl" "data.frame"
head(us_gas_tsibble)
## # A tsibble: 6 x 2 [1M]
## date y
## <mth> <dbl>
## 1 2000 Jan 2510.
## 2 2000 Feb 2331.
## 3 2000 Mar 2051.
## 4 2000 Apr 1783.
## 5 2000 May 1633.
## 6 2000 Jun 1513.
As you can see, the tsibble class has a tbl-like print method, with additional information about the series interval, which is monthly. Let’s explore the new object attributes:
interval(us_gas_tsibble)
## <interval[1]>
## [1] 1M
index(us_gas_tsibble)
## date
key(us_gas_tsibble)
## list()
Since we did not define a key, the object does not have a key.
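The same steps can be tried without the CSV file. The sketch below rebuilds the first six rows of the series by hand, then uses `index_var()`, `key_vars()`, and `is_regular()` to inspect the result. It also uses `has_gaps()` and `fill_gaps()` to show what happens when a month is missing. The object names `gas_like` and `gappy` are hypothetical.

```r
library(tsibble)

# Stand-in for the first six rows of the natural gas data frame
dates <- seq(as.Date("2000-01-01"), by = "month", length.out = 6)
gas_like <- data.frame(
  date = as.character(dates),
  y = c(2510.5, 2330.7, 2050.6, 1783.3, 1632.9, 1513.1),
  stringsAsFactors = FALSE
)

# character -> Date -> yearmonth, then use the column as the index
gas_like$date <- yearmonth(as.Date(gas_like$date))
gas_like_tsbl <- as_tsibble(gas_like, index = date)

index_var(gas_like_tsbl)   # index column name as a string
key_vars(gas_like_tsbl)    # empty, since no key was defined
is_regular(gas_like_tsbl)

# Drop March to create an implicit gap, then detect and fill it
gappy <- as_tsibble(gas_like[-3, ], index = date)
gap_flag <- has_gaps(gappy)$.gaps
filled <- fill_gaps(gappy)
```

After `fill_gaps()`, the missing month is restored as a row with an NA value, which keeps the series aligned to its monthly interval.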
The as_tsibble function can also convert a ts object to a tsibble. We will use the function to convert AirPassengers from a ts object to a tsibble object:
ap_tsibble <- AirPassengers %>% as_tsibble()
class(ap_tsibble)
## [1] "tbl_ts" "tbl_df" "tbl" "data.frame"
head(ap_tsibble)
## # A tsibble: 6 x 2 [1M]
## index value
## <mth> <dbl>
## 1 1949 Jan 112
## 2 1949 Feb 118
## 3 1949 Mar 132
## 4 1949 Apr 129
## 5 1949 May 121
## 6 1949 Jun 135
interval(ap_tsibble)
## <interval[1]>
## [1] 1M
The function converts the time vector of the ts object into an index object.
Note: as the time vector of the ts object can represent only two components of the timestamp (e.g., year and month, year and day, etc.), the transformation of the vector into a three- or four-component timestamp index (such as Date or other classes) might be ambiguous in some cases.
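The index class that as_tsibble chooses depends on the frequency of the ts object. As a sketch under that assumption, a quarterly ts object should come back with a yearquarter index, and an mts object should be reshaped into a long tsibble with one key value per column. The series values and column names are made up.

```r
library(tsibble)

# Quarterly ts object: frequency = 4
q_ts <- ts(c(5, 7, 6, 9, 8, 10, 9, 12), start = c(2020, 1), frequency = 4)
q_tsbl <- as_tsibble(q_ts)
class(q_tsbl$index)[1]

# Multiple time series (mts): two columns, eight quarters each
m_ts <- ts(
  matrix(1:16, ncol = 2, dimnames = list(NULL, c("a", "b"))),
  start = c(2020, 1),
  frequency = 4
)
m_tsbl <- as_tsibble(m_ts)   # long format by default
n_keys(m_tsbl)
```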
The tourism dataset, from the tsibble package, is an example of a multiple time series object in a long format. This series describes the number of overnight trips per quarter across Australia by region, state, and purpose:
library(tsibble)
data("tourism")
class(tourism)
## [1] "tbl_ts" "tbl_df" "tbl" "data.frame"
head(tourism)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
In this case, the Quarter column defines the series index, and the Region, State, and Purpose columns define the series key:
index(tourism)
## Quarter
key(tourism)
## [[1]]
## Region
##
## [[2]]
## State
##
## [[3]]
## Purpose
key_vars(tourism)
## [1] "Region" "State" "Purpose"
key_data(tourism) %>% head()
## # A tibble: 6 x 4
## Region State Purpose .rows
## <chr> <chr> <chr> <list<int>>
## 1 Adelaide South Australia Business [80]
## 2 Adelaide South Australia Holiday [80]
## 3 Adelaide South Australia Other [80]
## 4 Adelaide South Australia Visiting [80]
## 5 Adelaide Hills South Australia Business [80]
## 6 Adelaide Hills South Australia Holiday [80]
This is one of the cases where the interaction between dplyr and tsibble is super useful:
sydney <- tourism %>% filter(State == "New South Wales",
Region == "Sydney",
Purpose == "Holiday")
head(sydney)
## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Sydney New South Wales Holiday 828.
## 2 1998 Q2 Sydney New South Wales Holiday 531.
## 3 1998 Q3 Sydney New South Wales Holiday 503.
## 4 1998 Q4 Sydney New South Wales Holiday 580.
## 5 1999 Q1 Sydney New South Wales Holiday 465.
## 6 1999 Q2 Sydney New South Wales Holiday 534.
key_data(sydney)
## # A tibble: 1 x 4
## Region State Purpose .rows
## * <chr> <chr> <chr> <list<int>>
## 1 Sydney New South Wales Holiday [80]
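Beyond filtering, other dplyr verbs also work on a tsibble. As a sketch, grouping by State and summarising aggregates the holiday trips across regions, and the result should stay a tsibble keyed by State, because summarise on a tsibble keeps the index implicitly. The object name `holiday_by_state` is introduced here for illustration.

```r
library(tsibble)
library(dplyr)

# Total holiday trips per state and quarter, summed across regions
holiday_by_state <- tsibble::tourism %>%
  filter(Purpose == "Holiday") %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

key_vars(holiday_by_state)
index_var(holiday_by_state)
```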
Let’s practice what we learned so far:
Load the Alcoholic Beverages Sales.csv file (in the data folder), reformat the timestamp to yearmonth and convert to a tsibble object with the as_tsibble function
us_abs_path <- paste(rprojroot::find_rstudio_root_file(), "data", "Alcoholic Beverages Sales.csv", sep = "/")
us_abs <- read.csv(us_abs_path, stringsAsFactors = FALSE) %>%
setNames(c("date", "y"))
head(us_abs)
Load the ca_elec.rda file (also in the data folder) and convert the data.frame object to a tsibble object, where the key of the series should be the operator column
ca_elec_path <- paste(rprojroot::find_rstudio_root_file(), "data", "ca_elec.rda", sep = "/")
load(ca_elec_path)
head(ca_elec)
## date_time operator y
## 1 2018-07-01 08:00:00 Pacific Gas and Electric 12522
## 2 2018-07-01 09:00:00 Pacific Gas and Electric 11745
## 3 2018-07-01 10:00:00 Pacific Gas and Electric 11200
## 4 2018-07-01 11:00:00 Pacific Gas and Electric 10822
## 5 2018-07-01 12:00:00 Pacific Gas and Electric 10644
## 6 2018-07-01 13:00:00 Pacific Gas and Electric 10559
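One possible approach to the second exercise is sketched below on a small synthetic stand-in, so it can run without the ca_elec.rda file. It assumes the date_time column is already a date-time (POSIXct) class; if it were stored as character, it would need converting first. The second operator name, "Operator B", and its values are hypothetical.

```r
library(tsibble)

# Synthetic stand-in with the same column layout as ca_elec
hours <- seq(as.POSIXct("2018-07-01 08:00:00", tz = "UTC"),
             by = "hour", length.out = 3)
elec_like <- data.frame(
  date_time = rep(hours, times = 2),
  operator = rep(c("Pacific Gas and Electric", "Operator B"), each = 3),
  y = c(12522, 11745, 11200, 5000, 4900, 4800),  # last three values are made up
  stringsAsFactors = FALSE
)

# The operator column identifies each series, date_time is the index
elec_tsbl <- as_tsibble(elec_like, key = operator, index = date_time)

key_vars(elec_tsbl)
n_keys(elec_tsbl)
```

With the real data, the same call would be applied to ca_elec after `load(ca_elec_path)`.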