One limitation of the EIA API is that each call returns at most 5,000 observations. This is a problem when pulling hourly time series, which contain roughly 8,760 observations per year. The eia_backfill function works around this limit: on the backend, it splits the request into multiple sequential queries, pulls the data for each one, and returns the results appended into a single object.
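
To make the splitting idea concrete, here is a minimal sketch in base R of how a long hourly range could be cut into windows that each stay within the per-call limit. It is an illustration only and not the package's actual implementation: the helper name `split_hourly_range` is hypothetical, and it assumes hourly data with both the start and end timestamps included.

```r
# Hypothetical helper (not part of EIAapi): split [start, end] into
# consecutive windows holding at most `offset` hourly observations each.
split_hourly_range <- function(start, end, offset = 5000) {
  stopifnot(inherits(start, "POSIXct"),
            inherits(end, "POSIXct"),
            start <= end)
  step <- offset * 3600  # window length in seconds
  starts <- seq(from = start, to = end, by = step)
  # Each window ends one hour before the next one starts;
  # the last window ends at `end`.
  ends <- c(starts[-1] - 3600, end)
  data.frame(start = starts, end = ends)
}

# One year of hourly data (8,761 timestamps, both endpoints included)
windows <- split_hourly_range(
  start = as.POSIXct("2018-06-19 00:00:00", tz = "UTC"),
  end   = as.POSIXct("2019-06-19 00:00:00", tz = "UTC"),
  offset = 5000
)

# Number of hourly observations covered by each window
obs <- as.numeric(difftime(windows$end, windows$start, units = "hours")) + 1
print(windows)
print(obs)
```

Each row of `windows` would correspond to one sequential request, and the per-request results would then be row-bound into a single data frame.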

For example, let’s use the eia_backfill function to pull the hourly generation of electricity by the New York Independent System Operator for the West sub-region:

library(EIAapi)

# Time range to pull, in UTC
start <- as.POSIXlt("2018-06-19T00", tz = "UTC")
end <- lubridate::floor_date(Sys.time() - lubridate::days(2), unit = "day")
attr(end, "tzone") <- "UTC"

# Maximum number of observations per request
offset <- 5000
api_key <- Sys.getenv("eia_key")
api_path <- "electricity/rto/region-sub-ba-data/data/"

# Filter by balancing authority (parent) and sub-region (subba)
facets <- list(parent = "NYIS",
               subba = "ZONA")

df <- eia_backfill(start = start,
                   end = end,
                   offset = offset,
                   api_key = api_key,
                   api_path = api_path,
                   facets = facets)

As you can see below, the returned series has more than 45,000 observations:

head(df)
#>                  time subba subba_name parent
#> 1 2018-06-19 05:00:00  ZONA       West   NYIS
#> 2 2018-06-19 06:00:00  ZONA       West   NYIS
#> 3 2018-06-19 07:00:00  ZONA       West   NYIS
#> 4 2018-06-19 08:00:00  ZONA       West   NYIS
#> 5 2018-06-19 09:00:00  ZONA       West   NYIS
#> 6 2018-06-19 10:00:00  ZONA       West   NYIS
#>                            parent_name value   value_units
#> 1 New York Independent System Operator  1848 megawatthours
#> 2 New York Independent System Operator  1754 megawatthours
#> 3 New York Independent System Operator  1699 megawatthours
#> 4 New York Independent System Operator  1650 megawatthours
#> 5 New York Independent System Operator  1640 megawatthours
#> 6 New York Independent System Operator  1673 megawatthours

nrow(df)
#> [1] 45074

# Tick positions for the y and x axes
at_y <- pretty(df$value)[c(2, 4, 6)]
at_x <- seq.POSIXt(from = start,
                   to = end,
                   by = "2 years")

plot(df$time, df$value,
     col = "#1f77b4",
     type = "l",
     frame.plot = FALSE,
     axes = FALSE,
     panel.first = abline(h = at_y, col = "grey80"),
     main = "NY Independent System Operator (West) - Hourly Generation of Electricity",
     xlab = "Source: https://www.eia.gov/",
     ylab = "Megawatt-hours")

# Custom axis labels
mtext(side = 1, text = format(at_x, format = "%Y"), at = at_x,
      col = "grey20", line = 1, cex = 0.8)

mtext(side = 2, text = format(at_y, scientific = FALSE), at = at_y,
      col = "grey20", line = 1, cex = 0.8)