The U.S. Energy Information Administration (EIA) hosts a variety of time series data describing the U.S. energy sector. The EIA data is open and accessible through an Application Programming Interface (API) for free. The EIAapi package provides functions to query and pull tidy data from the EIA API v2.

Prerequisites

To pull data from the API using this package, you will need the following:

  • jq - The package uses jq to parse the API output from JSON to tabular format. To download and install jq follow the instructions on the download page.
  • API key - To query the EIA API, you must register to the service to receive the API key.

To register to the API, go to https://www.eia.gov/opendata/, click the Register button, and follow the instructions (see the screenshot below).


Getting started with the EIAapi

The EIAapi package provides the following functions:

  • eia_metadata - Returns information and metadata from the API about the available categories and subcategories (routes)
  • eia_get - Enables query data from the API

A good place to start exploring the available categories and subcategories of the API is on the API Dashboard. For example, let’s explore the hourly demand for electricity in the California sub-region in the dashboard and use the metadata to set the query parameters. Let’s start by set the API Route to Electricity -> Electric Power Operrations (Daily And Hourly) -> Hourly Demand Subregion, as shown in the screenshot below. Once the data route is set, define the PARENT facet as CISO. This will filter the options in the next facet - SUBBA to the four operators in California. Let’s select CISO: (PGAE) Pacific Gas and Electric and save the selection:



Once finalize the routes,facets, and any other filters, you can submit the query and you should receive the API metadata. For the selection above, here is the expected respond:



Let’s look at the returned metadata above and use it to set the query to pull the data, starting with the API URL:

https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/data/?frequency=hourly&data[0]=value&facets[parent][]=CISO&facets[subba][]=PGAE&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000

The URL contains the following components:

  • The API endpoint - https://api.eia.gov/v2/
  • The data path - rto/region-sub-ba-data/data/
  • The query - ?frequency=hourly&data[0]=value&facets[parent][]=CISO&facets[subba][]=PGAE&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000

We will use the data path to set the api_path. To set the query parameters, you can use the embedded query in the URL by extracting the values of the different filters (frequency, facet, etc.) or use the query header on the right side:


X-Params: {
"frequency": "hourly",
"data": [
"value"
],
"facets": {
"parent": [
"CISO"
],
"subba": [
"PGAE"
]
},
"start": null,
"end": null,
"sort": [
{
"column": "period",
"direction": "desc"
}
],
"offset": 0,
"length": 5000
}

We can now set the query by defining the frequency, data, and facets arguments as defined in the header above:

library(EIAapi)

df1 <- eia_get(api_key = Sys.getenv("eia_key"),
               api_path = "electricity/rto/region-sub-ba-data/data/",
               frequency = "hourly",
               data = "value",
               facets = list(parent = "CISO",
                             subba = "PGAE"),
               offset = 0,
               length = 5000)

head(df1)
#>          period subba               subba-name parent
#> 1 2019-09-02T07  PGAE Pacific Gas and Electric   CISO
#> 2 2019-09-01T11  PGAE Pacific Gas and Electric   CISO
#> 3 2019-09-01T10  PGAE Pacific Gas and Electric   CISO
#> 4 2019-09-02T04  PGAE Pacific Gas and Electric   CISO
#> 5 2019-09-01T21  PGAE Pacific Gas and Electric   CISO
#> 6 2019-09-02T05  PGAE Pacific Gas and Electric   CISO
#>                              parent-name value   value-units
#> 1 California Independent System Operator 12431 megawatthours
#> 2 California Independent System Operator 10270 megawatthours
#> 3 California Independent System Operator 10620 megawatthours
#> 4 California Independent System Operator 15685 megawatthours
#> 5 California Independent System Operator 12945 megawatthours
#> 6 California Independent System Operator 14750 megawatthours

unique(df1$parent)
#> [1] "CISO"
unique(df1$subba)
#> [1] "PGAE"

Note: The API limit the number of observations per a call to 5000. If you wish to pull more than 5000 observations you can iterate your call and set the start and end arguments as a rolling window function. Similarly, you can use the length and offset arguments to define the rolling function.

API metadata

The EIA API provides detailed information and metadata about the categories and time series available in the API. This enables extracting information programmatically about different data parameters. The eia_metadata enables query metadata from the API. The function has two arguments:

  • api_path - the API category/route path following the API endpoint (i.e., https://api.eia.gov/v2/)
  • api_key - the API key

Category metadata

Querying metadata is done by sending a request with a specific route path, and the API, in response, returns the available sub-categories under that path. For example, setting the api_path argument empty or NULL returns a list with the API main categories:

main_route <- eia_metadata(api_path = "",
                           api_key = Sys.getenv("eia_key"))

main_route$routes[, c("id", "name")]
#>                   id                            name
#> 1               coal                            Coal
#> 2  crude-oil-imports               Crude Oil Imports
#> 3        electricity                     Electricity
#> 4      international                   International
#> 5        natural-gas                     Natural Gas
#> 6    nuclear-outages                 Nuclear Outages
#> 7          petroleum                       Petroleum
#> 8               seds State Energy Data System (SEDS)
#> 9               steo       Short Term Energy Outlook
#> 10 densified-biomass               Densified Biomass
#> 11      total-energy                    Total Energy
#> 12               aeo           Annual Energy Outlook
#> 13               ieo    International Energy Outlook
#> 14     co2-emissions             State CO2 Emissions

Similarly, we can continue and extract the sub-categories under the electricity category by setting the api_path argument to electricity:

main_route <- eia_metadata(api_path = "electricity",
                           api_key = Sys.getenv("eia_key"))

main_route$routes[, c("id", "name")]
#>                                id
#> 1                    retail-sales
#> 2 electric-power-operational-data
#> 3                             rto
#> 4      state-electricity-profiles
#> 5    operating-generator-capacity
#> 6                   facility-fuel
#>                                                                         name
#> 1                                    Electricity Sales to Ultimate Customers
#> 2                             Electric Power Operations (Annual and Monthly)
#> 3                               Electric Power Operations (Daily and Hourly)
#> 4                                                        State Specific Data
#> 5                                           Inventory of Operable Generators
#> 6 Electric Power Operations for Individual Power Plants (Annual and Monthly)

Note: the number of sub-categories or routes varies between the different categories.

Series metadata

The last route in the API path defines the series available under those categories. For example, we used the above to pull the hourly demand for electricity by sub-region the following URL:

https://api.eia.gov/v2/electricity/rto/region-sub-ba-data/data/?frequency=hourly&data[0]=value&facets[parent][]=CISO&facets[subba][]=PGAE&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000

In this case, the path is defined as - electricity/rto/region-sub-ba-data/, where:

  • electricity is the main category
  • rto - is the sub-category under electricity representing the Electric Power Operations (Daily and Hourly), and
  • region-sub-ba-data - is the last sub-category in this route, representing the Hourly Demand by Sub-region

You can notice that the last sub-category in this path - region-sub-ba-data ended with data to indicate to the user that this is the last route, and under it there is data. The metadata for the time series has a different structure with respect to the category metadata. It provides useful information such as: - Starting and ending time of the series - Available facets - Series Frequency

For example, let’s query for Demand by Sub-region metadata:

elec_sub_route <- eia_metadata(api_path = "electricity/rto/region-sub-ba-data/",
                               api_key = Sys.getenv("eia_key"))

elec_sub_route 
#> $id
#> [1] "region-sub-ba-data"
#> 
#> $name
#> [1] "Hourly Demand by Subregion"
#> 
#> $description
#> [1] "Hourly demand by balancing authority subregion.  \n    Source: Form EIA-930\n    Product: Hourly Electric Grid Monitor"
#> 
#> $frequency
#>             id                    alias
#> 1       hourly             hourly (UTC)
#> 2 local-hourly hourly (Local Time Zone)
#>                                   description query               format
#> 1   One data point for each hour in UTC time.     H    YYYY-MM-DD"T"HH24
#> 2 One data point for each hour in local time.    LH YYYY-MM-DD"T"HH24TZH
#> 
#> $facets
#>       id         description
#> 1  subba           Subregion
#> 2 parent Balancing Authority
#> 
#> $data
#> $data$value
#> $data$value$`aggregation-method`
#> [1] "SUM"
#> 
#> $data$value$alias
#> [1] "Demand"
#> 
#> $data$value$units
#> [1] "megawatthours"
#> 
#> 
#> 
#> $startPeriod
#> [1] "2018-06-19T05"
#> 
#> $endPeriod
#> [1] "2023-08-11T07"
#> 
#> $defaultDateFormat
#> [1] "YYYY-MM-DD\"T\"HH24"
#> 
#> $defaultFrequency
#> [1] "hourly"
#> 
#> $command
#> [1] "/v2/electricity/rto/region-sub-ba-data/"
#> 
#> attr(,"class")
#> [1] "list"         "eia_metadata"

You can continue and query additional information on this series. For example, let’s extract the list of balancing authorities that are available for this series under the parent facet:

 eia_metadata(api_path = "electricity/rto/region-sub-ba-data/facet/parent",
                               api_key = Sys.getenv("eia_key"))
#> $totalFacets
#> [1] 8
#> 
#> $facets
#>     id                                           name
#> 1 ISNE                                ISO New England
#> 2 NYIS           New York Independent System Operator
#> 3 SWPP                           Southwest Power Pool
#> 4 ERCO    Electric Reliability Council of Texas, Inc.
#> 5 MISO Midcontinent Independent System Operator, Inc.
#> 6 CISO         California Independent System Operator
#> 7  PNM           Public Service Company of New Mexico
#> 8  PJM                       PJM Interconnection, LLC
#>                                                   alias
#> 1                                (ISNE) ISO New England
#> 2           (NYIS) New York Independent System Operator
#> 3                           (SWPP) Southwest Power Pool
#> 4    (ERCO) Electric Reliability Council of Texas, Inc.
#> 5 (MISO) Midcontinent Independent System Operator, Inc.
#> 6         (CISO) California Independent System Operator
#> 7            (PNM) Public Service Company of New Mexico
#> 8                        (PJM) PJM Interconnection, LLC
#> 
#> $command
#> [1] "/v2/electricity/rto/region-sub-ba-data/facet/parent/"
#> 
#> attr(,"class")
#> [1] "list"         "eia_metadata"