The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

More details are available here, and a CSV version of the package dataset is available here.

Source: Centers for Disease Control and Prevention’s Public Health Image Library

Important Note

As this is an ongoing situation, the data format may change frequently. Please check the package news for updates about those changes.

Installation

Install the CRAN version:

install.packages("coronavirus")

Install the GitHub version (refreshed on a daily basis):

# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")

Data refresh

While the coronavirus CRAN version is updated every month or two, the GitHub (Dev) version is updated on a daily basis. The update_dataset function closes this gap by refreshing the installed package with the most recent data available in the GitHub version:

library(coronavirus)
update_dataset()

Note: you must restart the R session for the updates to become available.

Alternatively, you can pull the data using the Covid19R project data standard format with the refresh_coronavirus_jhu function:

covid19_df <- refresh_coronavirus_jhu()

head(covid19_df)
#>         date    location location_type location_code location_code_type
#> 1 2020-06-03 Afghanistan       country            AF         iso_3166_2
#> 2 2020-06-07 Afghanistan       country            AF         iso_3166_2
#> 3 2020-06-02 Afghanistan       country            AF         iso_3166_2
#> 4 2020-06-04 Afghanistan       country            AF         iso_3166_2
#> 5 2020-06-08 Afghanistan       country            AF         iso_3166_2
#> 6 2020-06-06 Afghanistan       country            AF         iso_3166_2
#>       data_type value      lat     long
#> 1     cases_new   758 33.93911 67.70995
#> 2 recovered_new    45 33.93911 67.70995
#> 3     cases_new   759 33.93911 67.70995
#> 4     cases_new   787 33.93911 67.70995
#> 5 recovered_new   296 33.93911 67.70995
#> 6 recovered_new    68 33.93911 67.70995
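
The layout above stores one measurement per row, so totals per location and measure need an aggregation step. The following is a minimal sketch in base R, assuming a small made-up data frame: `toy_df` and its values are hypothetical, and only the column names follow the output shown above.

```r
# Hypothetical toy data mimicking a few columns of the standard format
# (the values are invented for illustration, not real counts)
toy_df <- data.frame(
  date = as.Date(c("2020-06-01", "2020-06-02", "2020-06-01", "2020-06-02")),
  location = c("A", "A", "B", "B"),
  data_type = c("cases_new", "cases_new", "cases_new", "recovered_new"),
  value = c(10, 15, 7, 3),
  stringsAsFactors = FALSE
)

# Sum the daily values for every location / data_type pair
totals <- aggregate(value ~ location + data_type, data = toy_df, FUN = sum)
print(totals)
```

The same call should work on the data frame returned by refresh_coronavirus_jhu, provided its columns match the ones printed above.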

Dashboard

A supporting dashboard is available here

Usage

data("coronavirus")

This coronavirus dataset has the following fields:

  • date - The date of the summary
  • province - The province or state, when applicable
  • country - The country or region name
  • lat - Latitude point
  • long - Longitude point
  • type - The type of case (e.g., confirmed, death, recovered)
  • cases - The number of daily cases (corresponding to the case type)

head(coronavirus)
#>         date province     country      lat     long      type cases
#> 1 2020-01-22          Afghanistan 33.93911 67.70995 confirmed     0
#> 2 2020-01-23          Afghanistan 33.93911 67.70995 confirmed     0
#> 3 2020-01-24          Afghanistan 33.93911 67.70995 confirmed     0
#> 4 2020-01-25          Afghanistan 33.93911 67.70995 confirmed     0
#> 5 2020-01-26          Afghanistan 33.93911 67.70995 confirmed     0
#> 6 2020-01-27          Afghanistan 33.93911 67.70995 confirmed     0
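
Because the cases field holds daily counts rather than running totals, a cumulative series has to be derived per country and case type. Here is a minimal base R sketch on made-up data: `toy_cases` and its values are hypothetical, and the column names follow the field list above.

```r
# Hypothetical daily counts for two countries (values are invented)
toy_cases <- data.frame(
  date = as.Date("2020-01-22") + rep(0:2, times = 2),
  country = rep(c("A", "B"), each = 3),
  type = "confirmed",
  cases = c(1, 0, 4, 2, 3, 5),
  stringsAsFactors = FALSE
)

# Sort by group and date so the running sum follows calendar order
toy_cases <- toy_cases[order(toy_cases$country, toy_cases$type, toy_cases$date), ]

# Running total within each country / type group
toy_cases$cumulative <- ave(toy_cases$cases,
                            toy_cases$country, toy_cases$type,
                            FUN = cumsum)
print(toy_cases)
```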

Summary of the total confirmed cases by country (top 20):

library(dplyr)

summary_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20) 
#> # A tibble: 20 x 2
#>    country        total_cases
#>    <chr>                <int>
#>  1 US                 5313055
#>  2 Brazil             3226443
#>  3 India              2525922
#>  4 Russia              910778
#>  5 South Africa        579140
#>  6 Peru                516296
#>  7 Mexico              511369
#>  8 Colombia            445111
#>  9 Chile               382111
#> 10 Spain               342813
#> 11 Iran                338825
#> 12 United Kingdom      315621
#> 13 Saudi Arabia        295902
#> 14 Pakistan            287300
#> 15 Argentina           282437
#> 16 Bangladesh          271881
#> 17 Italy               252809
#> 18 France              249655
#> 19 Turkey              246861
#> 20 Germany             223791

Summary of new cases during the past 24 hours by country and type (as of 2020-08-14):

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 188 x 4
#> # Groups:   country [188]
#>    country      confirmed death recovered
#>    <chr>            <int> <int>     <int>
#>  1 India            64732   996     57381
#>  2 US               64201  1336     21678
#>  3 Peru             17741  4143     12294
#>  4 Colombia         11306   347     10799
#>  5 Argentina         6365   165      6571
#>  6 South Africa      6275   286     24117
#>  7 Philippines       6134    16      1018
#>  8 Mexico            5618   615      3896
#>  9 France            5559    18       381
#> 10 Spain             5479    12         0
#> # … with 178 more rows
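
The long-to-wide step can also be sketched without tidyr. The example below uses base R's xtabs on a made-up data frame (`toy_day` and its values are hypothetical). One difference to keep in mind: xtabs fills missing country/type combinations with 0, whereas pivot_wider leaves them as NA by default.

```r
# Hypothetical single-day counts in the package's long layout (invented values)
toy_day <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  type = c("confirmed", "death", "confirmed", "death", "recovered"),
  cases = c(5, 1, 8, 0, 2),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per country, one column per case type
wide <- xtabs(cases ~ country + type, data = toy_day)

# Order the rows by confirmed cases, largest first
wide_sorted <- wide[order(-wide[, "confirmed"]), ]
print(wide_sorted)
```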

Plotting the total cases by type worldwide:

library(plotly)

coronavirus %>% 
  group_by(type, date) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  mutate(active = confirmed - death - recovered) %>%
  mutate(active_total = cumsum(active),
         recovered_total = cumsum(recovered),
         death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
          y = ~ active_total,
          name = 'Active',
          fillcolor = '#1f77b4',
          type = 'scatter',
          mode = 'none',
          stackgroup = 'one') %>%
  add_trace(y = ~ death_total,
            name = "Death",
            fillcolor = '#E41317') %>%
  add_trace(y = ~ recovered_total,
            name = 'Recovered',
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))
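
The arithmetic behind the stacked chart can be checked on a small example. Active cases are derived as confirmed minus deaths minus recoveries, so the three cumulative series should add up to the cumulative confirmed count on every date. This is a minimal base R sketch on made-up numbers (`daily` and its values are hypothetical):

```r
# Hypothetical worldwide daily counts (invented values)
daily <- data.frame(
  date = as.Date("2020-03-01") + 0:3,
  confirmed = c(10, 20, 15, 5),
  death = c(0, 1, 2, 1),
  recovered = c(0, 3, 8, 12)
)

# Daily change in active cases; it can be negative when recoveries and
# deaths outnumber new confirmations on a given day
daily$active <- daily$confirmed - daily$death - daily$recovered

# Running totals, as in the plotting pipeline above
daily$active_total <- cumsum(daily$active)
daily$recovered_total <- cumsum(daily$recovered)
daily$death_total <- cumsum(daily$death)

# The stacked layers should sum to the cumulative confirmed count
confirmed_total <- cumsum(daily$confirmed)
stacked_total <- daily$active_total + daily$recovered_total + daily$death_total
stopifnot(all(stacked_total == confirmed_total))
print(daily)
```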

Plot the confirmed cases distribution by country with a treemap plot:

conf_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup() 
  
plot_ly(data = conf_df,
        type = "treemap",
        values = ~ total_cases,
        labels = ~ country,
        parents = ~ parents,
        domain = list(column = 0),
        name = "Confirmed",
        textinfo = "label+value+percent parent")

Data Sources

The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources: