The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.

More details are available here, and a CSV version of the package dataset is available here.

Source: Centers for Disease Control and Prevention’s Public Health Image Library

Important Note

As this is an ongoing situation, the data format may change frequently. Please see the package news for updates about those changes.

Installation

Install the CRAN version:

install.packages("coronavirus")

Install the GitHub version (refreshed daily):

# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")

Data refresh

While the CRAN version of coronavirus is updated every month or two, the GitHub (dev) version is updated daily. The update_dataset() function closes this gap by updating the installed package with the most recent data available in the GitHub version:

library(coronavirus)
update_dataset()

Note: you must restart the R session for the updates to take effect.

Alternatively, you can pull the data in the Covid19R project data standard format with the refresh_coronavirus_jhu() function:

covid19_df <- refresh_coronavirus_jhu()
head(covid19_df)
#>         date        location location_type location_code location_code_type  data_type value     lat      long
#> 1 2020-10-10 Alberta, Canada         state         CA-AB         iso_3166_2 deaths_new     0 53.9333 -116.5765
#> 2 2020-08-09 Alberta, Canada         state         CA-AB         iso_3166_2  cases_new     0 53.9333 -116.5765
#> 3 2020-10-21 Alberta, Canada         state         CA-AB         iso_3166_2  cases_new   406 53.9333 -116.5765
#> 4 2020-06-15 Alberta, Canada         state         CA-AB         iso_3166_2 deaths_new     1 53.9333 -116.5765
#> 5 2020-06-13 Alberta, Canada         state         CA-AB         iso_3166_2  cases_new    37 53.9333 -116.5765
#> 6 2020-06-21 Alberta, Canada         state         CA-AB         iso_3166_2  cases_new    31 53.9333 -116.5765
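
In the Covid19R format each row holds a single daily count (value) for one location, date and data_type, and, as the output above shows, the rows are not necessarily sorted by date. A minimal base R sketch, assuming a small made-up data frame with the same date, location, data_type and value columns, of turning daily new cases into a running total per location. The column name cases_total is hypothetical; sorting by location and date first keeps the cumulative sum chronological:

```r
# Hypothetical toy data in the Covid19R long layout (values are made up)
covid_toy <- data.frame(
  date      = as.Date(c("2020-06-15", "2020-06-13", "2020-06-14",
                        "2020-06-13", "2020-06-13")),
  location  = c("X", "X", "X", "Y", "X"),
  data_type = c("cases_new", "cases_new", "cases_new", "cases_new", "deaths_new"),
  value     = c(4, 1, 2, 9, 0),
  stringsAsFactors = FALSE
)

# Keep only the daily new cases
cases <- covid_toy[covid_toy$data_type == "cases_new", ]

# Sort chronologically within each location before accumulating
cases <- cases[order(cases$location, cases$date), ]

# Running total of new cases, computed separately for each location
cases$cases_total <- ave(cases$value, cases$location, FUN = cumsum)
print(cases)
```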

Dashboard

A supporting dashboard is available here

Usage

library(coronavirus)
data("coronavirus")

The coronavirus dataset has the following fields:

  • date - The date of the summary
  • province - The province or state, when applicable
  • country - The country or region name
  • lat - Latitude
  • long - Longitude
  • type - The type of case (confirmed, death or recovered)
  • cases - The number of daily cases for the corresponding case type
head(coronavirus)
#>         date province             country       lat       long      type cases
#> 1 2020-01-22     <NA>         Afghanistan  33.93911  67.709953 confirmed     0
#> 2 2020-01-22     <NA>             Albania  41.15330  20.168300 confirmed     0
#> 3 2020-01-22     <NA>             Algeria  28.03390   1.659600 confirmed     0
#> 4 2020-01-22     <NA>             Andorra  42.50630   1.521800 confirmed     0
#> 5 2020-01-22     <NA>              Angola -11.20270  17.873900 confirmed     0
#> 6 2020-01-22     <NA> Antigua and Barbuda  17.06080 -61.796400 confirmed     0
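
Because the dataset is in a tidy (long) format, each row should hold the count for one location, date and case type. A minimal sketch of that layout, using a hypothetical toy data frame with the same seven columns (the country names, coordinates and counts are made up), which can also be handy for trying out code without loading the package:

```r
# Hypothetical toy data mimicking the coronavirus column layout
toy <- data.frame(
  date     = as.Date("2020-01-22"),
  province = NA_character_,
  country  = c("A", "A", "B", "B"),
  lat      = c(10, 10, -5, -5),
  long     = c(20, 20, 30, 30),
  type     = c("confirmed", "death", "confirmed", "death"),
  cases    = c(5, 0, 3, 1),
  stringsAsFactors = FALSE
)
str(toy)

# In the tidy layout, each location/date/type combination appears only once
key <- paste(toy$country, toy$province, toy$date, toy$type)
stopifnot(anyDuplicated(key) == 0)
```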

Summary of the total confirmed cases by country (top 20):

library(dplyr)

summary_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20) 
#> # A tibble: 20 x 2
#>    country        total_cases
#>    <chr>                <dbl>
#>  1 US                24821813
#>  2 India             10639684
#>  3 Brazil             8753920
#>  4 Russia             3637862
#>  5 United Kingdom     3594094
#>  6 France             3069695
#>  7 Spain              2499560
#>  8 Italy              2441854
#>  9 Turkey             2418472
#> 10 Germany            2125261
#> 11 Colombia           1987418
#> 12 Argentina          1853830
#> 13 Mexico             1732290
#> 14 Poland             1464448
#> 15 South Africa       1392568
#> 16 Iran               1360852
#> 17 Ukraine            1222459
#> 18 Peru               1082907
#> 19 Indonesia           965283
#> 20 Netherlands         951747
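
The same top-N summary can be sketched in base R with aggregate() and order(), assuming a hypothetical toy data frame with the country, type and cases columns. The helper top_n_confirmed() is illustrative only and is not part of the package:

```r
# Hypothetical toy data (made-up counts) in the same long layout
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  type    = c("confirmed", "death", "confirmed", "confirmed", "confirmed", "death"),
  cases   = c(10, 1, 4, 7, 3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total confirmed cases per country, largest first
top_n_confirmed <- function(df, n = 20) {
  confirmed <- df[df$type == "confirmed", ]
  # Sum the daily cases per country
  totals <- aggregate(cases ~ country, data = confirmed, FUN = sum)
  names(totals)[2] <- "total_cases"
  # Sort in descending order and keep the first n rows
  totals <- totals[order(-totals$total_cases), ]
  rownames(totals) <- NULL
  head(totals, n)
}

res <- top_n_confirmed(toy, n = 2)
print(res)
```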

Summary of new cases during the past 24 hours by country and type (as of 2021-01-22):

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
#> # A tibble: 192 x 4
#> # Groups:   country [192]
#>    country              confirmed death recovered
#>    <chr>                    <dbl> <dbl>     <dbl>
#>  1 US                      189925  3758         0
#>  2 Brazil                   56552  1096     73818
#>  3 Spain                    42885   400         0
#>  4 United Kingdom           40321  1401        97
#>  5 France                   23324   649      1287
#>  6 Russia                   21182   566     26976
#>  7 Mexico                   21007  1440     27160
#>  8 Germany                  16366   837     16246
#>  9 Colombia                 15073   399     10418
#> 10 India                    14256   152     17136
#> 11 Portugal                 13987   234      7319
#> 12 Italy                    13633   472     27676
#> 13 Indonesia                13632   250      8357
#> 14 South Africa             11761   575     17841
#> 15 Argentina                10753   220     11071
#> 16 Peru                      9693   230         0
#> 17 Czechia                   7488   157     13320
#> 18 Poland                    6693   347         0
#> 19 Iran                      6332    75      7127
#> 20 Israel                    6159    21      7243
#> 21 Turkey                    5967   149      6018
#> 22 Canada                    5827   123      6939
#> 23 Netherlands               5799    89       103
#> 24 Ukraine                   5679   177     14581
#> 25 Japan                     5045   108      6027
#> 26 Chile                     4959    84      3031
#> 27 Sweden                    4214    84         0
#> 28 Malaysia                  3631    18      2554
#> 29 United Arab Emirates      3552    10      3945
#> 30 Lebanon                   3220    57      3185
#> 31 Romania                   2699    74      4635
#> 32 Belgium                   2444    55         0
#> 33 Tunisia                   2389   103      2720
#> 34 Ireland                   2357    52         0
#> 35 Philippines               2170    20       245
#> 36 Switzerland               2156    63         0
#> 37 Austria                   2088    42      2048
#> 38 Panama                    2041    36      3288
#> 39 Pakistan                  1927    43      1737
#> 40 Bolivia                   1864    53      1006
#> # … with 152 more rows
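
A base R sketch of the same latest-date summary, using xtabs() to spread the case types into columns. The toy data is hypothetical. One difference to keep in mind: pivot_wider() fills missing country/type combinations with NA by default, while xtabs() fills them with 0:

```r
# Hypothetical toy data (made-up counts) covering two dates
toy <- data.frame(
  date    = as.Date(c("2021-01-21", "2021-01-22", "2021-01-22",
                      "2021-01-22", "2021-01-22")),
  country = c("A", "A", "A", "B", "B"),
  type    = c("confirmed", "confirmed", "death", "confirmed", "recovered"),
  cases   = c(100, 30, 2, 50, 5),
  stringsAsFactors = FALSE
)

# Keep only the most recent date
latest <- toy[toy$date == max(toy$date), ]

# Rows are countries, columns are case types; missing combinations become 0
wide_df <- as.data.frame.matrix(xtabs(cases ~ country + type, data = latest))

# Sort by confirmed cases, largest first
wide_df <- wide_df[order(-wide_df$confirmed), , drop = FALSE]
print(wide_df)
```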

Plot the cumulative number of cases by type (active, recovered and death) worldwide:

library(plotly)

coronavirus %>% 
  group_by(type, date) %>%
  # Drop the grouping so that the cumulative sums below run over all dates
  summarise(total_cases = sum(cases), .groups = "drop") %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  mutate(active = confirmed - death - recovered) %>%
  mutate(active_total = cumsum(active),
         recovered_total = cumsum(recovered),
         death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
          y = ~ active_total,
          name = 'Active', 
          fillcolor = '#1f77b4',
          type = 'scatter',
          mode = 'none', 
          stackgroup = 'one') %>%
  add_trace(y = ~ death_total, 
            name = "Death",
            fillcolor = '#E41317') %>%
  add_trace(y = ~ recovered_total, 
            name = 'Recovered', 
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))
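
The plotted series rely on the identity active = confirmed - death - recovered, applied to the daily counts and then accumulated with cumsum(). A base R sketch on hypothetical toy data (made-up counts for three days) showing that the three cumulative series add up to the cumulative confirmed count, which is why stacking them should give a top edge equal to the total confirmed cases:

```r
# Hypothetical toy data: daily counts for each case type over three days
toy <- data.frame(
  date  = rep(as.Date("2020-03-01") + 0:2, each = 3),
  type  = rep(c("confirmed", "death", "recovered"), times = 3),
  cases = c(10, 1, 2, 20, 2, 5, 15, 1, 10),
  stringsAsFactors = FALSE
)

# One row per date, one column per case type
daily <- as.data.frame.matrix(xtabs(cases ~ date + type, data = toy))

# Daily change in active cases, then running totals for each series
daily$active          <- daily$confirmed - daily$death - daily$recovered
daily$active_total    <- cumsum(daily$active)
daily$death_total     <- cumsum(daily$death)
daily$recovered_total <- cumsum(daily$recovered)
print(daily)

# The stacked series sum to the cumulative confirmed count
stacked_top <- daily$active_total + daily$death_total + daily$recovered_total
stopifnot(isTRUE(all.equal(stacked_top, cumsum(daily$confirmed))))
```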

Plot the distribution of confirmed cases by country with a treemap plot:

conf_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup() 

plot_ly(data = conf_df,
        type = "treemap",
        values = ~ total_cases,
        labels = ~ country,
        parents = ~ parents,
        domain = list(column = 0),
        name = "Confirmed",
        textinfo = "label+value+percent parent")
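
With textinfo = "label+value+percent parent", each tile shows its value as a share of its parent, here the single "Confirmed" root. A minimal base R sketch of that percentage on hypothetical country totals, assuming the parent's value equals the sum of its children:

```r
# Hypothetical country totals (made-up numbers) with a single parent
conf_toy <- data.frame(
  country     = c("A", "B", "C"),
  total_cases = c(600, 300, 100),
  parents     = "Confirmed",
  stringsAsFactors = FALSE
)

# Parent total for each row, assuming it is the sum of its children
parent_total <- ave(conf_toy$total_cases, conf_toy$parents, FUN = sum)

# Share of each tile within its parent, in percent
conf_toy$percent_parent <- 100 * conf_toy$total_cases / parent_total
print(conf_toy)
```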

Data Sources

The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources: