The coronavirus package provides a tidy-format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data are pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.
More details are available here, and a CSV version of the package dataset is available here.

As this is an ongoing situation, the data format may change frequently. Please check the package news for updates about those changes.
Install the CRAN version:
install.packages("coronavirus")
Install the GitHub version (refreshed daily):
# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")
The CRAN version of coronavirus is updated every month or two, while the GitHub (dev) version is updated daily. The update_dataset function closes this gap by refreshing the installed package with the most recent data available in the GitHub version:
library(coronavirus)
update_dataset()
Note: you must restart the R session for the updates to become available.
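Before calling update_dataset(), you may want to check how far the installed data lags behind. The following is a minimal sketch under stated assumptions. `data_lag_days()` is a hypothetical helper and is not part of the coronavirus package. The dates are made up for illustration.

```r
# Hypothetical helper (not part of the coronavirus package): number of days
# between the most recent date in a vector of dates and a reference date
data_lag_days <- function(dates, today = Sys.Date()) {
  as.integer(today - max(as.Date(dates)))
}

# Toy example with made-up dates
toy_dates <- as.Date(c("2020-08-12", "2020-08-13", "2020-08-14"))
data_lag_days(toy_dates, today = as.Date("2020-08-20"))
#> [1] 6
```

With the package loaded, `data_lag_days(coronavirus$date)` would report how many days old the installed data is. A lag of more than a day or two suggests that the installed data is behind the daily GitHub updates.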
Alternatively, you can pull the data in the Covid19R project's standard data format with the refresh_coronavirus_jhu function:
covid19_df <- refresh_coronavirus_jhu()
head(covid19_df)
#> date location location_type location_code location_code_type
#> 1 2020-06-03 Afghanistan country AF iso_3166_2
#> 2 2020-06-07 Afghanistan country AF iso_3166_2
#> 3 2020-06-02 Afghanistan country AF iso_3166_2
#> 4 2020-06-04 Afghanistan country AF iso_3166_2
#> 5 2020-06-08 Afghanistan country AF iso_3166_2
#> 6 2020-06-06 Afghanistan country AF iso_3166_2
#> data_type value lat long
#> 1 cases_new 758 33.93911 67.70995
#> 2 recovered_new 45 33.93911 67.70995
#> 3 cases_new 759 33.93911 67.70995
#> 4 cases_new 787 33.93911 67.70995
#> 5 recovered_new 296 33.93911 67.70995
#> 6 recovered_new 68 33.93911 67.70995
data("coronavirus")
This coronavirus dataset has the following fields:
date - The date of the summary
province - The province or state, when applicable
country - The country or region name
lat - Latitude point
long - Longitude point
type - The type of case (confirmed, death, or recovered)
cases - The number of daily cases (corresponding to the case type)

head(coronavirus)
#> date province country lat long type cases
#> 1 2020-01-22 Afghanistan 33.93911 67.70995 confirmed 0
#> 2 2020-01-23 Afghanistan 33.93911 67.70995 confirmed 0
#> 3 2020-01-24 Afghanistan 33.93911 67.70995 confirmed 0
#> 4 2020-01-25 Afghanistan 33.93911 67.70995 confirmed 0
#> 5 2020-01-26 Afghanistan 33.93911 67.70995 confirmed 0
#> 6 2020-01-27 Afghanistan 33.93911 67.70995 confirmed 0
Summary of the total confirmed cases by country (top 20):
library(dplyr)
summary_df <- coronavirus %>%
filter(type == "confirmed") %>%
group_by(country) %>%
summarise(total_cases = sum(cases)) %>%
arrange(-total_cases)
summary_df %>% head(20)
#> # A tibble: 20 x 2
#> country total_cases
#> <chr> <int>
#> 1 US 5313055
#> 2 Brazil 3226443
#> 3 India 2525922
#> 4 Russia 910778
#> 5 South Africa 579140
#> 6 Peru 516296
#> 7 Mexico 511369
#> 8 Colombia 445111
#> 9 Chile 382111
#> 10 Spain 342813
#> 11 Iran 338825
#> 12 United Kingdom 315621
#> 13 Saudi Arabia 295902
#> 14 Pakistan 287300
#> 15 Argentina 282437
#> 16 Bangladesh 271881
#> 17 Italy 252809
#> 18 France 249655
#> 19 Turkey 246861
#> 20 Germany 223791
Summary of new cases during the past 24 hours by country and type (as of 2020-08-14):
library(tidyr)
coronavirus %>%
filter(date == max(date)) %>%
select(country, type, cases) %>%
group_by(country, type) %>%
summarise(total_cases = sum(cases)) %>%
pivot_wider(names_from = type,
values_from = total_cases) %>%
arrange(-confirmed)
#> # A tibble: 188 x 4
#> # Groups: country [188]
#> country confirmed death recovered
#> <chr> <int> <int> <int>
#> 1 India 64732 996 57381
#> 2 US 64201 1336 21678
#> 3 Peru 17741 4143 12294
#> 4 Colombia 11306 347 10799
#> 5 Argentina 6365 165 6571
#> 6 South Africa 6275 286 24117
#> 7 Philippines 6134 16 1018
#> 8 Mexico 5618 615 3896
#> 9 France 5559 18 381
#> 10 Spain 5479 12 0
#> # … with 178 more rows
Plotting the total cases by type worldwide:
library(plotly)
coronavirus %>%
group_by(type, date) %>%
summarise(total_cases = sum(cases)) %>%
pivot_wider(names_from = type, values_from = total_cases) %>%
arrange(date) %>%
mutate(active = confirmed - death - recovered) %>%
mutate(active_total = cumsum(active),
recovered_total = cumsum(recovered),
death_total = cumsum(death)) %>%
plot_ly(x = ~ date,
y = ~ active_total,
name = 'Active',
fillcolor = '#1f77b4',
type = 'scatter',
mode = 'none',
stackgroup = 'one') %>%
add_trace(y = ~ death_total,
name = "Death",
fillcolor = '#E41317') %>%
add_trace(y = ~recovered_total,
name = 'Recovered',
fillcolor = 'forestgreen') %>%
layout(title = "Distribution of Covid19 Cases Worldwide",
legend = list(x = 0.1, y = 0.9),
yaxis = list(title = "Number of Cases"),
xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))
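The area chart stacks three running totals: active, death, and recovered. The following minimal base-R sketch reproduces that arithmetic. The `daily` data frame is hypothetical, with made-up numbers, and stands in for the pivoted worldwide totals. Each day's change in active cases is confirmed - death - recovered, so the three running totals always add up to the running total of confirmed cases.

```r
# Toy daily worldwide counts (made-up numbers), one column per case type,
# mimicking the data after pivot_wider() in the pipeline above
daily <- data.frame(
  date      = as.Date("2020-03-01") + 0:3,
  confirmed = c(100, 150, 120, 80),
  death     = c(2, 5, 4, 3),
  recovered = c(10, 30, 50, 60)
)

daily <- daily[order(daily$date), ]  # cumsum() assumes chronological order

# Daily change in active cases, then running totals for each stacked series
daily$active          <- daily$confirmed - daily$death - daily$recovered
daily$active_total    <- cumsum(daily$active)
daily$death_total     <- cumsum(daily$death)
daily$recovered_total <- cumsum(daily$recovered)

# The three stacked areas add up to the cumulative number of confirmed cases
stacked <- daily$active_total + daily$death_total + daily$recovered_total
all.equal(stacked, cumsum(daily$confirmed))
#> [1] TRUE
```

The daily active value can be negative on days when recoveries and deaths exceed new confirmations. That is why the cumulative sum, rather than the daily value, is plotted.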
Plot the distribution of confirmed cases by country with a treemap:
conf_df <- coronavirus %>%
filter(type == "confirmed") %>%
group_by(country) %>%
summarise(total_cases = sum(cases)) %>%
arrange(-total_cases) %>%
mutate(parents = "Confirmed") %>%
ungroup()
plot_ly(data = conf_df,
type= "treemap",
values = ~total_cases,
labels= ~ country,
parents= ~parents,
domain = list(column=0),
name = "Confirmed",
textinfo="label+value+percent parent")
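The `textinfo = "label+value+percent parent"` setting labels each tile with its share of the parent node ("Confirmed"). This sketch assumes that share is a country's total divided by the sum over all countries. That holds when the parent's value is the sum of its children, as with plotly's default `branchvalues` behaviour for a parent that carries no value of its own. Under that assumption, base R can reproduce the percentages. The country names and numbers are made up.

```r
# Toy per-country confirmed totals (made-up numbers)
conf_toy <- data.frame(
  country     = c("A", "B", "C"),
  total_cases = c(500, 300, 200)
)

# Each country's percentage share of the parent total ("Confirmed")
conf_toy$pct_parent <- round(100 * conf_toy$total_cases / sum(conf_toy$total_cases), 1)
conf_toy$pct_parent
#> [1] 50 30 20
```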
The raw data are pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources: