Update the coronavirus Dataset

As of March 10th, 2023, the Johns Hopkins Coronavirus Resource Center ceased collecting and reporting global COVID-19 data. As a result, the package’s datasets are no longer updated beyond that date, and the update_dataset function is retired.

While the CRAN version of the package is updated once every month or two, the GitHub (Dev) version is updated on a daily basis. The following options allow you to keep your data in sync with the Dev version:

Use the update_dataset function
Read a CSV version of the data directly from the package repository

The update_dataset function

The update_dataset function lets you keep the installed version in sync with the data available on GitHub. The function compares the dataset in the installed version with the one on the Dev version:

library(coronavirus)

update_dataset()

If no new data is available on the Dev version, the function will return the following message:

No updates are available

When new data is available, the function asks whether you want to install the updates from the Dev version:

Updates are available on the coronavirus Dev version, do you want to update? n/Y

In order to make the new data available, you will have to restart your R session.
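The comparison step described above can be pictured with a small sketch. This is an illustration only, not the package's implementation: `has_newer_data` is a hypothetical helper, and the assumption that the comparison is based on the latest date in each dataset is ours. The two data frames are toy stand-ins for the installed and Dev datasets.

```r
# Hypothetical helper, for illustration only: not the package's implementation.
# Reports whether `remote` holds rows newer than anything in `installed`.
has_newer_data <- function(installed, remote) {
  max(remote$date, na.rm = TRUE) > max(installed$date, na.rm = TRUE)
}

# Toy stand-ins for the installed dataset and the Dev dataset
installed <- data.frame(date = as.Date(c("2023-03-08", "2023-03-09")))
remote    <- data.frame(date = as.Date(c("2023-03-08", "2023-03-09", "2023-03-10")))

if (has_newer_data(installed, remote)) {
  message("Updates are available")
} else {
  message("No updates are available")
}
```

With the real package loaded, the same idea applies to the `date` column of the `coronavirus` dataset.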

Note: As frequent changes may occur on the raw data structure (such as new fields, retroactive updates in the data, etc.), the Dev version dataset may change accordingly.
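Because the structure of the Dev dataset may shift, it can help to verify that the columns your code relies on are still present before running an analysis. The following is a minimal sketch, not part of the package: `check_columns` is a hypothetical helper, and the toy data frame stands in for the real dataset.

```r
# Hypothetical helper (not part of the coronavirus package):
# returns the expected column names that are missing from a data frame.
check_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Small stand-in data frame; with the real data you would pass the dataset itself
toy <- data.frame(
  date = as.Date("2020-01-22"),
  country = "Canada",
  type = "confirmed",
  cases = 0
)

missing_cols <- check_columns(
  toy,
  c("date", "country", "type", "cases", "population")
)

if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

Running a check like this right after loading the data makes a structural change surface as a clear message rather than as an obscure error further down the analysis.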

Reading the data from the CSV version

Alternatively, you can read the data directly from the package repository, using the CSV version:

library(readr)
coronavirus_df <- read_csv("https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv", 
    col_types = cols(date = col_date(format = "%Y-%m-%d"), 
        cases = col_number()))

head(coronavirus_df)

## # A tibble: 6 × 15
##   date       province country   lat  long type     cases   uid iso2  iso3  code3
##   <date>     <chr>    <chr>   <dbl> <dbl> <chr>    <dbl> <dbl> <chr> <chr> <dbl>
## 1 2020-01-22 Alberta  Canada   53.9 -117. confirm…     0 12401 CA    CAN     124
## 2 2020-01-23 Alberta  Canada   53.9 -117. confirm…     0 12401 CA    CAN     124
## 3 2020-01-24 Alberta  Canada   53.9 -117. confirm…     0 12401 CA    CAN     124
## 4 2020-01-25 Alberta  Canada   53.9 -117. confirm…     0 12401 CA    CAN     124
## 5 2020-01-26 Alberta  Canada   53.9 -117. confirm…     0 12401 CA    CAN     124
## 6 2020-01-27 Alberta  Canada   53.9 -117. confirm…     0 12401 CA    CAN     124
## # … with 4 more variables: combined_key <chr>, population <dbl>,
## #   continent_name <chr>, continent_code <chr>

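The main difference between the first method (the update_dataset function) and the second method (reading a CSV version of the data) is how the date field is typed: a CSV file stores dates as text, so the column has to be parsed into a Date object when the file is read. The col_types argument in the read_csv call above takes care of this, and str confirms that date is a Date object: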

str(coronavirus_df)

## spec_tbl_df [919,308 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ date          : Date[1:919308], format: "2020-01-22" "2020-01-23" ...
##  $ province      : chr [1:919308] "Alberta" "Alberta" "Alberta" "Alberta" ...
##  $ country       : chr [1:919308] "Canada" "Canada" "Canada" "Canada" ...
##  $ lat           : num [1:919308] 53.9 53.9 53.9 53.9 53.9 ...
##  $ long          : num [1:919308] -117 -117 -117 -117 -117 ...
##  $ type          : chr [1:919308] "confirmed" "confirmed" "confirmed" "confirmed" ...
##  $ cases         : num [1:919308] 0 0 0 0 0 0 0 0 0 0 ...
##  $ uid           : num [1:919308] 12401 12401 12401 12401 12401 ...
##  $ iso2          : chr [1:919308] "CA" "CA" "CA" "CA" ...
##  $ iso3          : chr [1:919308] "CAN" "CAN" "CAN" "CAN" ...
##  $ code3         : num [1:919308] 124 124 124 124 124 124 124 124 124 124 ...
##  $ combined_key  : chr [1:919308] "Alberta, Canada" "Alberta, Canada" "Alberta, Canada" "Alberta, Canada" ...
##  $ population    : num [1:919308] 4413146 4413146 4413146 4413146 4413146 ...
##  $ continent_name: chr [1:919308] "North America" "North America" "North America" "North America" ...
##  $ continent_code: chr [1:919308] NA NA NA NA ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   date = col_date(format = "%Y-%m-%d"),
##   ..   province = col_character(),
##   ..   country = col_character(),
##   ..   lat = col_double(),
##   ..   long = col_double(),
##   ..   type = col_character(),
##   ..   cases = col_number(),
##   ..   uid = col_double(),
##   ..   iso2 = col_character(),
##   ..   iso3 = col_character(),
##   ..   code3 = col_double(),
##   ..   combined_key = col_character(),
##   ..   population = col_double(),
##   ..   continent_name = col_character(),
##   ..   continent_code = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
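If the file is read with base R's `read.csv` instead of `read_csv` with `col_types`, the date column arrives as text and has to be converted afterwards. A minimal sketch of that conversion, using a two-row inline CSV as a stand-in for the real file (the sample values are illustrative only):

```r
# Inline stand-in for the CSV file; values are illustrative only
csv_text <- "date,country,type,cases
2020-01-22,Canada,confirmed,0
2020-01-23,Canada,confirmed,0"

# Base R reads the date column as character
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(df$date)

# Convert to a Date object, using the same format as in the read_csv call
df$date <- as.Date(df$date, format = "%Y-%m-%d")
class(df$date)
```

The same `as.Date` call should work on the full dataset, assuming the date column keeps the `%Y-%m-%d` format shown in the output above.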