update_dataset_function.Rmd
As of March 10th, 2023, the Johns Hopkins Coronavirus Resource Center ceased collecting and reporting global COVID-19 data. The package’s datasets are therefore no longer updated beyond that date, and the update_dataset function is retired.
While the CRAN version of the package is updated once every month or two, the GitHub (Dev) version is updated on a daily basis. The following options allow you to keep your data in sync with the data available on the Dev version:
The update_dataset function
The update_dataset function keeps the installed version in sync with the data available on GitHub. The function compares the dataset in the installed version with the one in the Dev version:
If no new data is available on the Dev version, the function will return the following message:
No updates are available
When new data is available, the function prompts the user to choose whether to install the updates from the Dev version:
Updates are available on the coronavirus Dev version, do you want to update? n/Y
In order to make the new data available, you will have to restart your R session.
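The comparison described above can be sketched in a few lines. This is only an illustration of the idea, not the package's actual implementation: the helper name has_new_data and both toy data frames are hypothetical, and the check simply assumes that newer data shows up as a later maximum date.

```r
# Hypothetical helper (not part of the coronavirus package):
# TRUE when the Dev data contains dates later than the installed data
has_new_data <- function(installed, dev) {
  max(dev$date) > max(installed$date)
}

# Toy stand-ins for the installed and Dev datasets (made-up values)
installed <- data.frame(
  date = as.Date(c("2020-01-22", "2020-01-23")),
  cases = c(0, 1)
)
dev <- data.frame(
  date = as.Date(c("2020-01-22", "2020-01-23", "2020-01-24")),
  cases = c(0, 1, 2)
)

# Report the result in the same spirit as the messages above
if (has_new_data(installed, dev)) {
  message("Updates are available")
} else {
  message("No updates are available")
}
```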
Note: As the structure of the raw data may change frequently (new fields, retroactive updates in the data, etc.), the Dev version dataset may change accordingly.
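One way to guard against such structural changes is to check for the columns your code relies on before using the data. The sketch below assumes a hypothetical helper, check_columns, and a one-row toy data frame; the column names follow those of the coronavirus dataset printed in this vignette.

```r
# Hypothetical helper (not part of the coronavirus package):
# stop with an informative error if any expected column is missing
check_columns <- function(df, expected) {
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-in for the dataset (made-up row)
toy <- data.frame(
  date = as.Date("2020-01-22"),
  country = "Canada",
  type = "confirmed",
  cases = 0
)

# Passes silently because all four columns are present
check_columns(toy, c("date", "country", "type", "cases"))
```

Calling check_columns(toy, "population") on this toy data frame would raise an error naming the missing column.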
Alternatively, you can read and load the data directly from the package repository, using the csv version:
library(readr)
# Read the CSV version of the dataset directly from the package repository
coronavirus_df <- read_csv("https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv",
                           col_types = cols(date = col_date(format = "%Y-%m-%d"),
                                            cases = col_number()))
head(coronavirus_df)
## # A tibble: 6 × 15
## date province country lat long type cases uid iso2 iso3 code3
## <date> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 2020-01-22 Alberta Canada 53.9 -117. confirm… 0 12401 CA CAN 124
## 2 2020-01-23 Alberta Canada 53.9 -117. confirm… 0 12401 CA CAN 124
## 3 2020-01-24 Alberta Canada 53.9 -117. confirm… 0 12401 CA CAN 124
## 4 2020-01-25 Alberta Canada 53.9 -117. confirm… 0 12401 CA CAN 124
## 5 2020-01-26 Alberta Canada 53.9 -117. confirm… 0 12401 CA CAN 124
## 6 2020-01-27 Alberta Canada 53.9 -117. confirm… 0 12401 CA CAN 124
## # … with 4 more variables: combined_key <chr>, population <dbl>,
## # continent_name <chr>, continent_code <chr>
The main difference between the first method (the update_dataset function) and the second method (reading the CSV version of the data) is that the CSV stores the date field as text, so it has to be parsed into a Date object. The col_types argument used above takes care of that, as the str output confirms:
str(coronavirus_df)
## spec_tbl_df [919,308 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ date : Date[1:919308], format: "2020-01-22" "2020-01-23" ...
## $ province : chr [1:919308] "Alberta" "Alberta" "Alberta" "Alberta" ...
## $ country : chr [1:919308] "Canada" "Canada" "Canada" "Canada" ...
## $ lat : num [1:919308] 53.9 53.9 53.9 53.9 53.9 ...
## $ long : num [1:919308] -117 -117 -117 -117 -117 ...
## $ type : chr [1:919308] "confirmed" "confirmed" "confirmed" "confirmed" ...
## $ cases : num [1:919308] 0 0 0 0 0 0 0 0 0 0 ...
## $ uid : num [1:919308] 12401 12401 12401 12401 12401 ...
## $ iso2 : chr [1:919308] "CA" "CA" "CA" "CA" ...
## $ iso3 : chr [1:919308] "CAN" "CAN" "CAN" "CAN" ...
## $ code3 : num [1:919308] 124 124 124 124 124 124 124 124 124 124 ...
## $ combined_key : chr [1:919308] "Alberta, Canada" "Alberta, Canada" "Alberta, Canada" "Alberta, Canada" ...
## $ population : num [1:919308] 4413146 4413146 4413146 4413146 4413146 ...
## $ continent_name: chr [1:919308] "North America" "North America" "North America" "North America" ...
## $ continent_code: chr [1:919308] NA NA NA NA ...
## - attr(*, "spec")=
## .. cols(
## .. date = col_date(format = "%Y-%m-%d"),
## .. province = col_character(),
## .. country = col_character(),
## .. lat = col_double(),
## .. long = col_double(),
## .. type = col_character(),
## .. cases = col_number(),
## .. uid = col_double(),
## .. iso2 = col_character(),
## .. iso3 = col_character(),
## .. code3 = col_double(),
## .. combined_key = col_character(),
## .. population = col_double(),
## .. continent_name = col_character(),
## .. continent_code = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
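If the CSV is read without a column specification, for example with base R's read.csv, the date column arrives as character and needs an explicit conversion. A minimal sketch of that step, using a small inline sample instead of the real file (the sample rows are made up for illustration):

```r
# Inline sample that mimics a few columns of the CSV (made-up rows)
csv_text <- "date,country,type,cases
2020-01-22,Canada,confirmed,0
2020-01-23,Canada,confirmed,0"

# Without a column specification the date column is read as character
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(df$date)

# Convert the text dates into Date objects
df$date <- as.Date(df$date, format = "%Y-%m-%d")
class(df$date)
```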