The covid19_vaccine dataset provides time-series data on vaccination progress by country or province (where applicable). As with the coronavirus dataset, the raw COVID-19 vaccine data is collected by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). The covid19_vaccine data includes the following fields:

  • country_region - Country or region name
  • date - Data collection date in YYYY-MM-DD format
  • doses_admin - Cumulative number of doses administered. When a vaccine requires multiple doses, each one is counted independently
  • people_partially_vaccinated - Cumulative number of people who received at least one vaccine dose. When the person receives a prescribed second dose, it is not counted twice
  • people_fully_vaccinated - Cumulative number of people who received all prescribed doses necessary to be considered fully vaccinated
  • report_date_string - Data report date in YYYY-MM-DD format
  • uid - Country code
  • province_state - Province or state if applicable
  • iso2 - Officially assigned two-letter country code identifier
  • iso3 - Officially assigned three-letter country code identifier
  • code3 - UN country code
  • fips - Federal Information Processing Standards code that uniquely identifies counties within the USA
  • lat - Latitude
  • long - Longitude
  • combined_key - Country and province (if applicable)
  • population - Country or province population
  • continent_name - Continent name
  • continent_code - Continent code

Data sources:

Note: The country / province code fields (e.g., iso2, iso3) and population were merged with the raw data

library(coronavirus)

data("covid19_vaccine")

head(covid19_vaccine)
#>   country_region       date doses_admin people_partially_vaccinated
#> 1         Canada 2020-12-14           5                           0
#> 2          World 2020-12-14           5                           0
#> 3         Canada 2020-12-15         723                           0
#> 4          China 2020-12-15     1500000                           0
#> 5         Russia 2020-12-15       28500                       28500
#> 6          World 2020-12-15     1529223                       28500
#>   people_fully_vaccinated report_date_string uid province_state iso2 iso3 code3
#> 1                       0         2020-12-14 124           <NA>   CA  CAN   124
#> 2                       0         2020-12-14  NA           <NA> <NA> <NA>    NA
#> 3                       0         2020-12-15 124           <NA>   CA  CAN   124
#> 4                       0         2020-12-15 156           <NA>   CN  CHN   156
#> 5                       0         2020-12-15 643           <NA>   RU  RUS   643
#> 6                       0         2020-12-15  NA           <NA> <NA> <NA>    NA
#>   fips      lat     long combined_key population continent_name continent_code
#> 1 <NA> 60.00000 -95.0000       Canada   37855702  North America             NA
#> 2 <NA>       NA       NA         <NA>         NA           <NA>           <NA>
#> 3 <NA> 60.00000 -95.0000       Canada   37855702  North America             NA
#> 4 <NA> 35.86170 104.1954        China 1404676330           Asia             AS
#> 5 <NA> 61.52401 105.3188       Russia  145934460         Europe             EU
#> 6 <NA>       NA       NA         <NA>         NA           <NA>           <NA>
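
Because doses_admin is cumulative, a daily count has to be derived as the difference between consecutive observations. The following is a minimal sketch in base R, using the two Canada rows printed above; the column name daily_doses is introduced here for illustration and is not part of the dataset:

```r
# Cumulative doses for Canada, copied from the head() output above
canada <- data.frame(
  date = as.Date(c("2020-12-14", "2020-12-15")),
  doses_admin = c(5, 723)
)

# First difference of the cumulative series. The first day has no
# previous observation, so its daily value is left as NA.
canada$daily_doses <- c(NA, diff(canada$doses_admin))

canada
```

The same approach should apply to the other cumulative fields, provided the rows are sorted by date within each country or province first.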

Proportion of vaccinated population

We can measure the proportion of the vaccinated group out of the total population using the people_fully_vaccinated and population fields. We will start by keeping only the latest date and removing observations without population data (province-level rows):

library(dplyr)

df <- covid19_vaccine %>% 
  filter(date == max(date),
         !is.na(population)) 
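
To make the effect of the two filter conditions concrete, here is a small base R sketch on made-up data. The data frame toy and all of its values are hypothetical, and it assumes that rows with a missing population are the province-level ones:

```r
# Hypothetical data: two countries, two dates, plus one province row
# that has no population value
toy <- data.frame(
  country_region = c("A", "A", "B", "B", "B"),
  province_state = c(NA, NA, NA, NA, "B-1"),
  date = as.Date(c("2021-12-29", "2021-12-30",
                   "2021-12-29", "2021-12-30", "2021-12-30")),
  population = c(100, 100, 200, 200, NA),
  stringsAsFactors = FALSE
)

# Keep the latest date only, and drop rows without population
latest <- toy[toy$date == max(toy$date) & !is.na(toy$population), ]

latest
```

Only the two country-level rows for 2021-12-30 remain; the province row is dropped because its population is NA.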

Next, we will calculate the ratio of fully vaccinated people to the total population and sort the countries in descending order:

df <- df %>% 
  mutate(fully_vaccinated_ratio = people_fully_vaccinated / population) %>%
  arrange(- fully_vaccinated_ratio)

head(df, 10)
#>          country_region       date doses_admin people_partially_vaccinated
#> 1                 Malta 2021-12-30     1060424                      443372
#> 2  United Arab Emirates 2021-12-30    22609701                     9890348
#> 3              Portugal 2021-12-30    19229373                     9180041
#> 4                Brunei 2021-12-30      823787                      405092
#> 5                 Chile 2021-12-30    43938914                    17271906
#> 6                 China 2021-12-30  2795716000                  1259967000
#> 7                  Cuba 2021-12-30    30090996                    10429625
#> 8               Iceland 2021-12-30      716926                      288241
#> 9              Cambodia 2021-12-30    30336200                    14256537
#> 10            Singapore 2021-12-30     9543800                     4799168
#>    people_fully_vaccinated report_date_string uid province_state iso2 iso3
#> 1                   435635         2021-12-31 470           <NA>   MT  MLT
#> 2                  9059559         2021-12-31 784           <NA>   AE  ARE
#> 3                  9093923         2021-12-31 620           <NA>   PT  PRT
#> 4                   385414         2021-12-31  96           <NA>   BN  BRN
#> 5                 16512621         2021-12-31 152           <NA>   CL  CHL
#> 6               1207413000         2021-12-31 156           <NA>   CN  CHN
#> 7                  9643835         2021-12-31 192           <NA>   CU  CUB
#> 8                   283920         2021-12-31 352           <NA>   IS  ISL
#> 9                 13650819         2021-12-31 116           <NA>   KH  KHM
#> 10                 4744632         2021-12-31 702           <NA>   SG  SGP
#>    code3 fips       lat      long         combined_key population
#> 1    470 <NA>  35.93750  14.37540                Malta     441539
#> 2    784 <NA>  23.42408  53.84782 United Arab Emirates    9890400
#> 3    620 <NA>  39.39990  -8.22450             Portugal   10196707
#> 4     96 <NA>   4.53530 114.72770               Brunei     437483
#> 5    152 <NA> -35.67510 -71.54300                Chile   19116209
#> 6    156 <NA>  35.86170 104.19545                China 1404676330
#> 7    192 <NA>  21.52176 -77.78117                 Cuba   11326616
#> 8    352 <NA>  64.96310 -19.02080              Iceland     341250
#> 9    116 <NA>  11.55000 104.91670             Cambodia   16718971
#> 10   702 <NA>   1.28330 103.83330            Singapore    5850343
#>    continent_name continent_code fully_vaccinated_ratio
#> 1          Europe             EU              0.9866286
#> 2            Asia             AS              0.9159952
#> 3          Europe             EU              0.8918490
#> 4            Asia             AS              0.8809805
#> 5   South America             SA              0.8638021
#> 6            Asia             AS              0.8595667
#> 7   North America             NA              0.8514313
#> 8          Europe             EU              0.8320000
#> 9            Asia             AS              0.8164868
#> 10           Asia             AS              0.8110007
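
The ratios in the table can be reproduced by hand from the printed values. The sketch below uses base R and the first three rows of the output above; the columns partial_ratio and above_one are names introduced here for illustration:

```r
# Values copied from the first three rows of the table above
top3 <- data.frame(
  country_region = c("Malta", "United Arab Emirates", "Portugal"),
  people_partially_vaccinated = c(443372, 9890348, 9180041),
  people_fully_vaccinated = c(435635, 9059559, 9093923),
  population = c(441539, 9890400, 10196707),
  stringsAsFactors = FALSE
)

# Same calculation as in the mutate() call
top3$fully_vaccinated_ratio <- top3$people_fully_vaccinated / top3$population

# Same calculation on the "at least one dose" field, with a flag for
# ratios that exceed 1
top3$partial_ratio <- top3$people_partially_vaccinated / top3$population
top3$above_one <- top3$partial_ratio > 1

top3
```

For Malta, the count of people with at least one dose is larger than the population value, so that ratio comes out above 1. A plausible explanation, which is an assumption and not something the data source states, is that the population estimate and the vaccination counts refer to different reference dates or populations. It may be worth flagging such rows before interpreting the ranking.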

We can plot the 20 countries with the highest proportion of fully vaccinated population:

library(plotly)

top_20 <- df %>% 
  slice_head(n = 20) %>%
  arrange(fully_vaccinated_ratio) %>%
  mutate(country = factor(country_region, levels = country_region))
  
plot_ly(data = top_20,
        y = ~ country,
        x = ~ round(100 * fully_vaccinated_ratio, 2),
        text = ~ paste0(round(100 * fully_vaccinated_ratio, 1), "%"),
        textposition = 'auto',
        orientation = "h",
        type = "bar") %>%
  layout(title = "Proportion of Fully Vaccinated Population - Top 20 Countries",
         yaxis = list(title = ""),
         xaxis = list(title = "Percentage", ticksuffix = "%"))
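
The arrange() and factor() steps in the code above control the order of the bars. As far as we understand plotly's behavior, a categorical axis follows the factor levels, and on a horizontal bar chart the first level is drawn at the bottom, so sorting in ascending order should place the largest value at the top. A minimal base R sketch of the ordering step, using three rounded values from the table above:

```r
toy <- data.frame(
  country_region = c("Malta", "Portugal", "Chile"),
  fully_vaccinated_ratio = c(0.9866, 0.8918, 0.8638),
  stringsAsFactors = FALSE
)

# Sort ascending, then fix the factor levels to the sorted order
toy <- toy[order(toy$fully_vaccinated_ratio), ]
toy$country <- factor(toy$country_region, levels = toy$country_region)

# Levels follow the ratio, from lowest to highest
levels(toy$country)

# Without the explicit levels argument, levels are alphabetical
default_levels <- levels(factor(toy$country_region))
default_levels
```

Passing country_region directly, without the explicit levels, would order the bars alphabetically instead of by ratio.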

Continent View

Similarly, we can filter and plot the percentage of vaccinated people by continent, using the continent_name field:


continent_df <- df %>%
  filter(!is.na(continent_name),
         !is.na(fully_vaccinated_ratio))

table(continent_df$continent_name)
#> 
#>        Africa          Asia        Europe North America       Oceania 
#>            49            43            44            23             8 
#> South America 
#>            12

p <- list()

for(i in unique(continent_df$continent_name)){
  d <- NULL
  
  d <- continent_df %>% 
    filter(continent_name == i) %>% 
    arrange(fully_vaccinated_ratio) %>%
    mutate(country = factor(country_region, levels = country_region))
  
  p[[i]] <-  plot_ly(data = d,
          y = ~ country,
          x = ~ round(100 * fully_vaccinated_ratio, 2),
          orientation = "h",
          showlegend = FALSE,
          name = i,
          type = "bar") %>%
    layout(title = "Percentage of Fully Vaccinated Population by Continent and Country",
           yaxis = list(title = ""),
           xaxis = list(title = "", ticksuffix = "%")) %>%
    add_annotations(text = i,
                    xref = "paper",
                    yref = "paper",
                    x = 0.5,
                    y = 0.1,
                    align = "right",
                    showarrow = FALSE)
  
  
}


subplot(p, nrows = 3, shareX = TRUE, margin = 0.06)
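
The data preparation inside the loop can also be expressed with split() and lapply(), which returns one sorted data frame per continent. This is only a sketch of the grouping step, without the plotting, on a five-row subset copied from the table in the previous section; the object names sub and by_continent are introduced here for illustration:

```r
# Subset of rows copied from the top-10 table above
sub <- data.frame(
  country_region = c("Malta", "Portugal", "Iceland",
                     "United Arab Emirates", "Brunei"),
  continent_name = c("Europe", "Europe", "Europe", "Asia", "Asia"),
  fully_vaccinated_ratio = c(0.9866286, 0.8918490, 0.8320000,
                             0.9159952, 0.8809805),
  stringsAsFactors = FALSE
)

# One data frame per continent, each sorted by ratio and with factor
# levels fixed to the sorted order (same steps as in the loop body)
by_continent <- lapply(split(sub, sub$continent_name), function(d) {
  d <- d[order(d$fully_vaccinated_ratio), ]
  d$country <- factor(d$country_region, levels = d$country_region)
  d
})

names(by_continent)
levels(by_continent$Europe$country)
```

Each element of by_continent could then be passed to plot_ly() in the same way as d in the loop.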

Comparing new cases and vaccination

We can compare the changes in the daily new cases and the cumulative number of fully vaccinated on the country level by merging the coronavirus and covid19_vaccine datasets.

Note: At this point, the coronavirus dataset does not have the ISO country codes. Therefore, a merge between the two datasets may require some manual effort to align the country names.
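
One way to find the names that need manual adjustment is to compare the two sets of country names with setdiff(). The sketch below uses hypothetical name vectors; with the real data, the inputs would be the unique values of the country field in coronavirus and the country_region field in covid19_vaccine:

```r
# Hypothetical country names, for illustration only
cases_names   <- c("US", "Canada", "Country A")
vaccine_names <- c("US", "Canada", "Country-A")

# Names that appear in one dataset but not in the other
only_in_cases   <- setdiff(cases_names, vaccine_names)
only_in_vaccine <- setdiff(vaccine_names, cases_names)

only_in_cases
only_in_vaccine
```

Any name listed in either result would not match in a join by country name and would need to be recoded first.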

In the following example, we will plot the daily number of cases and the total number of fully vaccinated people in the US. We will filter the datasets by country and merge them by date:


data("coronavirus")


us_cases <- coronavirus %>% 
  filter(country == "US", 
         type == "confirmed") %>%
  arrange(date) %>%
  select(date, cases) %>%
  left_join(
    covid19_vaccine %>% 
      filter(country_region == "US") %>%
      select(date, people_fully_vaccinated),
    by = "date")

tail(us_cases)
#>           date  cases people_fully_vaccinated
#> 704 2021-12-25  56953               204740321
#> 705 2021-12-26 181948               204740321
#> 706 2021-12-27 512553               205196973
#> 707 2021-12-28 377014               205420745
#> 708 2021-12-29 489267               205638307
#> 709 2021-12-30 647067               205811394
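
Note that a left join keeps every row of the cases table, so dates with no matching vaccination record get NA in people_fully_vaccinated. The following base R sketch shows this on made-up values, using merge() with all.x = TRUE as a stand-in for left_join(); the data frames and their values are hypothetical:

```r
# Hypothetical daily cases, starting before the vaccination series
cases <- data.frame(
  date = as.Date("2020-12-12") + 0:3,
  cases = c(10, 12, 15, 11)
)

# Hypothetical cumulative vaccination series, starting two days later
vaccine <- data.frame(
  date = as.Date("2020-12-14") + 0:1,
  people_fully_vaccinated = c(0, 5)
)

# all.x = TRUE keeps all rows of `cases`, like a left join
joined <- merge(cases, vaccine, by = "date", all.x = TRUE)

joined
```

The first two dates have NA for people_fully_vaccinated, which is why summary functions such as min() and max() on that column need na.rm = TRUE.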

As you can see in the table above, the daily number of cases and the aggregate number of fully vaccinated people are not on the same scale. Therefore, we will normalize the two series between 0 and 1:


us_cases <- us_cases %>%
  mutate(cases_normalized = (cases - min(cases)) / (max(cases) - min(cases)),
         people_fully_vaccinated_normilized = (people_fully_vaccinated - min(people_fully_vaccinated, na.rm = TRUE)) / 
           (max(people_fully_vaccinated, na.rm = TRUE) - 
              min(people_fully_vaccinated, na.rm = TRUE)))

tail(us_cases)
#>           date  cases people_fully_vaccinated cases_normalized
#> 704 2021-12-25  56953               204740321       0.08801716
#> 705 2021-12-26 181948               204740321       0.28118881
#> 706 2021-12-27 512553               205196973       0.79211735
#> 707 2021-12-28 377014               205420745       0.58265064
#> 708 2021-12-29 489267               205638307       0.75613035
#> 709 2021-12-30 647067               205811394       1.00000000
#>     people_fully_vaccinated_normilized
#> 704                          0.9947959
#> 705                          0.9947959
#> 706                          0.9970146
#> 707                          0.9981019
#> 708                          0.9991590
#> 709                          1.0000000
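
The normalization used above is min-max scaling, (x - min(x)) / (max(x) - min(x)), which maps the smallest value to 0 and the largest to 1. Since the same expression is applied to both columns, it could be wrapped in a small helper. The function name min_max is hypothetical and not part of the package:

```r
# Min-max scaling that ignores missing values when finding the range
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Missing values stay missing; the rest are scaled to [0, 1]
scaled <- min_max(c(NA, 0, 50, 100))
scaled
```

A series with a single distinct value would produce NaN, because the denominator is zero, so a guard may be needed for constant series.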

Let’s plot the two normalized series together:

plot_ly(data = us_cases,
        x = ~ date,
        y = ~ cases_normalized,
        type = "scatter",
        mode = "lines",
        name = "Daily Cases (Normalized)") %>%
  add_lines(x = ~ date,
            y = ~ people_fully_vaccinated_normilized,
            name = "Fully Vaccinated - Aggregate (Normalized)") %>%
  layout(title = "US - Daily New Cases vs. Total Vaccinated Population (Normalized)",
         legend = list(orientation = 'h'),
         yaxis = list(title = "Normalized Daily Cases/Total Vaccinated"),
         xaxis = list(title = ""),
         margin = list(b = 60))