The covid19sf_vaccine_demo and covid19sf_vaccine_demo_ts datasets provide demographic information on the San Francisco vaccination effort:

  • covid19sf_vaccine_demo - Summary of vaccine doses given to San Franciscans by demographic group (age and race)
  • covid19sf_vaccine_demo_ts - Time series view of vaccine doses given to San Franciscans by demographic group (age and race)

San Francisco vaccination summary

The covid19sf_vaccine_demo dataset summarizes the COVID-19 vaccine doses given to San Franciscans by demographic group (age and race) and includes the following fields:

  • overall_segment - Segment (universe) of analysis. Unique combination of administering_provider_type, age_group, and demographic_group. Filter to a single option to derive meaningful totals.
  • administering_provider_type - Providers included in a given overall_segment. Two possible values: ‘All Providers’ (including SF DPH) or ‘DPH Only’
  • age_group - Age range included in a given overall_segment
  • demographic_group - Type of demographic group included in a given overall_segment (e.g. Age, Race/Ethnicity)
  • demographic_subgroup - Specific demographic group counted in a given record (e.g. 18-24, Asian)
  • demographic_subgroup_sort_order - Numeric sort order for all demographic_subgroups. Convenient for maintaining consistent ordering across multiple data visualizations.
  • total_1st_doses - Total number of first doses administered
  • total_2nd_doses - Total number of second doses administered
  • total_single_doses - Total number of single dose vaccines administered
  • total_recipients - Total number of unique vaccine recipients
  • total_series_completed - Total number of individuals fully vaccinated (those having received the second dose of a two-dose vaccine or one dose of a single-dose vaccine)
  • subgroup_population - 2018 5-year American Community Survey population estimates for given demographic_subgroup
  • age_group_population - 2018 5-year American Community Survey population estimates for overall age_group
  • data_as_of - Timestamp for last update date in source system
  • data_loaded_at - Timestamp when the record (row) was most recently updated in Socrata

library(covid19sf)

data("covid19sf_vaccine_demo")

head(covid19sf_vaccine_demo)
#>                                          overall_segment
#> 1 Ages 12+ by Age Bracket, Administered by All Providers
#> 2 Ages 12+ by Age Bracket, Administered by All Providers
#> 3 Ages 12+ by Age Bracket, Administered by All Providers
#> 4 Ages 12+ by Age Bracket, Administered by All Providers
#> 5      Ages 12+ by Age Bracket, Administered by DPH Only
#> 6 Ages 12+ by Age Bracket, Administered by All Providers
#>   administering_provider_type age_group demographic_group demographic_subgroup
#> 1               All Providers       12+       Age Bracket                25-34
#> 2               All Providers       12+       Age Bracket                12-17
#> 3               All Providers       12+       Age Bracket                65-74
#> 4               All Providers       12+       Age Bracket                18-24
#> 5                    DPH Only       12+       Age Bracket                12-17
#> 6               All Providers       12+       Age Bracket                35-44
#>   demographic_subgroup_sort_order total_1st_doses total_2nd_doses
#> 1                               5          150076          136628
#> 2                               3           36812           33979
#> 3                               9           75648           71846
#> 4                               4           51456           46027
#> 5                               3            5416            5256
#> 6                               6          119776          111380
#>   total_single_doses total_recipients total_series_completed
#> 1              14362           164830                 150990
#> 2                 18            36852                  33997
#> 3               3326            79023                  75172
#> 4               4297            55853                  50324
#> 5                  3             5690                   5259
#> 6              10914           130886                 122294
#>   subgroup_population age_group_population          data_as_of
#> 1              204639               791131 2021-12-15 06:00:11
#> 2               33938               791131 2021-12-15 06:00:11
#> 3               74120               791131 2021-12-15 06:00:11
#> 4               62127               791131 2021-12-15 06:00:11
#> 5               33938               791131 2021-12-15 06:00:11
#> 6              138390               791131 2021-12-15 06:00:11
#>        data_loaded_at
#> 1 2021-12-15 08:15:07
#> 2 2021-12-15 08:15:07
#> 3 2021-12-15 08:15:07
#> 4 2021-12-15 08:15:07
#> 5 2021-12-15 08:15:07
#> 6 2021-12-15 08:15:07

str(covid19sf_vaccine_demo)
#> 'data.frame':    130 obs. of  15 variables:
#>  $ overall_segment                : chr  "Ages 12+ by Age Bracket, Administered by All Providers" "Ages 12+ by Age Bracket, Administered by All Providers" "Ages 12+ by Age Bracket, Administered by All Providers" "Ages 12+ by Age Bracket, Administered by All Providers" ...
#>  $ administering_provider_type    : chr  "All Providers" "All Providers" "All Providers" "All Providers" ...
#>  $ age_group                      : chr  "12+" "12+" "12+" "12+" ...
#>  $ demographic_group              : chr  "Age Bracket" "Age Bracket" "Age Bracket" "Age Bracket" ...
#>  $ demographic_subgroup           : chr  "25-34" "12-17" "65-74" "18-24" ...
#>  $ demographic_subgroup_sort_order: int  5 3 9 4 3 6 5 1 7 2 ...
#>  $ total_1st_doses                : int  150076 36812 75648 51456 5416 119776 20309 1443 96701 233922 ...
#>  $ total_2nd_doses                : int  136628 33979 71846 46027 5256 111380 19487 1217 90907 222600 ...
#>  $ total_single_doses             : int  14362 18 3326 4297 3 10914 1103 143 10860 16746 ...
#>  $ total_recipients               : int  164830 36852 79023 55853 5690 130886 21923 1589 107647 250875 ...
#>  $ total_series_completed         : int  150990 33997 75172 50324 5259 122294 20590 1360 101767 239346 ...
#>  $ subgroup_population            : int  204639 33938 74120 62127 33938 138390 204639 1319 115527 275992 ...
#>  $ age_group_population           : int  791131 791131 791131 791131 791131 791131 791131 791131 791131 791131 ...
#>  $ data_as_of                     : POSIXct, format: "2021-12-15 06:00:11" "2021-12-15 06:00:11" ...
#>  $ data_loaded_at                 : POSIXct, format: "2021-12-15 08:15:07" "2021-12-15 08:15:07" ...
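
The note on overall_segment (filter to a single option to derive meaningful totals) matters in practice, because the same recipients appear in more than one segment. The following minimal base R sketch uses three rows copied from the head() output above; the object name `demo_rows` is introduced here for illustration only.

```r
# Three rows copied from the head() output above
demo_rows <- data.frame(
  administering_provider_type = c("All Providers", "DPH Only", "All Providers"),
  demographic_subgroup = c("12-17", "12-17", "25-34"),
  total_recipients = c(36852, 5690, 164830)
)

# Naive total: mixes segments, so the 12-17 recipients served by DPH
# are counted once under "DPH Only" and again under "All Providers"
naive_total <- sum(demo_rows$total_recipients)

# Total within a single provider segment
all_providers <- demo_rows[demo_rows$administering_provider_type == "All Providers", ]
segment_total <- sum(all_providers$total_recipients)

naive_total - segment_total  # 5690, the double-counted DPH Only recipients
```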

Using this dataset, we can answer the following questions:

  • What proportion of the San Francisco population is vaccinated?
  • How many people got two doses of the vaccine (e.g., Pfizer or Moderna) vs. a single dose (J&J)?
  • What is the distribution of the vaccine doses given to San Franciscans by demographic group (age and race)?

Doses given to San Franciscans by age group:

In the following example, we will explore the distribution of vaccine doses across San Francisco's age groups. Let’s start by filtering the data to all providers, the age bracket demographic group, and all ages:

library(dplyr)
library(plotly)

df_age <- covid19sf_vaccine_demo %>%
  filter(administering_provider_type == "All Providers", 
         demographic_group == "Age Bracket",
         age_group == "All")

df_age
#>                                           overall_segment
#> 1  All Ages by Age Bracket, Administered by All Providers
#> 2  All Ages by Age Bracket, Administered by All Providers
#> 3  All Ages by Age Bracket, Administered by All Providers
#> 4  All Ages by Age Bracket, Administered by All Providers
#> 5  All Ages by Age Bracket, Administered by All Providers
#> 6  All Ages by Age Bracket, Administered by All Providers
#> 7  All Ages by Age Bracket, Administered by All Providers
#> 8  All Ages by Age Bracket, Administered by All Providers
#> 9  All Ages by Age Bracket, Administered by All Providers
#> 10 All Ages by Age Bracket, Administered by All Providers
#>    administering_provider_type age_group demographic_group demographic_subgroup
#> 1                All Providers       All       Age Bracket                  0-4
#> 2                All Providers       All       Age Bracket                12-17
#> 3                All Providers       All       Age Bracket                25-34
#> 4                All Providers       All       Age Bracket                35-44
#> 5                All Providers       All       Age Bracket                45-54
#> 6                All Providers       All       Age Bracket                65-74
#> 7                All Providers       All       Age Bracket                18-24
#> 8                All Providers       All       Age Bracket                 5-11
#> 9                All Providers       All       Age Bracket                55-64
#> 10               All Providers       All       Age Bracket                  75+
#>    demographic_subgroup_sort_order total_1st_doses total_2nd_doses
#> 1                                1               0               0
#> 2                                3           36812           33979
#> 3                                5          150076          136628
#> 4                                6          119776          111380
#> 5                                7           96701           90907
#> 6                                9           75648           71846
#> 7                                4           51456           46027
#> 8                                2           24539           16573
#> 9                                8           87786           82804
#> 10                              10           52612           49633
#>    total_single_doses total_recipients total_series_completed
#> 1                   0                0                      0
#> 2                  18            36852                  33997
#> 3               14362           164830                 150990
#> 4               10914           130886                 122294
#> 5               10860           107647                 101767
#> 6                3326            79023                  75172
#> 7                4297            55853                  50324
#> 8                   0            24541                  16573
#> 9               10823            98636                  93626
#> 10               2185            54823                  51818
#>    subgroup_population age_group_population          data_as_of
#> 1                39650               874787 2021-12-15 06:00:11
#> 2                33938               874787 2021-12-15 06:00:11
#> 3               204639               874787 2021-12-15 06:00:11
#> 4               138390               874787 2021-12-15 06:00:11
#> 5               115527               874787 2021-12-15 06:00:11
#> 6                74120               874787 2021-12-15 06:00:11
#> 7                62127               874787 2021-12-15 06:00:11
#> 8                44006               874787 2021-12-15 06:00:11
#> 9               101483               874787 2021-12-15 06:00:11
#> 10               60907               874787 2021-12-15 06:00:11
#>         data_loaded_at
#> 1  2021-12-15 08:15:07
#> 2  2021-12-15 08:15:07
#> 3  2021-12-15 08:15:07
#> 4  2021-12-15 08:15:07
#> 5  2021-12-15 08:15:07
#> 6  2021-12-15 08:15:07
#> 7  2021-12-15 08:15:07
#> 8  2021-12-15 08:15:07
#> 9  2021-12-15 08:15:07
#> 10 2021-12-15 08:15:07

Next, we will sort the data by age group and convert demographic_subgroup into a factor whose levels follow that order:

df_age <- df_age %>%
  arrange(demographic_subgroup_sort_order) %>%
  mutate(age_ordered = factor(demographic_subgroup, levels = demographic_subgroup))
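
The factor conversion is needed because the age labels are character strings, and character sorting does not follow age order. A minimal base R sketch, using four of the age labels and their sort-order values from the output above (the names `age_labels` and `sort_order` are illustrative):

```r
age_labels <- c("25-34", "12-17", "5-11", "0-4")
sort_order <- c(5, 3, 2, 1)  # demographic_subgroup_sort_order values from the data

# Character sorting compares text, so "5-11" lands after "25-34"
sort(age_labels, method = "radix")
#> [1] "0-4"   "12-17" "25-34" "5-11"

# Taking the factor levels from the sort order keeps the age order on plot axes
age_ordered <- factor(age_labels, levels = age_labels[order(sort_order)])
levels(age_ordered)
#> [1] "0-4"   "5-11"  "12-17" "25-34"
```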

Let’s start by plotting the distribution of San Francisco's population by vaccination status, using the following groups:

  • Two Doses - San Franciscans who received both doses of either the Pfizer or Moderna vaccine
  • Single Dose - San Franciscans who received the J&J single-dose vaccine
  • First Dose - San Franciscans who received only the first of two doses of either the Pfizer or Moderna vaccine
  • Not Vaccinated - San Franciscans who did not receive any COVID-19 vaccine

Since everyone counted in total_2nd_doses (received the second dose) is also counted in total_1st_doses (received the first dose), we subtract the former from the latter to get the number of people who received only the first of two doses:

df_age <- df_age %>%
  mutate(only_1st_dose = total_1st_doses - total_2nd_doses)
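
As a worked check of this step, the sketch below applies the same subtraction to the 25-34 row shown earlier. It also confirms that, for this row, total_series_completed equals the second doses plus the single doses. The values are copied from the output above, and the standalone variables are for illustration only.

```r
# Values for the 25-34 age bracket (all providers), copied from the output above
total_1st_doses <- 150076
total_2nd_doses <- 136628
total_single_doses <- 14362
total_series_completed <- 150990

# People who received the first dose but not (yet) the second
only_1st_dose <- total_1st_doses - total_2nd_doses
only_1st_dose
#> [1] 13448

# Completed series = second doses + single doses
total_2nd_doses + total_single_doses == total_series_completed
#> [1] TRUE
```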

Next, we will collapse and aggregate the data by vaccination status:

df_summary <- df_age %>%
  summarise(`First Dose` = sum(only_1st_dose),
            `Two Doses` = sum(total_2nd_doses),
            `Single Dose` = sum(total_single_doses),
            `Total Pop` = sum(subgroup_population)) %>%
  mutate(`Not Received Vaccine` = `Total Pop` - `First Dose` - `Two Doses` - `Single Dose`) %>%
  select(- `Total Pop`) %>%
  t() %>%
  as.data.frame() %>%
  select(count = V1)

df_summary$type <- rownames(df_summary)

head(df_summary)
#>                       count                 type
#> First Dose            55629           First Dose
#> Two Doses            639777            Two Doses
#> Single Dose           56785          Single Dose
#> Not Received Vaccine 122596 Not Received Vaccine
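
Before plotting, the counts can be converted to percentages as a cross-check. This is a base R sketch on the four counts printed above; `counts` and `shares` are illustrative names.

```r
# Counts copied from the df_summary output above
counts <- c(`First Dose` = 55629,
            `Two Doses` = 639777,
            `Single Dose` = 56785,
            `Not Received Vaccine` = 122596)

# Share of each status in the total population, in percent
shares <- round(100 * counts / sum(counts), 1)
shares
#>           First Dose            Two Doses          Single Dose 
#>                  6.4                 73.1                  6.5 
#> Not Received Vaccine 
#>                 14.0
```

Two Doses and Single Dose together account for roughly 80% of the population.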

Using df_summary, let’s plot the distribution of San Francisco vaccination status:

plot_ly(data = df_summary,
        labels = ~ type,
        values = ~ count,
        textposition = 'inside',
        textinfo = 'label+percent',
        insidetextfont = list(color = '#FFFFFF',
                              size = 16),
        marker = list(colors = c("rgb(35, 90, 122)", "rgb(128, 200, 215)",
                                 "rgb(74, 168, 192)", "rgb(225, 100, 74)"),
                      line = list(color = '#FFFFFF', width = 1)),
        type = "pie",
        showlegend = FALSE) %>%
  layout(title = "Distribution of COVID-19 Vaccine Doses Given to San Franciscans",
         font = list(family = "Arial",
                     size = 18),
         margin = list(t = 80, b = 60)) %>%
  add_annotations(text = paste("- Including children under the age of 12",
                               "- The First and Second Doses refer to either Pfizer or Moderna vaccine",
                               "- Single Dose refers to J&J vaccine", sep = "<br>"),
                  font = list(family = "Arial",
                     size = 14),
                  
                  x = -0.02,
                  y = -0.15,
                  align = "left",
                  xref = "paper",
                  yref = "paper",
                  showarrow = FALSE)

As we can see in the plot above, about 80% of the SF population completed the vaccination process (two doses or a single dose), about 6% received only the first of two doses, and about 14% did not receive any vaccine. Note that this distribution includes children under the age of 12, most of whom are not vaccinated at this point: the 0-4 group has no recorded doses, and the 5-11 group has far fewer completed series than its population. To get a clearer view of the vaccination effort among the rest of the SF population, let’s exclude children under the age of 12:

df_summary2 <- df_age %>%
  filter(!demographic_subgroup %in% c("0-4", "5-11")) %>%
  summarise(`First Dose` = sum(only_1st_dose),
            `Two Doses` = sum(total_2nd_doses),
            `Single Dose` = sum(total_single_doses),
            `Total Pop` = sum(subgroup_population)) %>%
  mutate(`Not Received Vaccine` = `Total Pop` - `First Dose` - `Two Doses` - `Single Dose`) %>%
  select(- `Total Pop`) %>%
  t() %>%
  as.data.frame() %>%
  select(count = V1)

df_summary2$type <- rownames(df_summary2)

plot_ly(data = df_summary2,
        labels = ~ type,
        values = ~ count,
        textposition = 'inside',
        textinfo = 'label+percent',
        insidetextfont = list(color = '#FFFFFF',
                              size = 16),
        marker = list(colors = c("rgb(35, 90, 122)", "rgb(128, 200, 215)",
                                 "rgb(74, 168, 192)", "rgb(225, 100, 74)"),
                      line = list(color = '#FFFFFF', width = 1)),
        type = "pie",
        showlegend = FALSE) %>%
  layout(title = "Distribution of COVID-19 Vaccine Doses Given to San Franciscans",
         font = list(family = "Arial",
                     size = 18),
         margin = list(t = 80, b = 60)) %>%
  add_annotations(text = paste("- Excluding children under the age of 12",
                               "- The First and Second Doses refer to either Pfizer or Moderna vaccine",
                               "- Single Dose refers to J&J vaccine", sep = "<br>"),
                  font = list(family = "Arial",
                     size = 14),
                  
                  x = -0.02,
                  y = -0.15,
                  align = "left",
                  xref = "paper",
                  yref = "paper",
                  showarrow = FALSE)

After excluding children under the age of 12, we can see from the plot above that about 86% of the SF population aged 12 and above completed the vaccination process, and about 6% received only the first dose.
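
The two summaries above repeat the same aggregation with a different filter. One way to avoid the duplication is a small helper function. The sketch below is written in base R against a reduced copy of df_age (values taken from the output above); `summarise_status` and `age_rows` are hypothetical names, not part of the covid19sf package.

```r
# Reduced copy of df_age, ordered by age, with values from the output above
age_rows <- data.frame(
  demographic_subgroup = c("0-4", "5-11", "12-17", "18-24", "25-34",
                           "35-44", "45-54", "55-64", "65-74", "75+"),
  total_1st_doses = c(0, 24539, 36812, 51456, 150076,
                      119776, 96701, 87786, 75648, 52612),
  total_2nd_doses = c(0, 16573, 33979, 46027, 136628,
                      111380, 90907, 82804, 71846, 49633),
  total_single_doses = c(0, 0, 18, 4297, 14362, 10914, 10860, 10823, 3326, 2185),
  subgroup_population = c(39650, 44006, 33938, 62127, 204639,
                          138390, 115527, 101483, 74120, 60907)
)

# Hypothetical helper: collapse a set of age rows into vaccination-status counts
summarise_status <- function(df) {
  first_only <- sum(df$total_1st_doses - df$total_2nd_doses)
  two_doses <- sum(df$total_2nd_doses)
  single_dose <- sum(df$total_single_doses)
  not_received <- sum(df$subgroup_population) - first_only - two_doses - single_dose
  data.frame(
    type = c("First Dose", "Two Doses", "Single Dose", "Not Received Vaccine"),
    count = c(first_only, two_doses, single_dose, not_received)
  )
}

all_ages <- summarise_status(age_rows)
ages_12_plus <- summarise_status(
  age_rows[!age_rows$demographic_subgroup %in% c("0-4", "5-11"), ]
)

# Percentage split for ages 12 and above
ages_12_plus$percent <- round(100 * ages_12_plus$count / sum(ages_12_plus$count), 1)
ages_12_plus
```

The `all_ages` counts match the df_summary output shown earlier, and the same function can be reused for any other subset of age groups.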

Let’s now plot the fully vaccinated population by age group and vaccine type (a single dose for J&J and two doses for Pfizer or Moderna):

plot_ly(data = df_age) %>%
  add_trace(x = ~ age_ordered,
            y = ~ total_2nd_doses,
            type = "bar",
            name = "Pfizer or Moderna") %>%
  add_trace(x = ~ age_ordered,
            type = "bar",
            y = ~ total_single_doses,
            name = "J&J") %>%
  layout(title = "San Francisco Fully Vaccinated Population by Age and Vaccine Type",
         barmode = "stack",
         yaxis = list(title = "Population"),
         xaxis = list(title = "Age Group"))

Last but not least, we will use a bar plot to show how the city population splits, in percentages, between fully vaccinated, partially vaccinated, and not vaccinated. We will pre-calculate the percentage breakdown for each age group:


d1a <- covid19sf_vaccine_demo %>%
  filter(administering_provider_type == "All Providers",
         demographic_group == "Age Bracket",
         age_group == "All") %>%
  mutate(only_1st = total_1st_doses - total_2nd_doses,
         not_vaccinated = subgroup_population - total_series_completed - only_1st,
         first_per = 100 * only_1st / subgroup_population,
         second_per = 100 * total_2nd_doses / subgroup_population,
         single_per = 100 * total_single_doses / subgroup_population,
         not_vaccinated_per = 100 * not_vaccinated / subgroup_population,
         complete_per = 100 - not_vaccinated_per - first_per) %>%
  arrange(demographic_subgroup_sort_order) %>%
  mutate(age = factor(demographic_subgroup, levels = demographic_subgroup)) %>%
  select(age, first_per, second_per, single_per, not_vaccinated_per, complete_per)


d1a
#>      age first_per second_per  single_per not_vaccinated_per complete_per
#> 1    0-4  0.000000    0.00000  0.00000000         100.000000      0.00000
#> 2   5-11 18.102077   37.66077  0.00000000          44.237149     37.66077
#> 3  12-17  8.347575  100.12081  0.05303789          -8.521421    100.17385
#> 4  18-24  8.738552   74.08534  6.91647754          10.259629     81.00182
#> 5  25-34  6.571572   66.76538  7.01821256          19.644838     73.78359
#> 6  35-44  6.066912   80.48269  7.88640798           5.563986     88.36910
#> 7  45-54  5.015278   78.68896  9.40039991           6.895358     88.08936
#> 8  55-64  4.909197   81.59396 10.66484042           2.832987     92.25782
#> 9  65-74  5.129520   96.93200  4.48731786          -6.548840    101.41932
#> 10   75+  4.891063   81.48981  3.58743658          10.031688     85.07725
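
Note that not_vaccinated_per is negative for the 12-17 and 65-74 groups, and complete_per exceeds 100 for them. This happens when the dose counts are larger than subgroup_population, which is a 2018 survey estimate rather than a current count. The sketch below reproduces the calculation for three rows and flags the affected groups. Capping the value at zero with pmax() is one possible treatment, shown here as an assumption rather than something the original analysis does; `rows` is an illustrative name.

```r
# Three age brackets with values copied from the df_age output above
rows <- data.frame(
  age = c("12-17", "18-24", "65-74"),
  total_1st_doses = c(36812, 51456, 75648),
  total_2nd_doses = c(33979, 46027, 71846),
  total_series_completed = c(33997, 50324, 75172),
  subgroup_population = c(33938, 62127, 74120)
)

# Same arithmetic as in the d1a pipeline
rows$only_1st <- rows$total_1st_doses - rows$total_2nd_doses
rows$not_vaccinated <- rows$subgroup_population -
  rows$total_series_completed - rows$only_1st

# Flag groups where recorded vaccinations exceed the population estimate
rows$exceeds_population <- rows$not_vaccinated < 0

# One possible treatment (an assumption): cap the unvaccinated count at zero
rows$not_vaccinated_capped <- pmax(rows$not_vaccinated, 0)

rows[, c("age", "not_vaccinated", "exceeds_population", "not_vaccinated_capped")]
```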

Inspired by The Economist's data visualizations, the following plot uses The Economist default color palette for bar plots:

# plot setting
font_size <- 22
opacity <- 0.95
background <- "rgb(225, 236, 242)"
complete_color <- paste("rgba(47, 106, 158,", opacity ,")", sep = "")
first_color <-  paste("rgba(100, 184, 206,", opacity ,")", sep = "")
not_vaccinated_color <- paste("rgba(162, 32, 61,", opacity ,")", sep = "")

# Plot
plot_ly(data = d1a,
        x = ~ complete_per,
        y = ~ age,
        type = 'bar', orientation = 'h',
        name = "Completed Two/Single Doses",
        marker = list(color = complete_color,
                      line = list(color = toRGB("gray50"), width = 1))) %>%
  add_trace(x = ~ first_per,
            marker = list(color = first_color),
            name = "First Dose") %>%
  add_trace(x = ~ not_vaccinated_per,
            marker = list(color = not_vaccinated_color),
            name = "Not Vaccinated") %>%
  layout(xaxis = list(title = "",
                      showgrid = TRUE,
                      showline = FALSE,
                      showticklabels = TRUE,
                      zeroline = TRUE,
                      tickwidth = 2,
                      tickcolor = toRGB("gray50"),
                      gridwidth = 2,
                      gridcolor = toRGB("gray50"),
                      side ="top",
                      # domain = c(0.1, 1),
                      ticksuffix = "%"),
         yaxis = list(title = "Age Group",
                      # domain = c(0, 0.8),
                      showgrid = FALSE,
                      showline = FALSE,
                      showticklabels = TRUE,
                      zeroline = FALSE),
         barmode = "stack",
         plot_bgcolor = background,
         paper_bgcolor = background,
         margin = list(l = 70, t = 100, b = 60),
         legend = list(orientation = 'h')) %>%
  add_annotations(text = "Source: San Francisco, Department of Public Health - Population Health Division",
                  font = list(size = 16,
                              family = "Arial"),
                  align = "left",
                  x = 0,
                  y = - 0.08,
                  xref = "paper",
                  yref = "paper",
                  showarrow = FALSE) %>%
  add_annotations(text = "San Francisco COVID-19 Vaccination Distribution by Age Group",
                  y = 1.19,
                  x = -0.03,
                  font = list(family = "Arial",
                              size =  24,
                              color = "black"),
                  yref = "paper",
                  xref = "paper",
                  align = "left",
                  valign = "middle",
                  showarrow = FALSE) 

San Francisco vaccination time series

The covid19sf_vaccine_demo_ts dataset is a time series version of the covid19sf_vaccine_demo dataset and includes the following fields:

  • date_administered - Date vaccination administered
  • overall_segment - Segment (universe) of analysis. Unique combination of administering_provider_type, age_group, and demographic_group. Filter to a single option to derive meaningful totals.
  • administering_provider_type - Providers included in a given overall_segment. Two possible values: ‘All Providers’ (including SF DPH) or ‘DPH Only’
  • age_group - Age range included in a given overall_segment
  • demographic_group - Type of demographic group included in a given overall_segment (e.g. Age, Race/Ethnicity)
  • demographic_subgroup - Specific demographic group counted in a given record (e.g. 18-24, Asian)
  • demographic_subgroup_sort_order - Numeric sort order for all demographic_subgroup. Convenient for maintaining consistent ordering across multiple data visualizations.
  • new_1st_doses - Count of 1st doses administered for vaccines that take two doses to complete
  • new_2nd_doses - Count of 2nd doses administered for vaccines that take two doses to complete
  • new_single_doses - Count of doses administered for vaccines that take one dose to complete
  • new_series_completed - Count of individuals newly fully vaccinated on a given day (given the 2nd dose of a two-dose vaccine or one dose of a single dose vaccine)
  • new_recipients - Count of individuals vaccinated (with any dose) for the first time according to CA’s records
  • cumulative_1st_doses - Cumulative total of 1st doses administered for vaccines that take two doses to complete
  • cumulative_2nd_doses - Cumulative total of 2nd doses administered for vaccines that take two doses to complete
  • cumulative_single_doses - Cumulative total of doses administered for vaccines that take one dose to complete
  • cumulative_series_completed - Cumulative total individuals fully vaccinated (given the 2nd dose of a two-dose vaccine or one dose of a single dose vaccine)
  • cumulative_recipients - Cumulative total individuals vaccinated (with any dose) according to CA’s records
  • subgroup_population - American Community Survey population estimates for given demographic_subgroup
  • age_group_population - American Community Survey population estimates for overall age_group
  • data_as_of - Timestamp for last update date in source system
  • data_loaded_at - Timestamp when the record (row) was most recently updated here in the Open Data Portal

Let’s load and review the dataset:

data("covid19sf_vaccine_demo_ts")

str(covid19sf_vaccine_demo_ts)
#> 'data.frame':    46195 obs. of  19 variables:
#>  $ date_administered              : POSIXct, format: "2021-08-12" "2021-08-13" ...
#>  $ overall_segment                : chr  "Ages 65+ by Race/Ethnicity, Administered by All Providers" "Ages 65+ by Race/Ethnicity, Administered by All Providers" "Ages 65+ by Race/Ethnicity, Administered by All Providers" "Ages 65+ by Race/Ethnicity, Administered by All Providers" ...
#>  $ administering_provider_type    : chr  "All Providers" "All Providers" "All Providers" "All Providers" ...
#>  $ age_group                      : chr  "65+" "65+" "65+" "65+" ...
#>  $ demographic_group              : chr  "Race/Ethnicity" "Race/Ethnicity" "Race/Ethnicity" "Race/Ethnicity" ...
#>  $ demographic_subgroup           : chr  "Unknown" "Unknown" "Unknown" "Unknown" ...
#>  $ demographic_subgroup_sort_order: int  9 9 9 9 9 9 9 9 9 9 ...
#>  $ new_1st_doses                  : int  0 1 0 0 0 1 1 0 0 2 ...
#>  $ new_2nd_doses                  : int  0 0 0 0 0 1 1 2 0 0 ...
#>  $ new_single_doses               : int  1 2 0 0 3 0 0 0 2 0 ...
#>  $ new_series_completed           : int  1 2 0 0 3 1 1 2 2 0 ...
#>  $ new_recipients                 : int  1 3 0 0 3 1 1 0 2 2 ...
#>  $ cumulative_1st_doses           : int  999 1000 1000 1000 1000 1001 1002 1002 1002 1004 ...
#>  $ cumulative_2nd_doses           : int  774 774 774 774 774 775 776 778 778 778 ...
#>  $ cumulative_single_doses        : int  126 128 128 128 131 131 131 131 133 133 ...
#>  $ cumulative_series_completed    : int  900 902 902 902 905 906 907 909 911 911 ...
#>  $ cumulative_recipients          : int  1126 1129 1129 1129 1132 1133 1134 1134 1136 1138 ...
#>  $ subgroup_population            : int  0 0 0 0 0 0 0 0 0 0 ...
#>  $ age_group_population           : int  135027 135027 135027 135027 135027 135027 135027 135027 135027 135027 ...

unique(covid19sf_vaccine_demo_ts$demographic_group)
#> [1] "Race/Ethnicity" "Age Bracket"
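
The new_* and cumulative_* fields are related: each cumulative value is the previous cumulative value plus that day's new count, and, by its definition, new_series_completed combines the second doses and the single doses. The sketch below checks both relationships on the first ten values printed by str() above, assuming those rows are consecutive days for a single subgroup.

```r
# First ten values of each field, copied from the str() output above
new_2nd_doses <- c(0, 0, 0, 0, 0, 1, 1, 2, 0, 0)
new_single_doses <- c(1, 2, 0, 0, 3, 0, 0, 0, 2, 0)
new_series_completed <- c(1, 2, 0, 0, 3, 1, 1, 2, 2, 0)
cumulative_series_completed <- c(900, 902, 902, 902, 905, 906, 907, 909, 911, 911)

# Newly completed series = second doses + single doses on the same day
all(new_2nd_doses + new_single_doses == new_series_completed)
#> [1] TRUE

# Day-over-day change in the cumulative total equals the new count
all(diff(cumulative_series_completed) == new_series_completed[-1])
#> [1] TRUE
```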

This dataset enables us to review the vaccination progress of the SF population over time, by age and by race/ethnicity group. Let’s look at the daily vaccinations by race/ethnicity group:

df <- covid19sf_vaccine_demo_ts %>%
  dplyr::filter(overall_segment == "All Ages by Race/Ethnicity, Administered by All Providers") 

table(df$demographic_subgroup)
#> 
#>          American Indian or Alaska Native 
#>                                       363 
#>                                     Asian 
#>                                       366 
#>                 Black or African American 
#>                                       365 
#>           Hispanic or Latino/a, all races 
#>                                       365 
#>                              Multi-Racial 
#>                                       365 
#> Native Hawaiian or Other Pacific Islander 
#>                                       362 
#>                                Other Race 
#>                                       365 
#>                                   Unknown 
#>                                       365 
#>                                     White 
#>                                       365

df %>%
  plot_ly(x = ~ date_administered,
          y = ~ new_series_completed,
          color = ~ demographic_subgroup,
          type = "scatter",
          mode = "lines") %>%
  layout(title = "San Francisco COVID-19 Daily Vaccination by Race/Ethnicity",
         yaxis = list(title = "Population"),
         xaxis = list(title = "Source: San Francisco, Department of Public Health - Population Health Division"),
         legend = list(x = 0, y = 1.1),
         margin = list(t = 90))

Similarly, we can plot the cumulative vaccination progress over time by race/ethnicity group, as a percentage of each sub-group's population. Note that the Other Race sub-group appears to have a data inconsistency: its cumulative values surpass the sub-group population. Therefore, we will exclude it from the plot:

df %>%
  filter(demographic_subgroup != "Other Race") %>%
  plot_ly(x = ~ date_administered,
          y = ~ 100 * cumulative_series_completed / subgroup_population,
          color = ~ demographic_subgroup,
          hoverinfo = "text",
          text = ~ paste(demographic_subgroup, "<br>",
                         "Date:", date_administered, "<br>",
                         "Sub-Group Population:", subgroup_population, "<br>",
                         "Completed Series:", cumulative_series_completed,
                         paste("(", round(100 * cumulative_series_completed / subgroup_population), "%)", sep = "")),
          type = "scatter",
          mode = "lines") %>%
  layout(title = "Proportion of San Francisco COVID-19 Vaccinated Population by Race/Ethnicity",
         yaxis = list(title = "Percentage", 
                      ticksuffix = "%"),
         xaxis = list(title = "Source: San Francisco, Department of Public Health - Population Health Division"),
         legend = list(x = 0.05, y = 0.95),
         margin = list(t = 90))
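
A related caveat: the str() output above shows subgroup_population equal to 0 for the Unknown sub-group (in the 65+ segment), and dividing by zero yields Inf in R. If the same holds in the segment being plotted, which is an assumption here, the percentage for that sub-group is not meaningful. A minimal base R sketch of a guarded percentage, where `safe_pct` is a hypothetical helper:

```r
# Hypothetical helper: return NA instead of Inf/NaN when the population is zero
safe_pct <- function(completed, population) {
  ifelse(population > 0, 100 * completed / population, NA_real_)
}

# First element: Unknown sub-group values from the str() output (population 0)
# Second element: 25-34 age bracket from the summary dataset, for contrast
completed <- c(900, 150990)
population <- c(0, 204639)

100 * completed / population      # unguarded: the first value is Inf
safe_pct(completed, population)   # guarded: the first value is NA
```

Returning NA makes the affected rows easy to drop or label before plotting.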