Worldwide Bayesian Causal Impact Analysis of Vaccine Administration on Deaths and Cases Associated with COVID-19: A BigData Analysis of 145 Countries

2021-11-15

Abstract

Policy makers and mainstream news anchors have promised the public that the COVID-19 vaccine rollout worldwide would reduce symptoms, and thereby cases and deaths associated with COVID-19. While this vaccine rollout is still in progress, there is a large amount of public data available that permits an analysis of the effect of the vaccine rollout on COVID-19 related cases and deaths. Has this public policy treatment produced the desired effect?

One manner to respond to this question can begin by implementing a Bayesian causal analysis comparing both pre- and post-treatment periods. This study analyzed publicly available COVID-19 data from OWID (Hannah Ritchie and Roser 2020Hannah Ritchie, Lucas Rodés-Guirao, Edouard Mathieu, and Max Roser. 2020. “Coronavirus Pandemic (COVID-19).” Our World in Data.) utlizing the R package CausalImpact (Brodersen et al. 2015Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.) to determine the causal effect of the administration of vaccines on two dependent variables that have been measured cumulatively throughout the pandemic: total deaths per million (\(y1\)) and total cases per million (\(y2\)). After eliminating all results from countries with p > 0.05, there were 128 countries for \(y1\) and 103 countries for \(y2\) to analyze in this fashion, comprising 145 unique countries in total (avg. p < 0.004).

Results indicate that the treatment (vaccine administration) has a strong and statistically significant propensity to causally increase the values in either \(y1\) or \(y2\) over and above what would have been expected with no treatment. \(y1\) showed an increase/decrease ratio of (+115/-13), which means 89.84% of statistically significant countries showed an increase in total deaths per million associated with COVID-19 due directly to the causal impact of treatment initiation. \(y2\) showed an increase/decrease ratio of (+105/-16) which means 86.78% of statistically significant countries showed an increase in total cases per million of COVID-19 due directly to the causal impact of treatment initiation. Causal impacts of the treatment on \(y1\) ranges from -19% to +19015% with an average causal impact of +463.13%. Causal impacts of the treatment on \(y2\) ranges from -46% to +12240% with an average causal impact of +260.88%. Hypothesis 1 Null can be rejected for a large majority of countries.

This study subsequently performed correlational analyses on the causal impact results, whose effect variables can be represented as \(y1.E\) and \(y2.E\) respectively, with the independent numeric variables of: days elapsed since vaccine rollout began (\(n1\)), total vaccination doses per hundred (\(n2\)), total vaccine brands/types in use (\(n3\)) and the independent categorical variables continent (\(c1\)), country (\(c2\)), vaccine variety (\(c3\)). All categorical variables showed statistically significant (avg. p: < 0.001) postive Wilcoxon signed rank values (\(y1.E\) \(V\):[\(c1\) 3.04; \(c2\): 8.35; \(c3\): 7.22] and \(y2.E\) \(V\):[\(c1\) 3.04; \(c2\): 8.33; \(c3\): 7.19]). This demonstrates that the distribution of \(y1.E\) and \(y2.E\) was non-uniform among categories. The Spearman correlation between \(n2\) and \(y2.E\) was the only numerical variable that showed statistically significant results (\(y2.E\) ~ \(n2\): \(\rho\): 0.34 CI95%[0.14, 0.51], p: 4.91e-04). This low positive correlation signifies that countries with higher vaccination rates do not have lower values for \(y2.E\), slightly the opposite in fact. Still, the specifics of the reasons behind these differences between countries, continents, and vaccine types is inconclusive and should be studied further as more data become available. Hypothesis 2 Null can be rejected for \(c1\), \(c2\), \(c3\) and \(n2\) and cannot be rejected for \(n1\), and \(n3\).

The statistically significant and overwhelmingly positive causal impact after vaccine deployment on the dependent variables total deaths and total cases per million should be highly worrisome for policy makers. They indicate a marked increase in both COVID-19 related cases and death due directly to a vaccine deployment that was originally sold to the public as the “key to gain back our freedoms.” The effect of vaccines on total cases per million and its low positive association with total vaccinations per hundred signifies a limited impact of vaccines on lowering COVID-19 associated cases. These results should encourage local policy makers to make policy decisions based on data, not narrative, and based on local conditions, not global or national mandates. These results should also encourage policy makers to begin looking for other avenues out of the pandemic aside from mass vaccination campaigns.

Some variables that could be included in future analyses might include vaccine lot by country, the degree of prevalence of previous antibodies against SARS-CoV or SARS-CoV-2 in the population before vaccine administration begins, and the Causal Impact of ivermectin on the same variables used in this study.

Untruth naturally afflicts historical information. There are various reasons that make this unavoidable. One of them is partisanship for opinions and schools. If the soul is impartial in receiving information, it devotes to that information the share of critical investigation the information deserves, and its truth or untruth thus becomes clear. However, if the soul is infected with partisanship for a particular opinion or sect, it accepts without a moment’s hesitation the information that is agreeable to it. Prejudice and partisanship obscure the critical faculty and preclude critical investigation. The result is that falsehoods are accepted and transmitted(Muhammad ibn Khaldun al-Hadrami 1379, 1–2Muhammad ibn Khaldun al-Hadrami, Abu Zayd ’Abd ar-Rahman ibn. 1379. Muqaddimat Ibn Khaldun Li-Kitab Al-’ibar Wa-Diwan Al-Mubtada’ Wa-Al-Khabar Fi Ayyam Al-’Arab Wa-Al-’Ajam Wa-Al-Barbar, Wa-Man ’asarahum Min Dhawi Al-Sultan Al-Akbar. Misr : al-Mataba’ah al-Bahiyah al-Misriyah. http://archive.org/details/McGillLibrary-rbsc_isl_kitab-al-ibar_muqaddimah_d167l2361900z-16098.).

Background

Policy makers and mainstream news anchors have promised the public that the vaccine deployment worldwide would reduce symptoms, and thereby cases and deaths. While this vaccine deployment is still in progress, there is a large amount of public data available that permits an analysis of the effect of the vaccine deployment on COVID related cases and deaths. Has public policy treatment produced the desired effect? Responding to this question can begin by implementing a Bayesian Causal analysis comparing both pre and post treatment periods.

This is an important question to respond to due to the vaccine mandates that are being implemented worldwide. People have a right to know if this broad public policy is achieving the desired results. While there are arguments to be made on both sides of this debate, the question of whether a deployment of COVID-19 gene therapy injections cause less death or cases from the virus in any significant way is a testable hypothesis given public data that is now available. With the debates raging over the effectiveness, legality, and ethics of these vaccine mandates, a way to continually monitor the effect between vaccine deployment and worldwide COVID-19 associated death and case rates seems an important contribution to this ongoing discussion.

Some previous work has been done on the correlation between vaccination percentage rates or vaccine type and new cases and deaths (Subramanian and Kumar 2021Subramanian, S. V., and Akhil Kumar. 2021. “Increases in COVID-19 Are Unrelated to Levels of Vaccination Across 68 Countries and 2947 Counties in the United States.” European Journal of Epidemiology, September. https://doi.org/10.1007/s10654-021-00808-7.), however this work has yet to prove conclusive or has only looked at correlation in a limited number of countries (Alhinai and Elsidig 2021Alhinai, Zaid A., and Nagi Elsidig. 2021. “Countries with Similar COVID-19 Vaccination Rates yet Divergent Outcomes: Are All Vaccines Created Equal?” International Journal of Infectious Diseases 110 (September): 258–60. https://doi.org/10.1016/j.ijid.2021.06.040.). Indeed, the correlation numbers are a clue where to look, but they do not determine chronological order and therefore do not determine causation. It could just as well be that high total deaths per million are associated with high vaccination rates simply because those countries with higher death rates may have had a more frightened population ready to take vaccinations, or it could be that countries with high vaccination rates also had high rates of recording new cases and deaths as “COVID-19 associated,” even when there may have been various other comorbidites present in the individual (Disease Control and Prevention 2020Disease Control, Centers for, and Prevention. 2020. “Weekly Updates by Select Demographic and Geographic Characteristics.” National Center for Health Statistics.). These factors make correlation an important metric, but more of a preliminary clue in the dark of where to look for causation. To find causation on the scale of BigData analysis we must focus on the causal impact of the effects before and after vaccine administration on as many countries in the world as possible, this study looks at 145 countries.

Research Question 1

Specific: Does the ‘beginning of COVID-19 gene therapy injections’ (\(X\)) have any statistically significant causal effect in decreasing or increasing total deaths per million (\(y1\)) or total cases per million (\(y2\)) associated with COVID-19?

Simplified: Does vaccine deployment cause less or more COVID-19 associated cases or death?

Hypothesis 1

H0: \(X\) has no statistically significant1 Not statistically significant means a negative or positive causal effect with p > 0.05 causal effect on \(Yx\).

HA: \(X\) has a statistically significant2 Statistically significant means a negative or positive causal effect with p < 0.05. causal effect on \(Yx\).

Research Question 2

Specific: Do the statistically significant results of the causal effect on total deaths per million (\(y1.E\)) or total cases per million (\(y2.E\)) correlate with any of the following independent variables:

Simplified: Are the results of Research Question 1 associated with either of the following variables? :

  1. the length of time vaccines have been in use in a country
  2. the number of vaccines they have administered
  3. the number of brands/types of vaccines they have administered
  4. the continent where the vaccines were administered
  5. the country where the vaccines were administered
  6. the types of vaccines they have administered

Hypothesis 2

H0: \(X\) has no statistically significant3 Not statistically significant means a correlation between (-0.3 to 0.3) or Wilcoxon \(V\) = 0 or p > 0.05 correlation with \(yx.E\).

HA: \(X\) has a statistically significant4 Statistically significant means a correlation (> 0.3 or < -0.3) or a positive Wilcoxon \(V\), and p < 0.05 correlation with \(yx.E\).

Methods

The methods and code to reproduce this study are as follows:

Obtain up to date COVID-19 data from Our World in Data (Hannah Ritchie and Roser 2020Hannah Ritchie, Lucas Rodés-Guirao, Edouard Mathieu, and Max Roser. 2020. “Coronavirus Pandemic (COVID-19).” Our World in Data.)

It is necessary to download the appropriate data sets including:

  1. The data from Hannah Ritchie and Roser (2020)Hannah Ritchie, Lucas Rodés-Guirao, Edouard Mathieu, and Max Roser. 2020. “Coronavirus Pandemic (COVID-19).” Our World in Data.5 The data from COVID-19 case and death data from Our World in Data is taken directly from the data provided by Johns Hopkins University. is updated daily and can be downloaded and converted to a data.frame using the R code in this paper, or can downloaded directly from here as a .csv: https://covid.ourworldindata.org/data/owid-covid-data.csv.

  2. It was also necessary to convert the data on vaccine types in use by each country (wikipedia 2021wikipedia. 2021. “List of COVID-19 Vaccine Authorizations.” https://en.wikipedia.org/w/index.php?title=List_of_COVID-19_vaccine_authorizations&oldid=1049981876.)6 The data for vaccine brands and their use in different countries comes from hundreds of sources that have been aggregated by Wikipedia into a comprehensive list to a .csv format, this was done with standard spreadsheet techniques in Google Sheets. This data set I have made publicly available here: https://docs.google.com/spreadsheets/d/1egKoaLyAt_9JoWKqr8uZDxw3xIj9J-Fu33S9sMRL4XQ/edit?usp=sharing

# Folder where data is stored, **CHANGE THIS AS NECESSARY**
data_folder <- file.path("./indices[USED]/")

#*** UNCOMMENT FOR NEW DATA ***
# Download most recent OWID COVID-19 data set 
#url <- "https://covid.ourworldindata.org/data/owid-covid-data.csv"
#name <- "owid-covid-data.csv"
#download.file(url = url, destfile = paste0(data_folder,name))

## Load the reports ##
# Set directory to location of CSV files first
setwd(data_folder)
# Read files without csv
filenames <- gsub("\\.csv$","", list.files(pattern="\\.csv$"))
# Load all files
for(i in filenames){
  assign(i, read.csv(paste(i, ".csv", sep=""),fileEncoding="latin1"))
}

Clean data and merge datasets

Once all the data is obtained it can be ‘cleaned’ for faster data processing. This involves removing unnecessary variables from the data sets and renaming columns so all variable names are in agreement. This can be done with the R code included in the supplmentary material for this report.

Run Causal Analysis for all dependent variables

The R package CausalImpact (Brodersen et al. 2015Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.) utilizes a robust series of Bayesian calculations along with predictor data sets to determine the likely trajectory of a trend line had a particular intervention not occurred and then calculates the difference between that projected trend line and the real data line. The authors of the package summarized this impressive set of calculations and its improvements upon previous methods as a,

“…method [that] generalises the widely used difference-in-differences approach to the time-series setting by explicitly modelling the counterfactual of a time series observed both before and after the intervention. It improves on existing methods in two respects: it provides a fully Bayesian time-series estimate for the effect; and it uses model averaging to construct the most appropriate synthetic control for modelling the counter factual” (Brodersen et al. 2015, 247Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.).

Effectively, this allows us to look at the past 12-16 months (each country is slightly different) before vaccine administration began, this is called the pre-intervention period, and utilize that data to project where \(y1\) (total deaths per million) and \(y2\) (total cases per million) would have been had the intervention of \(X\) (vaccine administration) not occurred, what the authors call a “counterfactual” (Brodersen et al. 2015, 248–49Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.). Utilizing this estimated counterfactual and the confidence level associated with that estimation we can then compare it with the actual data available and see if there is any difference. If the projected estimation is higher than the actual results, it will appear as a negative impact, while if the projected estimation is lower than the actual results, it will appear as a positive impact.

Another aspect of the CausalImpact package is the ability to add control variables that are combined into a “synthetic control” (Brodersen et al. 2015Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.) that closely mimic the trend line, but that are exogenous (i.e. have not been affected by the intervention); this allows for even more accurate predictions and normally a lower p-value and a lower standard deviation. To obtain control variable data sets the authors of CausalImpact recommend utilizing similar data that did not receive the same treatment, such as a different region or country, any trend line that can closely mimic or even be constant alongside the trend line one is testing (e.g. similar data from an area with no intervention, price charts, stock prices, temperature records, etc.). As there are virtually zero countries that have been unaffected by the vaccines, finding a set of control variables is difficult, though not impossible. Ultimately, this study chose to utilize the data of four countries in Africa (Burkina Faso, Chad, DRC, South Sudan) that were chosen specifically for their low average severity indices since vaccine administration began (i.e. low levels of mandatory mask wearing, social distancing, crowd limitations, travel restrictions, etc.) and for their low vaccination rates. These countries in the author’s estimation best represent a “natural” progression of the virus with limited vaccine intervention on par with most other nations, making them the least likely to display problems of endogeneity while still acting as valid control groups.

These carefully selected control variables as a synthetic control combined with the formula presented by Brodersen et al. (2015)Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788., which utilizes the data pre-intervention as part of its equation to predict the counterfactual, means that the other synthetic control effectively being utilized internally in the equation is a dynamic calculation for each country based on its own individual experience of COVID-19’s impacts on total deaths and cases per million sans vaccines for, as mentioned above, approximately 12-16 months.

As Brodersen et al. (2015)Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788. explained,

“The approach described in this paper inherits three main characteristics from the state-space paradigm. First, it allows us to flexibly accommodate different kinds of assumptions about the latent state and emission processes underlying the observed data, including local trends and seasonality. Second, we use a fully Bayesian approach to inferring the temporal evolution of counterfactual activity and incremental impact. One advantage of this is the flexibility with which posterior inferences can be summarised. Third, we use a regression component that precludes a rigid commitment to a particular set of controls by integrating out our posterior uncertainty about the influence of each predictor as well as our uncertainty about which predictors to include in the first place, which avoids overfitting” (Brodersen et al. 2015, 251Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.)

In other words, by utilizing the data for total deaths and cases per million from before vaccines existed and combining that with a synthetic control of countries largely non-participatory in the COVID-19 vaccine program, the R package CausalImpact is able to produce a high degree of certainty in the results.7 average p-value for \(y1.E\) = 0.0039783; average p-value for \(y2.E\) = 0.00375 The control countries’ average severity indices since vaccine administration and vaccination rates are listed in Table 1.

This analysis was carried out on all countries in the data set that met the following criteria: (1) more than five observations (i.e. dates), (2) a longer pre intervention period than post intervention period, and (3) no NAs in the observations necessary for calculation.

# CAUSAL ANALYSIS ---------------------------------------------------------
# Variables for Function

# Select data frame  
df0 <- df_owid
# Create a countries list
countries <- c(unique(df_owid$ISO))
# Folder where plots will be saved
plots.loc <- file.path("./plots/causalImpact/totalCases/run6/")

## DEPENDENT VARIABLES

###############################################################################
# Here you must declare your variables. It is necessary to re-declare variables
# for each y variable that you intend to test. The only variable necessary to 
# change is where it says ** CHANGE THIS **, otherwise leave the code as is.
###############################################################################

# Declare function name and variables
impactReports <- function(df0, countries, plots.loc) {

# Begin for loop
for (i in countries) {

# Declare Data Frame to Use with Time Series
df <- df0
df <- dplyr::filter(df, ISO == i) # Use ISO codes to loop every country

# Assure the Date column is in the correct format
dates <- as.character(df$date) # change to text
df$date = as.Date(dates, format="%Y-%m-%d") # change to date

# Treatment - Independent Variable
vacc <- dplyr::select(df,date,total_vaccinations_per_hundred) # INDEPENDENT VARIABLE
vacc <- unique(vacc) # Make sure there are no duplicate observations
vacc <- na.omit(vacc) # Make sure there are no observations with NA

# Dependent Variable

## Dependent Variable ** CHANGE THIS **
df <- dplyr::select(df,date,total_cases_per_million)

df <- unique(df) # only choose unique values
df <- na.omit(df) # remove all

## Dependent Variable ** CHANGE THIS **
dfy <- df$total_cases_per_million 

# Obtain length of Dependent Variable for time-series calculations
dfyN <- as.numeric(length(dfy))

# Add in data sets that will be used for creating the Synthetic Control data
# This paper ultimately used the four countries mentioned, however the research
# process also included analyzing the results of other potential control candidates.
# Thus, included here are four other countries that were analyzed in this fashion,
# their conclusion ultimately not being used because of lower confidence levels and/or
# higher standards of deviation suggesting a less reliable synthetic control model.
# This data is included here for full transparency and for any other researchers that
# may like to analyze these other potential control countries and their results. 

# NOT INCLUDED IN THIS PAPER, BUT INCLUDED IN FULL DATASET
#dfy1 <-  BDI$total_deaths_per_million[1:dfyN] # Burundi
#dfy3 <-  HTI$total_deaths_per_million[1:dfyN]# Haiti
#dfy5 <-  YEM$total_deaths_per_million[1:dfyN] # Yemen
#dfy7 <-  TZA$total_deaths_per_million[1:dfyN] # Tanzania

# SELECTED CONTROL COUNTRIES FOR USE IN CAUSAL IMPACT ANALYSIS
dfy2 <-  COD$total_cases_per_million[1:dfyN] # DRC
dfy4 <-  SSD$total_cases_per_million[1:dfyN] # South Sudan
dfy6 <-  TCD$total_cases_per_million[1:dfyN] # Chad
dfy8 <-  BFA$total_cases_per_million[1:dfyN] # Burkina Faso

# Combine data frames of control countries
dfy <- cbind(dfy,dfy2,dfy4,
             dfy6,dfy8)

# Remove any NA values
dfy <- na.omit(dfy)

# Obtain new length of data frame of the Dependent Variable
dfyN2 <- as.numeric(nrow(dfy))

# Create time-series from this dataframe
dfx <- df$date[1:dfyN2]  # Time Series   

# Make sure all data has more than 5 observations
if (length(dfx) < 5 | nrow(dfy) < 5 | nrow(vacc) < 5) { 
 
   # Announce stop and move to next if logic unfulfilled 
cat(paste0("###### stopping ",i," at step 2 #######")) 
  next
} else {

#Declare Data Frame Variables
df.ts <- dfx # Dates as characters
df.score <-dfy  # Y
df.time <- df$d # Day number

#Calculate length of data frame
days <- as.Date(max(df.ts)) - as.Date(min(df.ts)) # Find length of data frame
days <- as.numeric(days) # Make the string a number

#Declare time points from data  
time.points <- try(seq.Date(as.Date(min(df.ts)), by = 1, length.out = days))

#Declare time series variables
SCORE.Y <- ts(df.score) # Dependent Variable
TIME.C <- ts(df.time) # Time Series

#Bind them into test groups
test <- try(zoo(cbind(SCORE.Y, TIME.C), time.points))

# Announce start of next step
cat(paste0("###### starting ",i," step 3 #######")) 

df <- na.omit(vacc) # Remove any NAs

if (anyNA(df) == TRUE) {
  
cat(paste0("###### stopping ",i," at step 4 #######")) 
  next
  } else {
    
# Treatment Period 
treatmentS <- (min(df$date)) # start of treatment period (i.e. first vaccines administered)
treatmentE <- max(df$date) # end of treatment period (i.e. ongoing)

# Pre-Treatment Period 
preperS <- min(dfx) # start of pre-treatment period
preperE <- treatmentS - 1 # end of pre-treatment period
# Post-Treatment Period 
postperS <- (treatmentS) # start of treatment period (i.e. first vaccines administered)
postperE <- treatmentE # end of treatment period
# Use when you need an exact date
pre.period <- c(preperS,preperE) # declare pre period start and end dates
post.period <- c(postperS,postperE) # declare post period start and end dates

# assure pre-period is longer than post-period
if ((preperE - preperS) < (postperE - postperS) | anyNA(test) == TRUE) { 

   # Stop and move to next country if logic unfulfilled
  cat(paste0("###### stopping ",i," at step 5 #######")) 
  next
  
} else {
cat(paste0("###### calculating ",i,"#######")) # Announce beginning of calculation
impact <- try(CausalImpact(test, pre.period, post.period)) # Calculate Impact
Sys.sleep(1)
x <- "Total Vaccinations Per Million" # CHANGE ONLY IF YOU CHANGE X VARIABLE
y <- "Total Cases Per Million" # ** CHANGE THIS  FOR EACH Y VARIABLE**
cat(paste0("###### plotting ",i,"#######")) # Announce beginning of plotting
try(print(plot(impact) +     # Print the plot
  ggplot2::labs(
  title = paste0(i, # Title the plot
                ": Causal Impact Plot",
                " | ", 
                x,
                " effect on ",
                y),
  caption = 
paste("Source: Data collected from OWID, analyzed and plotted by Kyle Beattie using RStudio as of", 
      date())
) +
  theme_stata())) # Choose a theme
Sys.sleep(1) # Let plot be created
cat(paste0("###### saving plots and reports ",i," #######")) # Announce saving of plot
setwd(plots.loc) # Set the drive to the plots folder
try(savePlotAsImage(paste0(i,      # Save Plot As TITLE
                       "_:_Causal_Impact_Plot",
                       "_|_", 
                       x,
                       "_effect_on_",
                       y,
                       "_Report.png"), 
                format = "png", 
                width =  1920, # set ratio sizes
                height = 1080))
Sys.sleep(1) # Let plot be saved
try(summary(impact, "report")) # Produce Summary Report
try(summary(impact)) # Produce Summary Report 2
report<<-try(capture.output(summary(impact, "report"))) # Capture Report
report2<<-try(capture.output(summary(impact)))
try(cat(report, file = paste0(i,     # Write report to txt
                          ": Causal Impact Plot",
                          " | ", 
                          x,
                          " effect on ",
                          y,"Report.txt"), append = TRUE))

try(stargazer(report2, 
          summary = FALSE,
          type = "text",
          out = paste0(plots.loc,i,     # Write report to txt
                          ": Causal Impact Plot",
                          " | ", 
                          x,
                          " effect on ",
                          y,"Report2.txt")))
               
     #    }         
        }       
      }
    }
  }
}

# RUN FUNCTION ------------------------------------------------------------

# Run this function to produce all the plots and reports above, use try() to pass over errors
try(impactReports(df0,countries,plots.loc), silent = TRUE)

Extract Causal Analysis results and merge with main dataset

After completing the causal analysis on all countries and printing a report and plot for each country, the data was filtered and cleaned by removing all results with p > 0.05 and turning all causal impact results into both whole numbers (ex. y1.effect_percent for Japan = 48%) and decimals (ex. y1.effect_dec for Japan = 0.48), both representing the percentage of positive or negative impact in two different forms. This data was then integrated back with the original data frame for further analyses and data presentation.

# ANALYZE CAUSAL IMPACT OUTPUT -------------------------------------------------

############################# FUNCTION VARIABLES ############################
# loc_i = the location of text files for import (as file.path)
# loc_o = the location to output csv and txt files (as file.path)
# name = name for csv and txt documents (as character spaces ok)
# y = dependent variable (as character no spaces)
############################################################################ 

### Extract necessary data from the Causal Impact Analysis Reports ###

 dat.extr.CausalReports <- function(loc_i,loc_o,name,y) {
   
   ## Data location
   data_folder <- file.path(loc_i)
   # Set directory to location of CSV files first
   setwd(data_folder)
   # Read files without csv
   filenames <- gsub("\\.txt$","", list.files(pattern="\\.txt$"))
   # Create data frame first
   df <- data.frame(matrix(ncol = 2, nrow = 0))
   # Load all files
   for(i in filenames){
     df0 <- assign(i, readtext(paste(i, ".txt", sep="")))
     df <- rbind(df,df0)
   }
   
   ## ISO
   # Extract first three characters to obtain ISO
   df$ISO <- substr(df$doc_id, 1, 3) 
   
   ## p-value
   # Extract all three digit values
   df$p <- str_extract(df$text,"p = 0.\\d{3}") 
    # Extract all two digit values
   df$p2 <- str_extract(df$text, "p = 0.\\d{2}")
   # Replace NAs in three digit values with two digit values
   df$p[is.na(df$p)] <- df$p2[is.na(df$p)] 
   # Remove two digit column
   df <- dplyr::select(df, -p2) 
   # Remove p = sign and convert to numeric
   df$p <- gsub("p = ", "", df$p) %>% as.numeric() 
   # Remove all non-statistically significant observations
   df <- filter(df, p < 0.05 ) 
   
   ## % Effect
   # separate this part of the text
   df$effect <- str_extract(df$text, "the response variable showed .*?%") 
   # remove this part leaving only the %
   df$effect <- gsub("the response variable","",df$effect) 
   # remove this text or that
   df$effect <- gsub("^( showed an increase of | showed a decrease of )", "", df$effect) 
   # combine text to make an easy readout for the public
   df$effect_txt <- paste(df$ISO,": Vaccine effect on",y,df$effect) 
   
   ## Interval
   # Extract only % numbers between []
   df$interval <- str_extract(df$text, "\\[(\\+|\\-)\\d*%.*?\\]") 
   
   ## Effect as Decimal
   # Remove all symbols and convert to numeric
   df$effect_dec <- gsub("%","",df$effect) %>% as.numeric() 
   # Turn percentage to decimal
   df$effect_dec <- df$effect_dec / 100 
   
   ## Merge the effect changes and p values with a new data frame
   # Select only necessary columns
   df5 <- dplyr::select(df,ISO,p,effect,effect_dec) 
   setnames(df5, # Change columns names for the main data
            c("p","effect","effect_dec"),
            c(paste0(y,".p"),paste0(y,".effect"),paste0(y,".effect_dec")))
   
# Write Data Frame to CSV, change data frame, path, and name variables as necessary
write.csv(df5,                            
            paste0(loc_o,name,date(),".csv"), 
            na = "",   
            fileEncoding = "UTF-8")
   
# Write statistical table to txt
stargazer(df5,                 # Export txt
         summary = FALSE,
         type = "text",
         out = paste0(loc_o,name,date(),".txt"))
 }

# RUN FUNCTION
# Variables
loc_i <- file.path("./plots/causalImpact/totalCases/run6/")
loc_o <- file.path("./plots/causalImpact/data_output/")
name <- "CausalAnalysis-vaccines-by-totalCases"
y <- "y2"

# FUNCTION
dat.extr.CausalReports(loc_i,loc_o,name,y)

# Analyze output
df <- read.csv("./plots/causalImpact/data_output/CausalAnalysis-
               vaccines-by-totalCasesTue Oct  28 11:50:13 2021.csv") %>%
  as.data.frame()
cntP <- str_count(df$y2.effect,"\\+") # count number of positive occurrences
count(cntP) # results

Correlation Analysis of Causal Analysis Results

After integrating the results of the Causal Impact Analysis with the original data frame, Research Question 2 was addressed through follow-up correlational analyses. These were calculated with the resulting dependent variables from the Causal Impact Analysis (\(y1.E\): y1.effect_percent = effect of vaccine intervention on total deaths per million) and (\(y2.E\): y2.effect_percent = effect of vaccine intervention on total cases per million) utilizing ggstatsplot (Patil 2021Patil, Indrajeet. 2021. Visualizations with statistical details: The ’ggstatsplot’ approach.” Journal of Open Source Software 6 (61): 3167. https://doi.org/10.21105/joss.03167.) for all variables listed in Research Question 2.

Plot Results

Scatter plot and correlational matrix results were used to analyze the statistical significance of the correlation between the dependent variables and the independent numerical variables. The scatter plots and correlational matrices include the \(\rho\) (Spearman) scores and their respective p-values. Dot plots were utilized to show the distribution of the dependent variables with the independent categorical variables, they include the Wilcoxon signed rank value and respective p-value.

Materials

The data used in the following analysis comes from the two data sources mentioned above as well as the R packages that are listed at the end of this report. These results were produced using RStudio Version 1.4.1027 (RStudio Team 2020RStudio Team. 2020. RStudio: Integrated Development Environment for r. Boston, MA: RStudio, PBC. http://www.rstudio.com/.).

Results

After eliminating all results from countries with p > 0.05, there were 128 countries for \(y1\) and 103 countries for \(y2\) to analyze in this fashion (avg. p-value < 0.004), 145 unique countries in total. Results indicate that the treatment (vaccine administration) has a strong and statistically significant propensity to causally increase the values in either \(y1\) or \(y2\) over and above what would have been expected with no treatment. \(y1\) showed an increase/decrease ratio of (+115/-13), which means 89.84% of statistically significant countries showed an increase in total deaths per million associated with COVID-19 due directly to the causal impact of treatment initiation. \(y2\) showed an increase/decrease ratio of (+105/-16) which means 86.78% of statistically significant countries showed an increase in total cases per million of COVID-19 due directly to the causal impact of treatment initiation. Causal impacts of the treatment on \(y1\) ranges from -19% to +19015% with an average causal impact of +463.13%. Causal impacts of the treatment on \(y2\) ranges from -46% to +12240% with an average causal impact of +260.88%. Hypothesis 1 Null can be rejected for a large majority of countries.

Causal Impact Results

The results of the CausalImpact package were produced as both figures and as automatic report summaries for each country. All figures are included as attached PDFs to this report and all figures can be requested in high quality 1920x1280 .png files from the author free of charge. All report summaries are included as .txt files in the supplementary data to this report.

The following figures and report summaries represent a sample of more than 150 figures produced by this code and data. These examples highlight a variety of countries with decreases, average increases, and substantial increases in deaths and cases associated with COVID-19 as a causal impact of vaccine deployment. Included below select figures is also an example of the report summary that was produced by CausalImpact, these report summaries assist in explaining how to interpret each figure from a statistical perspective.

To read these figures one should analyze the three different graphs that are labeled on the right \(y\) axis as Original, Pointwise, and Cumulative. The Original graph represents the actual recorded data as the solid black line; the dashed blue line represents the counterfactual, the predicted trend line had the intervention of vaccine deployment not occurred; and the light blue fill around the counterfactual shows the degree of potential statistical variance, less light blue fill signifies a more accurate counterfactual. The moment of vaccine deployment varies between countries and is represented by the vertical gray dashed line. The Pointwise graph shows all of the positive or negative causal impacts by calculating the difference between the counterfactual and the recorded data. Finally, the Cumulative graph sums all of the positive or negative causal impacts since the intervention began to show an upward (positive), downward (negative) or neutral (near zero) causal impact.

y1.E: Total Causal Impact from Vaccine Administration on Total Deaths Per Million

Average Decreases

Vanuatu: -39% Vaccine Causal Impact on Total Deaths Per Million

Analysis report {CausalImpact (Brodersen et al. 2015Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.)}: During the post-intervention period, the response variable had an average value of approx. 3.18. By contrast, in the absence of an intervention, we would have expected an average response of 5.18. The 95% interval of this counterfactual prediction is [3.22, 7.16]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -2.00 with a 95% interval of [-3.98, -0.038]. For a discussion of the significance of this effect, see below. Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 467.46. By contrast, had the intervention not taken place, we would have expected a sum of 761.71. The 95% interval of this prediction is [473.06, 1051.98]. The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -39%. The 95% interval of this percentage is [-77%, -1%]. This means that the negative effect observed during the intervention period is statistically significant. If the experimenter had expected a positive effect, it is recommended to double-check whether anomalies in the control variables may have caused an overly optimistic expectation of what should have happened in the response variable in the absence of the intervention. The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.025). This means the causal effect can be considered statistically significant.

China: -20% Vaccine Causal Impact on Total Deaths Per Million

Analysis report {CausalImpact (Brodersen et al. 2015Brodersen, Kay H., Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L. Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” Annals of Applied Statistics 9: 247–74. https://doi.org/10.1214/14-aoas788.)}: During the post-intervention period, the response variable had an average value of approx. 3.21. In the absence of an intervention, we would have expected an average response of 4.01. The 95% interval of this counterfactual prediction is [3.21, 4.83]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -0.80 with a 95% interval of [-1.62, 0.0011]. For a discussion of the significance of this effect, see below. Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 773.57. Had the intervention not taken place, we would have expected a sum of 966.48. The 95% interval of this prediction is [773.30, 1164.21]. The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -20%. The 95% interval of this percentage is [-40%, +0%]. This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period. The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.026). This means the causal effect can be considered statistically significant.

New Zealand: -19% Vaccine Causal Impact on Total Deaths Per Million

Singapore: -17% Vaccine Causal Impact on Total Deaths Per Million

Average Increases

France: +28% Vaccine Causal Impact on Total Deaths Per Million