Causal Inference with Missing Exposures, Missing Outcomes, and Dependence
By: Kirsten E. Landsiedel, Rachel Abbott, Atukunda Mucunguzi and more
Potential Business Impact:
Fixes health studies with missing info.
Missing data are ubiquitous in public health research. The missing-completely-at-random (MCAR) assumption is often unrealistic and can lead to meaningful bias when violated. The missing-at-random (MAR) assumption tends to be more reasonable, but guidance on conducting causal analyses under MAR is limited when there is missingness on multiple variables. We present a series of causal graphs and identification results to demonstrate the handling of missing exposures and outcomes in observational studies. For estimation and inference, we highlight the use of targeted minimum loss-based estimation (TMLE) with Super Learner to flexibly and robustly address confounding, missing data, and dependence. Our work is motivated by SEARCH-TB's investigation of the effect of alcohol consumption on the risk of incident tuberculosis (TB) infection in rural Uganda. This study posed notable challenges due to confounding, missingness on the exposure (alcohol use), missingness on the baseline outcome (defining who was at risk of TB), missingness on the outcome at follow-up (capturing who acquired TB), and clustering within households. Applying these methods to real data from SEARCH-TB highlighted their real-world consequences. Estimates from TMLE suggested that alcohol use was associated with a 49% higher risk of incident TB infection (relative risk [RR]=1.49, 95% CI: 1.39-1.59). These estimates were notably larger and more precise than estimates from inverse probability weighting (RR=1.13, 95% CI: 1.00-1.27) and unadjusted complete-case analyses (RR=1.18, 95% CI: 0.89-1.57). Our work demonstrates the utility of causal models for describing the missing data mechanism and TMLE for flexible inference.
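To make the contrast between complete-case analysis and MAR-based adjustment concrete, here is a minimal sketch in Python. It is not the paper's analysis: it has no TMLE, no Super Learner, and no household clustering, and it covers only the simplest case of a missing outcome with one binary confounder. Every probability and every name (`P_W`, `P_A1`, `P_Y1`, `P_OBS`, `prob`, and so on) is a made-up assumption for illustration. The sketch works with exact population probabilities rather than sampled data, so it shows identification only, not finite-sample behavior.

```python
from itertools import product

# Hypothetical population; all numbers are illustrative assumptions.
P_W = {0: 0.6, 1: 0.4}               # P(W = w): one binary confounder
P_A1 = {0: 0.3, 1: 0.6}              # P(A = 1 | W = w): exposure
P_Y1 = {(0, 0): 0.10, (1, 0): 0.15,  # P(Y = 1 | A = a, W = w), keyed (a, w)
        (0, 1): 0.30, (1, 1): 0.45}
P_OBS = {(0, 0): 0.9, (1, 0): 0.8,   # P(Delta = 1 | A = a, W = w): outcome measured
         (0, 1): 0.7, (1, 1): 0.4}


def cell_prob(w, a, d, y):
    """Joint probability of (W, A, Delta, Y). Given (A, W), Y is independent
    of Delta, which is the MAR assumption in this toy setting."""
    pa = P_A1[w] if a == 1 else 1 - P_A1[w]
    pd = P_OBS[(a, w)] if d == 1 else 1 - P_OBS[(a, w)]
    py = P_Y1[(a, w)] if y == 1 else 1 - P_Y1[(a, w)]
    return P_W[w] * pa * pd * py


JOINT = {cell: cell_prob(*cell) for cell in product((0, 1), repeat=4)}
NAMES = ("w", "a", "d", "y")


def prob(**fixed):
    """Sum the joint probability over all cells matching the fixed values."""
    return sum(p for cell, p in JOINT.items()
               if all(cell[NAMES.index(k)] == v for k, v in fixed.items()))


def true_risk(a):
    # Marginal risk if everyone had exposure a, from the generating model.
    return sum(P_W[w] * P_Y1[(a, w)] for w in P_W)


def complete_case_risk(a):
    # Unadjusted risk among people with exposure a and a measured outcome.
    return prob(a=a, d=1, y=1) / prob(a=a, d=1)


def g_formula_risk(a):
    # Standardization: E_W[ P(Y = 1 | A = a, Delta = 1, W) ].
    return sum(prob(w=w) * prob(w=w, a=a, d=1, y=1) / prob(w=w, a=a, d=1)
               for w in P_W)


def ipw_risk(a):
    # Weight each measured case by 1 / P(A = a, Delta = 1 | W).
    return sum(prob(w=w, a=a, d=1, y=1) / (prob(w=w, a=a, d=1) / prob(w=w))
               for w in P_W)


def risk_ratio(risk):
    return risk(1) / risk(0)


rr_true = risk_ratio(true_risk)
rr_cc = risk_ratio(complete_case_risk)
rr_g = risk_ratio(g_formula_risk)
rr_ipw = risk_ratio(ipw_risk)

print(f"true RR          = {rr_true:.3f}")
print(f"complete-case RR = {rr_cc:.3f}")
print(f"g-formula RR     = {rr_g:.3f}")
print(f"IPW RR           = {rr_ipw:.3f}")
```

In this toy population the true RR is 1.5 by construction. The complete-case RR differs from it because outcome measurement depends on both the exposure and the confounder, while the standardized and weighted versions recover it. The two adjusted versions coincide here only because everything is computed nonparametrically at the population level; with finite samples and estimated nuisance models they can differ.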
Similar Papers
Robustness intervals for competing risks analysis with causes of failure missing not at random
Methodology
Makes medical studies more trustworthy with missing data.
Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity
Methodology
Fixes studies with missing information for better results.
A Unified Framework for Inference with General Missingness Patterns and Machine Learning Imputation
Methodology
Fixes computer guesses when data is missing.