Forest Kernel Balancing Weights: Outcome-Guided Features for Causal Inference
By: Andy A. Shen, Eli Ben-Michael, Avi Feller, et al.
While balancing covariates between groups is central to observational causal inference, selecting which features to balance remains a challenging problem. Kernel balancing is a promising approach that first estimates a kernel capturing similarity across units and then balances a (possibly low-dimensional) summary of that kernel, indirectly learning which features are important to balance. In this paper, we propose forest kernel balancing, which leverages the underappreciated fact that tree-based machine learning models, namely random forests and Bayesian additive regression trees (BART), implicitly estimate a kernel based on the co-occurrence of observations in the same terminal leaf node. Thus, even though the resulting kernel is solely a function of baseline features, the nonlinearities and interactions it selects are those important for predicting the outcome -- and therefore important for addressing confounding. Through simulations and applied illustrations, we show that forest kernel balancing yields meaningful computational and statistical improvements relative to standard kernel methods, which do not incorporate outcome information when learning features.
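To make the leaf co-occurrence idea concrete, below is a minimal sketch of a forest kernel computed from a fitted random forest, using scikit-learn's RandomForestRegressor and its apply method. The simulated data, hyperparameters, and the choice of a plain random forest (rather than, say, BART) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: X are baseline covariates, y is the observed outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + X[:, 1] * X[:, 2] + rng.normal(size=200)

# Fit a random forest on the outcome; the learned partition encodes
# outcome-relevant nonlinearities and interactions among the covariates.
forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# `apply` returns, for each unit, the terminal leaf index in every tree
# (shape: n_units x n_trees).
leaves = forest.apply(X)

# The forest kernel K[i, j] is the fraction of trees in which units i and j
# fall in the same terminal leaf (the classical forest "proximity" matrix).
K = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

assert np.allclose(np.diag(K), 1.0)  # a unit always shares a leaf with itself
```

A kernel balancing procedure would then choose weights that balance this K (or a low-dimensional summary of it) between treatment groups, so that the features the forest found predictive of the outcome are exactly the ones brought into balance.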
Similar Papers
Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies
Methodology
Uses outcome information to guide both the design and the analysis of observational studies.
Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning
Machine Learning (Stat)
Estimates bidirectional causal effects between variables via large-scale online kernel learning.
Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery
Machine Learning (Stat)
Estimates conditional average treatment effects and discovers subgroups with distinct treatment responses.