Forest Kernel Balancing Weights: Outcome-Guided Features for Causal Inference

Published: December 12, 2025 | arXiv ID: 2512.11751v1

By: Andy A. Shen, Eli Ben-Michael, Avi Feller and more

While balancing covariates between groups is central to observational causal inference, selecting which features to balance remains a challenging problem. Kernel balancing is a promising approach that first estimates a kernel that captures similarity across units and then balances a (possibly low-dimensional) summary of that kernel, indirectly learning important features to balance. In this paper, we propose forest kernel balancing, which leverages the underappreciated fact that tree-based machine learning models, namely random forests and Bayesian additive regression trees (BART), implicitly estimate a kernel based on the co-occurrence of observations in the same terminal leaf node. Thus, even though the resulting kernel is solely a function of baseline features, the selected nonlinearities and other interactions are important for predicting the outcome -- and therefore are important for addressing confounding. Through simulations and applied illustrations, we show that forest kernel balancing leads to meaningful computational and statistical improvements relative to standard kernel methods, which do not incorporate outcome information when learning features.
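As a rough illustration of the leaf co-occurrence idea described in the abstract, the sketch below builds a kernel whose (i, j) entry is the fraction of trees in which units i and j fall in the same terminal leaf. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `forest_kernel`, the input layout (one row per unit, one column per tree), and the toy leaf ids are all hypothetical, and the paper may weight or normalize the kernel differently. In practice the leaf ids would come from a fitted forest; the toy values here are made up.

```python
from typing import List, Sequence


def forest_kernel(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Leaf co-occurrence kernel (illustrative, hypothetical helper).

    leaves[i][t] is the terminal-leaf id of unit i in tree t.
    K[i][j] is the fraction of trees in which units i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees where both units land in the same leaf.
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            kernel[i][j] = kernel[j][i] = shared / n_trees
    return kernel


# Made-up leaf assignments for 4 units across 3 trees (assumption).
toy_leaves = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 4, 3],
    [2, 5, 7],
]
K = forest_kernel(toy_leaves)
```

Because the trees are grown to predict the outcome, two units score as similar only when they agree on the splits the forest found useful, which is the sense in which the kernel is outcome-guided. A balancing step would then operate on `K` or a low-dimensional summary of it.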

Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies

Methodology

Makes studies fairer by using results to pick groups.

19 Nov 2025

85%

Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning

Machine Learning (Stat)

Finds how two things affect each other.

7 Nov 2025

85%

Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery

Machine Learning (Stat)

Finds groups who benefit most from treatments.

6 Sep 2025