On the use of cross-fitting in causal machine learning with correlated units
By: Salvador V. Balkus, Hasan Laith, Nima S. Hejazi
Potential Business Impact:
Makes computer learning accurate, even with connected data.
In causal machine learning, the fitting and evaluation of nuisance models are typically performed on separate partitions, or folds, of the observed data. This technique, called cross-fitting, eliminates bias introduced by the use of black-box predictive algorithms. When study units may be correlated, such as in spatial, clustered, or time-series data, investigators often design bespoke forms of cross-fitting to minimize correlation between folds. We prove that, perhaps contrary to popular belief, this is typically unnecessary: performing cross-fitting as if study units were independent usually still eliminates key bias terms even when units may be correlated. In simulation experiments with various correlation structures, we show that causal machine learning estimators typically have the same or improved bias and precision under cross-fitting that ignores correlation compared to techniques striving to eliminate correlation between folds.
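To make the setup concrete, below is a minimal sketch of cross-fitting performed as if units were independent, the approach the paper argues usually suffices. It cross-fits nuisance models for an AIPW (doubly robust) estimate of an average treatment effect on simulated data; the random-forest learners, fold count, and data-generating process are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Simulated data (illustrative only): covariates X, binary treatment A, outcome Y.
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = X[:, 0] + 2 * A + rng.normal(size=n)  # true average treatment effect is 2

# Standard cross-fitting "as if units were independent": plain K-fold splits.
ate_terms = np.zeros(n)
for train_idx, eval_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Fit nuisance models (propensity score and outcome regression) on the
    # training fold only.
    pscore_model = RandomForestClassifier(random_state=0).fit(X[train_idx], A[train_idx])
    outcome_model = RandomForestRegressor(random_state=0).fit(
        np.column_stack([X[train_idx], A[train_idx]]), Y[train_idx]
    )

    # Evaluate the fitted nuisances on the held-out fold.
    g = np.clip(pscore_model.predict_proba(X[eval_idx])[:, 1], 0.01, 0.99)
    Q1 = outcome_model.predict(np.column_stack([X[eval_idx], np.ones(len(eval_idx))]))
    Q0 = outcome_model.predict(np.column_stack([X[eval_idx], np.zeros(len(eval_idx))]))

    # AIPW (doubly robust) score contribution for each held-out unit.
    a, y = A[eval_idx], Y[eval_idx]
    ate_terms[eval_idx] = (
        Q1 - Q0
        + a * (y - Q1) / g
        - (1 - a) * (y - Q0) / (1 - g)
    )

print("Cross-fitted AIPW ATE estimate:", ate_terms.mean())
```

A correlation-aware alternative of the kind the paper compares against would, for clustered data, swap `KFold` for scikit-learn's `GroupKFold` with cluster labels, so that correlated units never straddle the fitting/evaluation split.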
Similar Papers
Conditional cross-fitting for unbiased machine-learning-assisted covariate adjustment in randomized experiments
Methodology
Makes study results more accurate with less data.
An Honest Cross-Validation Estimator for Prediction Performance
Machine Learning (Stat)
Improves how well computer predictions work.
Cross Validation for Correlated Data in Regression and Classification Models, with Applications to Deep Learning
Methodology
Improves computer learning when data points are related.