On the use of cross-fitting in causal machine learning with correlated units

Published: January 15, 2026 | arXiv ID: 2601.10899v1

By: Salvador V. Balkus, Hasan Laith, Nima S. Hejazi

Potential Business Impact:

Makes machine learning estimates of cause and effect reliable, even when data points are connected to one another.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

In causal machine learning, the fitting and evaluation of nuisance models are typically performed on separate partitions, or folds, of the observed data. This technique, called cross-fitting, eliminates bias introduced by the use of black-box predictive algorithms. When study units may be correlated, such as in spatial, clustered, or time-series data, investigators often design bespoke forms of cross-fitting to minimize correlation between folds. We prove that, perhaps contrary to popular belief, this is typically unnecessary: performing cross-fitting as if study units were independent usually still eliminates key bias terms even when units may be correlated. In simulation experiments with various correlation structures, we show that causal machine learning estimators typically have the same or improved bias and precision under cross-fitting that ignores correlation compared to techniques striving to eliminate correlation between folds.
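The core procedure described above, fitting nuisance models on one partition of the data and evaluating them on another, can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is a generic example of cross-fitting performed "as if units were independent" (random fold assignment that ignores any correlation structure), using a simple least-squares fit as a stand-in for a black-box learner. The function name `cross_fit_predictions` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def cross_fit_predictions(X, y, n_folds=5, seed=None):
    """Out-of-fold nuisance predictions: for each fold, fit a model on
    the remaining folds and predict on the held-out fold. A linear
    least-squares fit stands in for an arbitrary black-box learner."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Random fold assignment, ignoring any correlation between units
    folds = rng.permutation(n) % n_folds
    preds = np.empty(n)
    Xd = np.column_stack([np.ones(n), X])  # add an intercept column
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit on all folds except k ...
        beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        # ... and evaluate only on the held-out fold k
        preds[test] = Xd[test] @ beta
    return preds

# Demo on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=200)
yhat = cross_fit_predictions(X, y, n_folds=5, seed=0)
print(yhat.shape)
```

Because each prediction comes from a model that never saw that unit during fitting, downstream causal estimators built on `yhat` avoid the overfitting bias that motivates cross-fitting in the first place; the paper's claim is that this simple independent-style splitting usually suffices even for correlated units.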

Country of Origin
🇺🇸 United States

Page Count
15 pages

Category
Statistics: Methodology