
Non-parametric assessment of the calibration of individualized treatment effects

Published: December 9, 2025 | arXiv ID: 2512.08140v1

By: Mohsen Sadatsafavi, Jeroen Hoogland, Thomas P. A. Debray and more

An important aspect of the performance of algorithms that predict individualized treatment effects (ITEs) is moderate calibration, i.e., the average treatment effect among individuals with a predicted treatment effect of z being equal to z. The assessment of moderate calibration is a challenging task on two fronts: counterfactual responses are unobserved, and quantifying the conditional response function for models that generate continuous predicted values requires regularization or parametric modeling. Perhaps because of these challenges, there is currently no inferential method for the null hypothesis that an ITE model is moderately calibrated in a population. In this work, we propose non-parametric methods for the assessment of moderate calibration of ITE models for binary outcomes using data from a randomized trial. These methods simultaneously resolve both challenges, resulting in novel numerical, graphical, and inferential methods for the assessment of moderate calibration. The key idea is to formulate a stochastic process for the cumulative prediction errors that obeys a functional central limit theorem, enabling the use of the properties of Brownian motion for asymptotic inference. We propose two approaches to construct this process from a sample: a conditional approach that relies on predicted risks (often an output of ITE models), and a marginal approach based on replacing the cumulative conditional expected value and variance terms with their marginal counterparts. Numerical simulations confirm the desirable properties of both approaches and their ability to detect miscalibration of different forms. We use a case study to provide practical suggestions on graphical presentation and the interpretation of results. Moderate calibration of predicted ITEs can be assessed without requiring regularization techniques or making assumptions about the functional form of treatment response.
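The general idea of a cumulative prediction-error process can be illustrated with a minimal Python sketch. This is not the authors' conditional or marginal construction, whose details the abstract does not give. It assumes a two-arm trial with a known randomization probability, uses an inverse-probability-weighted pseudo-outcome as an unbiased stand-in for the unobserved individual effect, and normalizes the partial sums by the empirical squared errors. The function names (`sup_abs_bm_cdf`, `cumulative_error_test`, `simulate_trial`) and the simulated data are hypothetical. The only borrowed result is the standard series for the distribution of the supremum of the absolute value of Brownian motion on [0, 1].

```python
import numpy as np


def sup_abs_bm_cdf(x, terms=200):
    """P(sup_{0<=t<=1} |W(t)| <= x) for standard Brownian motion W (series form)."""
    if x <= 0:
        return 0.0
    k = np.arange(terms)
    odd = 2 * k + 1
    series = (-1.0) ** k / odd * np.exp(-(odd ** 2) * np.pi ** 2 / (8 * x ** 2))
    return float(min(1.0, max(0.0, 4 / np.pi * series.sum())))


def cumulative_error_test(y, a, ite_hat, p_treat=0.5):
    """Illustrative cumulative-error test (an assumption, not the paper's method).

    y: binary outcomes; a: treatment indicator (1 = treated);
    ite_hat: predicted treatment effects; p_treat: known randomization probability.
    """
    y, a, ite_hat = (np.asarray(v, dtype=float) for v in (y, a, ite_hat))
    # IPW pseudo-outcome: its expectation given covariates is the true effect.
    d = y * (a / p_treat - (1 - a) / (1 - p_treat))
    order = np.argsort(ite_hat, kind="stable")  # walk through predictions in order
    err = (d - ite_hat)[order]
    var = err ** 2                              # empirical plug-in for the variance
    total = var.sum()
    s = np.cumsum(err) / np.sqrt(total)         # scaled cumulative prediction error
    t = np.cumsum(var) / total                  # "variance time" on [0, 1]
    stat = float(np.max(np.abs(s)))
    return t, s, stat, 1.0 - sup_abs_bm_cdf(stat)


def simulate_trial(n, seed):
    """Hypothetical 1:1 trial with a control risk of 0.3 and effects in [-0.1, 0.1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    tau = 0.2 * x - 0.1                         # true individualized effect
    a = rng.integers(0, 2, size=n)
    y = (rng.uniform(size=n) < 0.3 + a * tau).astype(int)
    return y, a, tau


y, a, tau = simulate_trial(n=4000, seed=0)
_, _, stat_ok, p_ok = cumulative_error_test(y, a, tau)           # calibrated model
_, _, stat_bad, p_bad = cumulative_error_test(y, a, tau + 0.15)  # shifted predictions
print(f"calibrated:    stat={stat_ok:.2f}, p={p_ok:.3f}")
print(f"miscalibrated: stat={stat_bad:.2f}, p={p_bad:.3g}")
```

Plotting `s` against `t` gives a graphical check in the same spirit: under calibration the path should resemble a Brownian motion started at zero, while systematic drift suggests miscalibration. The weighted pseudo-outcome is noisy, so this simple version can be expected to have limited power against small departures.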

Category: Statistics (Methodology)