Non-parametric assessment of the calibration of individualized treatment effects
By: Mohsen Sadatsafavi , Jeroen Hoogland , Thomas P. A. Debray and more
An important aspect of the performance of algorithms that predict individualized treatment effects (ITE) is moderate calibration, i.e., the average treatment effect among individuals with predicted treatment effect of z being equal to z. The assessment of moderate calibration is a challenging task on two fronts: counterfactual responses are unobserved, and quantifying the conditional response function for models that generate continuous predicted values requires regularization or parametric modeling. Perhaps because of these challenges, there is currently no inferential method for the null hypothesis that an ITE model is moderately calibrated in a population. In this work, we propose non-parametric methods for the assessment of moderate calibration of ITE models for binary outcomes using data from a randomized trial. These methods simultaneously resolve both challenges, resulting in novel numerical, graphical, and inferential methods for the assessment of moderate calibration. The key idea is to formulate a stochastic process for the cumulative prediction errors that obeys a functional central limit theorem, enabling the use of the properties of Brownian motion for asymptotic inference. We propose two approaches to construct this process from a sample: a conditional approach that relies on predicted risks (often an output of ITE models), and a marginal approach based on replacing the cumulative conditional expected value and variance terms with their marginal counterparts. Numerical simulations confirm the desirable properties of both approaches and their ability to detect miscalibration of different forms. We use a case study to provide practical suggestions on graphical presentation and the interpretation of results. Moderate calibration of predicted ITEs can be assessed without requiring regularization techniques or making assumptions about the functional form of treatment response.
Similar Papers
Individual Treatment Effect: Prediction Intervals and Sharp Bounds
Methodology
Shows how well a treatment works for one person.
Calibrated Bayes analysis of cluster-randomized trials
Methodology
Makes study results more trustworthy, even with bad guesses.
Nonparametric Bayesian Calibration of Computer Models
Methodology
Improves computer predictions for science and engineering.