Scalable Utility-Aware Multiclass Calibration
By: Mahmoud Hegazy, Michael I. Jordan, Aymeric Dieuleveut
Potential Business Impact:
Makes AI predictions more trustworthy and useful.
Ensuring that classifiers are well-calibrated, i.e., that their predictions align with observed frequencies, is a minimal and fundamental requirement for classifiers to be viewed as trustworthy. Existing methods for assessing multiclass calibration often focus on specific aspects associated with prediction (e.g., top-class confidence, class-wise calibration) or rely on computationally challenging variational formulations. In this work, we study scalable \emph{evaluation} of multiclass calibration. To this end, we propose utility calibration, a general framework that measures the calibration error relative to a specific utility function encapsulating the goals or decision criteria relevant to the end user. We demonstrate how this framework can unify and re-interpret several existing calibration metrics, in particular yielding more robust versions of the top-class and class-wise calibration metrics and, going beyond such binarized approaches, enabling the assessment of calibration for richer classes of downstream utilities.
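As a point of reference for the binarized metrics the abstract mentions, the sketch below computes a standard binned top-class expected calibration error (ECE) in NumPy. This is not the paper's utility-calibration estimator; the function name `top_class_ece`, the equal-width binning, and the bin count are illustrative assumptions.

```python
import numpy as np

def top_class_ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """Binned expected calibration error of the top-class confidence.

    probs:  (n, k) array of predicted class probabilities.
    labels: (n,) array of true class indices.
    """
    confidences = probs.max(axis=1)        # top-class confidence per sample
    predictions = probs.argmax(axis=1)     # predicted class per sample
    correct = (predictions == labels).astype(float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between mean confidence and empirical accuracy in the bin,
            # weighted by the fraction of samples falling in the bin.
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Illustrative usage with synthetic predictions.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 5, size=1000)
print(f"Top-class ECE: {top_class_ece(probs, labels):.4f}")
```

Top-class ECE evaluates calibration only for the confidence of the predicted class; the utility-calibration framework described above generalizes this kind of check to other utility functions specified by the end user.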
Similar Papers
Evaluating the Quality of the Quantified Uncertainty for (Re)Calibration of Data-Driven Regression Models
Machine Learning (CS)
Makes AI predictions more trustworthy and reliable.
Aligning Evaluation with Clinical Priorities: Calibration, Label Shift, and Error Costs
Machine Learning (CS)
Helps doctors pick the best treatment for patients.
Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification
Machine Learning (Stat)
Keeps computer vision accurate over time.