Observational Multiplicity
By: Erin George, Deanna Needell, Berk Ustun
Potential Business Impact:
Helps AI make safer, more honest predictions.
Many prediction tasks admit multiple models that perform almost equally well. This phenomenon can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which captures how the predictions of a model could change had different training labels been observed. We present a general-purpose method to estimate regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret can promote safety in real-world applications through abstention and data collection.
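To make the idea of regret from observational multiplicity concrete, the following is a minimal sketch of one plausible resampling-based estimator: refit a classifier under alternative label draws that are consistent with the observed data and record how far each individual's predicted probability can move. The function name `estimate_regret`, the use of scikit-learn's `LogisticRegression`, and the binomial resampling step are illustrative assumptions, not the authors' procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_regret(X, y, n_resamples=100, random_state=0):
    """Sketch: refit the model under alternative label observations and
    record how much each individual's predicted probability can shift."""
    rng = np.random.default_rng(random_state)

    # Fit a reference model on the observed labels y_i in {0, 1}.
    base = LogisticRegression(max_iter=1000).fit(X, y)
    p_base = base.predict_proba(X)[:, 1]

    max_shift = np.zeros(len(y))
    for _ in range(n_resamples):
        # Draw alternative observations consistent with the reference
        # probabilities (a plug-in assumption for this sketch).
        y_alt = rng.binomial(1, p_base)
        if y_alt.min() == y_alt.max():
            continue  # skip degenerate draws with a single class
        alt = LogisticRegression(max_iter=1000).fit(X, y_alt)
        p_alt = alt.predict_proba(X)[:, 1]
        max_shift = np.maximum(max_shift, np.abs(p_alt - p_base))

    return max_shift  # per-individual estimate of prediction arbitrariness
```

In a pipeline of this kind, individuals with large estimated shifts could be flagged for abstention or targeted data collection, in line with the applications the abstract mentions.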
Similar Papers
On Arbitrary Predictions from Equally Valid Models
Machine Learning (CS)
Helps doctors make better, more trustworthy patient diagnoses.
Systemizing Multiplicity: The Curious Case of Arbitrariness in Machine Learning
Machine Learning (CS)
Makes AI decisions fairer and more predictable.
Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications
Machine Learning (CS)
Makes machines predict failures more reliably.