Observational Multiplicity

Published: July 30, 2025 | arXiv ID: 2507.23136v1

By: Erin George, Deanna Needell, Berk Ustun

Potential Business Impact:

Helps AI make safer, more honest predictions.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Many prediction tasks admit multiple models that perform almost equally well. This phenomenon can undermine interpretability and safety when competing models assign conflicting predictions to individuals. In this work, we study how arbitrariness can arise in probabilistic classification tasks as a result of an effect that we call \emph{observational multiplicity}. We discuss how this effect arises in a broad class of practical applications where we learn a classifier to predict probabilities $p_i \in [0,1]$ but are given a dataset of observations $y_i \in \{0,1\}$. We propose to evaluate the arbitrariness of individual probability predictions through the lens of \emph{regret}. We introduce a measure of regret for probabilistic classification tasks, which captures how the predictions of a model could change if it were trained on different labels. We present a general-purpose method to estimate the regret in a probabilistic classification task. We use our measure to show that regret is higher for certain groups in the dataset and discuss potential applications of regret. We demonstrate how estimating regret can promote safety in real-world applications through abstention and data collection.
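To make the idea concrete, here is a minimal sketch of one way such a regret estimate could look, assuming we measure, for each individual, the spread of predicted probabilities across models refit on plausible alternative label vectors. The resampling scheme (drawing alternative observations $y_i \sim \text{Bernoulli}(\hat{p}_i)$ from a reference model's own predictions) and the choice of logistic regression are illustrative assumptions, not the paper's actual estimator.

```python
# Hypothetical sketch (not the authors' method): per-individual "regret"
# as the range of predicted probabilities across classifiers refit on
# resampled label vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_regret(X, y, n_resamples=100, seed=0):
    """Return, for each row of X, the spread of predicted probabilities
    across models trained on alternative label draws."""
    rng = np.random.default_rng(seed)
    ref = LogisticRegression(max_iter=1000).fit(X, y)
    p_hat = ref.predict_proba(X)[:, 1]           # reference probabilities
    preds = np.empty((n_resamples, len(y)))
    for k in range(n_resamples):
        y_alt = rng.binomial(1, p_hat)           # alternative observations
        alt = LogisticRegression(max_iter=1000).fit(X, y_alt)
        preds[k] = alt.predict_proba(X)[:, 1]
    return preds.max(axis=0) - preds.min(axis=0)  # per-individual spread

# Example on synthetic data: individuals with high regret could be
# flagged for abstention or targeted data collection, as the abstract
# suggests.
X = np.random.default_rng(1).normal(size=(500, 5))
y = (X[:, 0] + np.random.default_rng(2).normal(size=500) > 0).astype(int)
regret = estimate_regret(X, y)
print("fraction with regret > 0.1:", (regret > 0.1).mean())
```

In this sketch, a large spread for an individual means their prediction is sensitive to which labels happened to be observed, which is the kind of arbitrariness the abstract describes.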

Country of Origin
🇺🇸 United States

Page Count
27 pages

Category
Computer Science:
Machine Learning (CS)