SADA: Safe and Adaptive Inference with Multiple Black-Box Predictions

Published: September 26, 2025 | arXiv ID: 2509.21707v1

By: Jiawei Shan, Yiming Dong, Jiwei Zhao

Potential Business Impact:

Enables valid statistical inference from scarce labeled data by safely combining multiple imperfect model predictions.

Business Areas:
Predictive Analytics, Artificial Intelligence, Data and Analytics, Software

Real-world applications often face scarce labeled data due to the high cost and time requirements of gold-standard experiments, whereas unlabeled data are typically abundant. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted labels using a variety of models and algorithms, including deep learning, large language models, and generative AI. In this paper, we propose a novel approach that safely and adaptively aggregates multiple black-box predictions with unknown quality while preserving valid statistical inference. Our method provides two key guarantees: (i) it never performs worse than using the labeled data alone, regardless of the quality of the predictions; and (ii) if any one of the predictions (without knowing which one) perfectly fits the ground truth, the algorithm adaptively exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound. We demonstrate the effectiveness of the proposed algorithm through experiments on both synthetic and benchmark datasets.
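
To make guarantee (i) concrete, here is a minimal sketch of one generic way to combine labeled data with several black-box predictions when the target is a population mean. It is not the SADA algorithm from the paper, whose details the abstract does not give. It is a standard control-variate estimator, chosen for illustration because setting all weights to zero recovers the labeled-only mean, so the variance-minimizing weights cannot do worse asymptotically. The function name `aggregate_mean`, the toy data, and the three simulated predictors are all assumptions made for this example.

```python
import numpy as np


def aggregate_mean(y_lab, preds_lab, preds_unlab):
    """Control-variate estimate of E[Y] using K black-box predictions.

    Illustrative only (not the paper's method).
    y_lab:       (n,)   gold-standard labels
    preds_lab:   (n, K) predictions on the labeled sample
    preds_unlab: (N, K) predictions on the unlabeled sample
    """
    y_lab = np.asarray(y_lab, dtype=float)
    F_lab = np.asarray(preds_lab, dtype=float)
    F_unlab = np.asarray(preds_unlab, dtype=float)
    n, N = len(y_lab), len(F_unlab)

    # Covariance of the predictions, estimated from labeled + unlabeled rows.
    sigma = np.atleast_2d(np.cov(np.vstack([F_lab, F_unlab]), rowvar=False))

    # Covariance between each prediction and the true label (labeled rows only).
    cov_fy = (F_lab - F_lab.mean(axis=0)).T @ (y_lab - y_lab.mean()) / (n - 1)

    # Weights minimizing the variance of
    #   mean(y_lab) - w @ (mean(F_lab) - mean(F_unlab)).
    # pinv copes with constant or collinear predictions; w = 0 gives the
    # labeled-only estimator back.
    w = (N / (n + N)) * np.linalg.pinv(sigma) @ cov_fy

    estimate = y_lab.mean() - w @ (F_lab.mean(axis=0) - F_unlab.mean(axis=0))
    return estimate, w


# Toy demo: three hypothetical predictors of unknown quality.
rng = np.random.default_rng(0)
n, N = 200, 20_000
y_all = 2.0 + rng.normal(size=n + N)      # only the first n outcomes are "labeled"
preds_all = np.column_stack([
    y_all,                                # a predictor that fits the truth perfectly
    y_all + rng.normal(size=n + N),       # a noisy predictor
    rng.normal(size=n + N),               # a predictor unrelated to the outcome
])
estimate, weights = aggregate_mean(y_all[:n], preds_all[:n], preds_all[n:])
print("labeled-only mean:  ", y_all[:n].mean())
print("aggregated estimate:", estimate)
print("weights:            ", weights)
```

In this toy setup the weights are learned from the data, so the caller never has to say which predictor is the good one. When every prediction is uninformative, the weights shrink toward zero and the estimate falls back to the labeled-only mean. The adaptive rate and efficiency results in guarantee (ii) are specific to the paper's method and are not reproduced by this sketch.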

Country of Origin
🇺🇸 United States

Page Count
30 pages

Category
Statistics: Machine Learning (Stat)