Beyond Perfect Scores: Proof-by-Contradiction for Trustworthy Machine Learning
By: Dushan N. Wadduwage, Dineth Jayakody, Leonidas Zimianitis
Potential Business Impact:
Tests if AI doctors learn real sickness signs.
Machine learning (ML) models show strong promise for new biomedical prediction tasks, but concerns about trustworthiness have hindered their clinical adoption. In particular, it is often unclear whether a model relies on true clinical cues or on spurious hierarchical correlations in the data. This paper introduces a simple yet broadly applicable trustworthiness test grounded in stochastic proof-by-contradiction. Instead of relying solely on high test performance, our approach trains and tests models on spurious labels that are carefully permuted under a potential-outcomes framework. A truly trustworthy model should fail under such label permutation; comparable accuracy across real and permuted labels indicates overfitting, shortcut learning, or data leakage. Our approach quantifies this behavior through interpretable Fisher-style p-values, which are well understood by domain experts across the medical and life sciences. We evaluate our approach on multiple new bacterial diagnostics, separating tasks and models that learn genuine causal relationships from those driven by dataset artifacts or statistical coincidences. Our work establishes a foundation for building rigor and trust between the ML and life-science research communities, moving ML models one step closer to clinical adoption.
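To make the proof-by-contradiction idea concrete, below is a minimal sketch of a label-permutation test of this general kind. It uses a generic scikit-learn classifier, a synthetic dataset, and a plain class-label permutation as the null; the variable names (e.g. n_permutations) and the specific permutation scheme are illustrative assumptions and do not reproduce the paper's potential-outcomes-based procedure or its bacterial diagnostics data.

```python
# Minimal sketch of a label-permutation trustworthiness check (assumed setup:
# scikit-learn classifier, synthetic data, simple label shuffling as the null;
# not the paper's exact potential-outcomes permutation or datasets).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)

# Accuracy when training and testing on the real labels.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
real_acc = accuracy_score(
    y_te, LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)
)

# Null distribution: permute the labels, then train and test on the permuted labels.
# With clean data this should collapse to chance; comparable accuracy would signal
# overfitting, shortcut learning, or leakage.
n_permutations = 100  # illustrative choice
null_accs = []
for _ in range(n_permutations):
    y_perm = rng.permutation(y)
    Xp_tr, Xp_te, yp_tr, yp_te = train_test_split(X, y_perm, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(Xp_tr, yp_tr)
    null_accs.append(accuracy_score(yp_te, model.predict(Xp_te)))

# Fisher-style permutation p-value: how often does a permuted-label run match or
# beat the real-label accuracy? A small p supports genuine signal; a large p means
# permuted labels work comparably well, so the expected contradiction fails.
p_value = (1 + sum(a >= real_acc for a in null_accs)) / (1 + n_permutations)
print(f"real accuracy = {real_acc:.3f}, permutation p-value = {p_value:.3f}")
```

For reference, scikit-learn ships a related utility, sklearn.model_selection.permutation_test_score, which runs a similar permutation test under cross-validation; the sketch above simply makes the train/test mechanics explicit.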
Similar Papers
Trustworthiness Preservation by Copies of Machine Learning Systems
Logic in Computer Science
Checks if copied AI systems are still safe.
The Evaluation Gap in Medicine, AI and LLMs: Navigating Elusive Ground Truth & Uncertainty via a Probabilistic Paradigm
Artificial Intelligence
Makes AI answers more trustworthy by checking expert agreement.
I-trustworthy Models. A framework for trustworthiness evaluation of probabilistic classifiers
Machine Learning (Stat)
Tests if computer predictions are truly reliable.