Hallucinations Live in Variance
By: Aaron R. Flouro, Shawn P. Chadwick
Potential Business Impact:
Makes AI answers more trustworthy, not just right.
Benchmarks measure whether a model is correct; they do not measure whether it is reliable. For single-shot inference this distinction is largely academic, but it becomes critical for agentic AI systems, where a single rephrased prompt can trigger cascading failures in multi-step execution. Existing evaluations do not capture this form of instability. Hallucinations live in variance: they arise when semantically equivalent prompts activate inconsistent internal pathways, producing divergent outputs. Consistent but incorrect outputs reflect bias or missing knowledge; confident guessing reflects calibration failure. Neither constitutes hallucination under this definition. When error is variance-dominated, reducing redundant pathways improves reliability without adding knowledge. We formalize this through Semantic Stability (SS), measured via Paraphrase Consistency (PC@k): generate k paraphrases of a prompt, greedy-decode each, and score the fraction of answers that agree with the modal answer. SS is a diagnostic for variance-driven unreliability, not a method for improving correctness. We show that a dense Qwen3-0.6B agrees with itself only 23.8% of the time; at 32% sparsity, agreement jumps to 55.9%. A phase diagram reveals the sweet spot where variance reduction outpaces bias accumulation, as well as regimes where stability collapses onto wrong answers.
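The PC@k computation described above reduces to a mode-agreement count over greedy decodes. The sketch below is a minimal Python illustration under assumptions not stated in the abstract: the paraphrases are supplied by the caller, and answer_fn is a hypothetical wrapper around the model's greedy decoding; neither name comes from the authors' implementation.

from collections import Counter
from typing import Callable, List

def pc_at_k(paraphrases: List[str], answer_fn: Callable[[str], str]) -> float:
    """Sketch of Paraphrase Consistency (PC@k).

    Greedy-decodes an answer for each of the k paraphrases of the same question
    and returns the fraction of answers matching the modal (most common) answer.
    `answer_fn` is assumed to perform deterministic, greedy decoding.
    """
    # Light normalization so trivially different surface forms still count as agreement.
    answers = [answer_fn(p).strip().lower() for p in paraphrases]
    # Size of the largest set of identical answers (the mode).
    mode_count = Counter(answers).most_common(1)[0][1]
    return mode_count / len(answers)

# Hypothetical usage with k = 3 paraphrases of one question:
# score = pc_at_k(
#     ["What is the boiling point of water at sea level?",
#      "At sea level, water boils at what temperature?",
#      "Give the sea-level boiling point of water."],
#     answer_fn=lambda q: greedy_decode(model, q),  # greedy_decode is assumed, not defined here
# )

Because decoding is greedy, any disagreement across paraphrases reflects prompt-induced variance rather than sampling noise, which is exactly the quantity SS is meant to isolate.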
Similar Papers
Detecting Token-Level Hallucinations Using Variance Signals: A Reference-Free Approach
Computation and Language
Finds when AI makes up wrong answers.
Why Language Models Hallucinate
Computation and Language
Teaches AI to say "I don't know."
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning
Machine Learning (CS)
Makes AI admit when it doesn't know.