Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes
By: Zachary Robertson, Sanmi Koyejo
Potential Business Impact:
Makes AI judge itself fairly, even without answers.
We study evaluation of AI systems without ground truth by exploiting a link between strategic gaming and information loss. We analyze which information-theoretic mechanisms resist adversarial manipulation, extending finite-sample bounds to show that bounded f-divergences (e.g., total variation distance) maintain polynomial guarantees under attacks while unbounded measures (e.g., KL divergence) degrade exponentially. To implement these mechanisms, we model the overseer as an agent and characterize incentive-compatible scoring rules as f-mutual information objectives. Under adversarial attacks, TVD-MI maintains effectiveness (AUC 0.70-0.77) while traditional judge queries are near chance (AUC $\approx$ 0.50), demonstrating that querying the same LLM for information relationships rather than quality judgments provides both theoretical and practical robustness. The mechanisms decompose pairwise evaluations into reliable item-level quality scores without ground truth, addressing a key limitation of traditional peer prediction. We release our preregistration and code.
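To make the TVD-MI idea concrete, the following is a minimal Python sketch, under stated assumptions, of the pairwise estimator implied by the variational form of total variation distance: TVD(P_XY, P_X P_Y) equals the supremum over critics f with values in [0, 1] of E_paired[f] - E_unpaired[f], so any bounded judge yields a lower bound. The names tvd_mi_estimate and overlap_judge, and the judge interface itself, are illustrative assumptions rather than the authors' released implementation; in the paper's setting the judge would be the same LLM queried for an information relationship between two responses.

    def tvd_mi_estimate(responses_a, responses_b, judge):
        """Lower-bound the TVD mutual information between two agents' outputs.

        judge(x, y) is any critic returning a score in [0, 1] for
        "does y share information with x". Boundedness is the point:
        each query can move the estimate by at most 1/n, which is the
        finite-sample intuition behind polynomial (rather than
        exponential) degradation under manipulation.
        """
        n = len(responses_a)
        assert n > 1 and len(responses_b) == n
        # Paired term: both responses answer the same item.
        paired = sum(judge(a, b) for a, b in zip(responses_a, responses_b)) / n
        # Unpaired term: responses to different items, via a cyclic shift
        # (guarantees every pair mixes two distinct items).
        unpaired = sum(
            judge(responses_a[i], responses_b[(i + 1) % n]) for i in range(n)
        ) / n
        return paired - unpaired

    # Toy stand-in judge: token-overlap proxy for "shares information",
    # bounded in [0, 1]. An LLM-based judge would replace this.
    def overlap_judge(x, y):
        xs, ys = set(x.split()), set(y.split())
        return len(xs & ys) / max(len(xs | ys), 1)

Because the critic is bounded, the estimate lies in [-1, 1] regardless of how adversarially the responses are chosen; an unbounded objective such as a KL-based score has no such cap, matching the abstract's contrast between polynomial and exponential degradation.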
Similar Papers
Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes
Machine Learning (CS)
Makes AI tell truth, not lies.
Beyond I-Con: Exploring New Dimension of Distance Measures in Representation Learning
Machine Learning (CS)
Finds better ways for computers to learn.
Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge
Machine Learning (CS)
Makes AI judges more trustworthy for picking best answers.