Selective Risk Certification for LLM Outputs via Information-Lift Statistics: PAC-Bayes, Robustness, and Skeleton Design

Published: September 16, 2025 | arXiv ID: 2509.12527v2

By: Sanjeda Akter, Ibne Farabi Shihab, Anuj Sharma

Potential Business Impact:

Helps computers admit when they don't know.

Business Areas:
A/B Testing, Data and Analytics

Large language models frequently generate confident but incorrect outputs, motivating formal uncertainty quantification with abstention guarantees. We develop information-lift certificates that compare model probabilities to a skeleton baseline, accumulating evidence into sub-gamma PAC-Bayes bounds that remain valid under heavy-tailed distributions. Across eight datasets, our method achieves 77.2% coverage at 2% risk, outperforming recent 2023-2024 baselines by 8.6-15.1 percentage points, and blocks 96% of critical errors in high-stakes scenarios versus 18-31% for entropy-based methods. Limitations include dependence on the skeleton and frequency-only (not severity-aware) risk control, though performance degrades gracefully under corruption.
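The core idea of the abstract can be sketched as a simple decision rule: score each output token by how much the model's probability exceeds a weaker skeleton baseline's, accumulate that evidence, and abstain when it falls below a calibrated threshold. The sketch below is a minimal illustration under assumed names; the function names, the averaging scheme, and the `threshold` constant are hypothetical stand-ins, and the paper's actual certificate uses sub-gamma PAC-Bayes bounds rather than a fixed cutoff.

```python
import math

def information_lift(p_model, p_skeleton, eps=1e-12):
    """Per-token information lift: log-ratio of model vs. skeleton probability.
    Positive values mean the model is more confident than the baseline.
    (Illustrative only; the paper's exact statistic is more involved.)"""
    return math.log(p_model + eps) - math.log(p_skeleton + eps)

def certify_or_abstain(model_probs, skeleton_probs, threshold=0.5):
    """Accumulate lift across an output and abstain when evidence is weak.
    `threshold` is a hypothetical calibration constant; the actual method
    derives its acceptance region from a PAC-Bayes risk bound."""
    lifts = [information_lift(pm, ps)
             for pm, ps in zip(model_probs, skeleton_probs)]
    avg_lift = sum(lifts) / len(lifts)
    return ("answer" if avg_lift >= threshold else "abstain", avg_lift)

# Model clearly outperforms the skeleton on every token -> answer
decision, score = certify_or_abstain([0.9, 0.8, 0.95], [0.3, 0.4, 0.5])

# Model no better than the skeleton -> abstain ("admit it doesn't know")
decision2, score2 = certify_or_abstain([0.3, 0.3], [0.3, 0.3])
```

The abstain branch is what produces the selective-risk behavior described above: outputs whose accumulated lift over the skeleton is too small are withheld rather than emitted.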

Country of Origin
🇺🇸 United States

Page Count
31 pages

Category
Computer Science:
Machine Learning (CS)