A Context-Aware Dual-Metric Framework for Confidence Estimation in Large Language Models
By: Mingruo Yuan, Shuyi Zhang, Ben Kao
Potential Business Impact:
Helps AI know when it's right or wrong.
Accurate confidence estimation is essential for trustworthy large language model (LLM) systems: it lets users decide when to trust outputs and enables reliable deployment in safety-critical applications. Current confidence estimation methods for LLMs neglect the relevance between responses and contextual information, a crucial factor in output quality evaluation, particularly when background knowledge is provided. To bridge this gap, we propose CRUX (Context-aware entropy Reduction and Unified consistency eXamination), the first framework that integrates context faithfulness and consistency for confidence estimation via two novel metrics. First, contextual entropy reduction captures data uncertainty as the information gain obtained through contrastive sampling with and without context. Second, unified consistency examination captures potential model uncertainty through the global consistency of answers generated with and without context. Experiments across three benchmark datasets (CoQA, SQuAD, QuAC) and two domain-specific datasets (BioASQ, EduQG) demonstrate CRUX's effectiveness, achieving higher AUROC than existing baselines.
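To make the two signals concrete, below is a minimal Python sketch of how they could be computed from sampled answers, as we read them from the abstract: contextual entropy reduction as the drop in answer-distribution entropy once context is supplied, and unified consistency as pairwise agreement over the pooled with-context and without-context samples. The function names, the exact-match agreement check, and the pooled pairwise scoring are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in nats) of the empirical answer distribution."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def contextual_entropy_reduction(answers_with_ctx, answers_without_ctx):
    """Information gain from conditioning generation on the context:
    H(answers | no context) - H(answers | context)."""
    return entropy(answers_without_ctx) - entropy(answers_with_ctx)

def unified_consistency(answers_with_ctx, answers_without_ctx, agree):
    """Average pairwise agreement across the pooled
    (with-context + without-context) sample set."""
    pooled = answers_with_ctx + answers_without_ctx
    pairs = [(a, b) for i, a in enumerate(pooled) for b in pooled[i + 1:]]
    if not pairs:
        return 1.0
    return sum(agree(a, b) for a, b in pairs) / len(pairs)

# Toy usage: exact-match agreement stands in for the semantic-equivalence
# check a real system would use over LLM samples.
with_ctx = ["Paris", "Paris", "Paris", "Paris"]
without_ctx = ["Paris", "Lyon", "Paris", "Marseille"]
print(contextual_entropy_reduction(with_ctx, without_ctx))  # large gain -> context is informative
print(unified_consistency(with_ctx, without_ctx, lambda a, b: float(a == b)))
```

In this reading, a large entropy reduction means the context sharply narrows the answer distribution (high faithfulness to the provided knowledge), while high pooled consistency means the model gives the same answer regardless of sampling; how CRUX combines the two scores is not specified in the abstract.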
Similar Papers
CURE: Confidence-driven Unified Reasoning Ensemble Framework for Medical Question Answering
Computation and Language
Helps doctors answer questions without expensive computers.
CrEst: Credibility Estimation for Contexts in LLMs via Weak Supervision
Computation and Language
Checks if online info is true before answering.
Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal
Computation and Language
Makes AI know when it's wrong and stop.