Credence Calibration Game: Calibrating Large Language Models through Structured Play
By: Ke Fang, Tianyi Zhao, Lu Cheng
Potential Business Impact:
Makes AI tell you how sure it is.
As Large Language Models (LLMs) are increasingly deployed in decision-critical domains, it is essential that their confidence estimates faithfully reflect their actual correctness. Existing calibration methods have primarily focused on post-hoc adjustments or auxiliary model training, and many of these approaches require additional supervision or parameter updates. In this work, we propose a novel prompt-based calibration framework inspired by the Credence Calibration Game. Our method establishes a structured interaction loop in which the LLM receives feedback based on how well its predicted confidence aligns with its correctness. Through feedback-driven prompting and natural-language summaries of prior performance, our framework dynamically improves model calibration. Extensive experiments across models and game configurations demonstrate consistent improvements on evaluation metrics. Our results highlight the potential of game-based prompting as an effective strategy for LLM calibration. Code and data are available at https://anonymous.4open.science/r/LLM-Calibration/.
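The abstract describes a feedback-driven prompting loop but does not spell out an implementation, so the sketch below shows what one round of such a game could look like. The `query_llm` callable, the answer/confidence reply format, and the Brier-style scoring rule are illustrative assumptions, not the paper's actual prompts or scoring.

```python
# Minimal, hypothetical sketch of one round of a credence-calibration-style
# prompting loop. The `query_llm` callable, reply format, and Brier-style
# scoring rule are illustrative assumptions, not the paper's exact design.
from typing import Callable, List, Tuple

History = List[Tuple[float, bool, float]]  # (confidence, correct, score) per round


def credence_score(confidence: float, correct: bool) -> float:
    """Reward confidence that matches correctness (1 minus a Brier-style penalty)."""
    target = 1.0 if correct else 0.0
    return 1.0 - (confidence - target) ** 2


def summarize_history(history: History) -> str:
    """Turn prior rounds into the natural-language feedback fed back into the prompt."""
    if not history:
        return "This is the first round; no prior performance to report."
    avg_conf = sum(conf for conf, _, _ in history) / len(history)
    accuracy = sum(1 for _, ok, _ in history if ok) / len(history)
    total = sum(score for _, _, score in history)
    return (
        f"So far your average confidence is {avg_conf:.2f}, your accuracy is {accuracy:.2f}, "
        f"and your cumulative calibration score is {total:.2f}. "
        "If your confidence exceeds your accuracy, you are overconfident; adjust accordingly."
    )


def parse_reply(reply: str) -> Tuple[str, float]:
    """Extract the answer text and the stated confidence (default 0.5 if missing)."""
    answer_lines, confidence = [], 0.5
    for line in reply.splitlines():
        if line.lower().startswith("confidence:"):
            try:
                confidence = float(line.split(":", 1)[1])
            except ValueError:
                pass
        else:
            answer_lines.append(line)
    return "\n".join(answer_lines).strip(), min(max(confidence, 0.0), 1.0)


def play_round(question: str, gold_answer: str, history: History,
               query_llm: Callable[[str], str]) -> None:
    """One game round: prompt with feedback, parse answer + confidence, score, record."""
    prompt = (
        summarize_history(history)
        + f"\nQuestion: {question}\n"
        "Answer the question, then on a new line write 'Confidence:' followed by "
        "a number between 0 and 1."
    )
    reply = query_llm(prompt)  # any chat/completion API can be plugged in here
    answer, confidence = parse_reply(reply)
    correct = answer.strip().lower() == gold_answer.strip().lower()
    history.append((confidence, correct, credence_score(confidence, correct)))
```

Because the loop only rewrites the prompt between rounds, it needs no labels for training, no parameter updates, and no auxiliary model, which is the appeal of game-based prompting over post-hoc calibration.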
Similar Papers
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
Computation and Language
Makes AI tell you when it's sure or guessing.
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Computation and Language
Makes AI more honest about what it knows.
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Computation and Language
Makes AI tell you when it's unsure.