ClarifyMT-Bench: Benchmarking and Improving Multi-Turn Clarification for Conversational Large Language Models
By: Sichun Luo, Yi Huang, Mukai Li, and more
Potential Business Impact:
Helps computers ask questions before answering.
Large language models (LLMs) are increasingly deployed as conversational assistants in open-domain, multi-turn settings, where users often provide incomplete or ambiguous information. However, existing LLM-focused clarification benchmarks primarily assume single-turn interactions or cooperative users, limiting their ability to evaluate clarification behavior in realistic settings. We introduce ClarifyMT-Bench, a benchmark for multi-turn clarification grounded in a five-dimensional ambiguity taxonomy and a set of six behaviorally diverse simulated user personas. Through a hybrid LLM-human pipeline, we construct 6,120 multi-turn dialogues capturing diverse ambiguity sources and interaction patterns. Evaluating ten representative LLMs uncovers a consistent under-clarification bias: LLMs tend to answer prematurely, and performance degrades as dialogue depth increases. To mitigate this, we propose ClarifyAgent, an agentic approach that decomposes clarification into perception, forecasting, tracking, and planning, substantially improving robustness across ambiguity conditions. ClarifyMT-Bench establishes a reproducible foundation for studying when LLMs should ask, when they should answer, and how to navigate ambiguity in real-world human-LLM interactions.
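The abstract describes ClarifyAgent only at a high level, so the following is a minimal sketch of what a perception/forecasting/tracking/planning decomposition for clarification could look like. All names (AmbiguityState, perceive, forecast, track, plan), the keyword heuristics, and the risk threshold are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a ClarifyAgent-style decision loop.
# The real modules in the paper are not specified in this abstract.
from dataclasses import dataclass, field


@dataclass
class AmbiguityState:
    """Tracks which parts of the user's request remain underspecified."""
    open_slots: set = field(default_factory=set)   # perceived ambiguities
    resolved: dict = field(default_factory=dict)   # slot -> clarified value


def perceive(utterance: str, state: AmbiguityState) -> AmbiguityState:
    """Perception: flag underspecified slots (toy keyword heuristic)."""
    if "book" in utterance and "date" not in state.resolved:
        state.open_slots.add("date")
    if "book" in utterance and "destination" not in state.resolved:
        state.open_slots.add("destination")
    return state


def forecast(state: AmbiguityState) -> float:
    """Forecasting: estimate the risk of answering now (more open slots -> riskier)."""
    return min(1.0, 0.4 * len(state.open_slots))


def track(state: AmbiguityState, slot: str, value: str) -> AmbiguityState:
    """Tracking: record the user's clarification and close the corresponding slot."""
    state.resolved[slot] = value
    state.open_slots.discard(slot)
    return state


def plan(state: AmbiguityState, risk: float, ask_threshold: float = 0.3) -> str:
    """Planning: ask about the most pressing open slot, or answer directly."""
    if risk >= ask_threshold and state.open_slots:
        slot = sorted(state.open_slots)[0]
        return f"Could you tell me the {slot} you have in mind?"
    return "Answering with the information gathered so far."


if __name__ == "__main__":
    state = AmbiguityState()
    state = perceive("I want to book a flight", state)
    print(plan(state, forecast(state)))           # asks a clarification question
    state = track(state, "date", "next Friday")
    state = track(state, "destination", "Tokyo")
    print(plan(state, forecast(state)))           # now answers instead of asking
```

The point of the sketch is only the control flow: the agent answers prematurely less often because the planning step is gated by an explicit estimate of remaining ambiguity rather than by the model's default tendency to respond.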
Similar Papers
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
Computation and Language
Tests AI's ability to talk and learn.
Structured Uncertainty guided Clarification for LLM Agents
Computation and Language
Helps AI ask better questions to finish tasks.