
Self-Correcting Large Language Models: Generation vs. Multiple Choice

Published: November 12, 2025 | arXiv ID: 2511.09381v1

By: Hossein A. Rahmani, Satyapriya Krishna, Xi Wang and more

Potential Business Impact:

Helps computers fix their own mistakes better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models have recently demonstrated remarkable abilities to self-correct their responses through iterative refinement, often referred to as self-consistency or self-reflection. However, the dynamics of this self-correction mechanism may differ substantially depending on whether the model is tasked with open-ended text generation or with selecting the most appropriate response from multiple predefined options. In this paper, we conduct a systematic investigation of these two paradigms by comparing performance trends and error-correction behaviors across various natural language understanding and reasoning tasks, covering language models of different scales and families. Our experimental results reveal distinct patterns of improvement and failure modes: while open-ended generation often benefits from the flexibility of re-interpretation and compositional refinement, multiple-choice selection can leverage clearer solution boundaries but may be limited by the provided options. This contrast also reflects the dual demands faced by emerging agentic LLM applications: effective agents must not only generate and refine open-ended plans or explanations, but also make reliable discrete choices when operating within constrained action spaces. Our findings, therefore, highlight that the design of self-correction mechanisms should take into account the interaction between task structure and output space, with implications for both knowledge-intensive reasoning and decision-oriented applications of LLMs.

Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning

Computation and Language

Computers can fix their own math mistakes.

18 Jun 2025

90%

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong

Computation and Language

Computers trust their answers more when they explain them.

16 Jan 2025

89%

Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights

Computation and Language

AI writes better test questions for teachers.

5 Jun 2025


Country of Origin

🇺🇸 🇬🇧 United Kingdom, United States

Page Count

20 pages
