Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs

Published: April 30, 2025 | arXiv ID: 2505.00127v1

By: Jinyan Su, Jennifer Healey, Preslav Nakov and more

Potential Business Impact:

Makes AI give shorter, more accurate answers.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) are increasingly optimized for long reasoning, under the assumption that more reasoning leads to better performance. However, emerging evidence suggests that longer responses can sometimes degrade accuracy rather than improve it. In this paper, we conduct a systematic empirical study of the relationship between reasoning length and answer correctness. We find that LLMs tend to overthink simple problems, generating unnecessarily long outputs, and underthink harder ones, failing to extend their reasoning when it is most needed. This indicates that models might misjudge problem difficulty and fail to calibrate their response length appropriately. Furthermore, we investigate the effects of length reduction with a preference optimization algorithm when simply preferring the shorter responses regardless of answer correctness. Experiments show that the generation length can be significantly reduced while maintaining acceptable accuracy. Our findings highlight generation length as a meaningful signal for reasoning behavior and motivate further exploration into LLMs' self-awareness in reasoning length adaptation.
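The length-reduction setup described in the abstract (preferring the shorter response regardless of correctness) can be illustrated with a minimal sketch of how such preference pairs might be built. The abstract does not specify the pairing procedure, the length measure, or the optimization algorithm, so everything here is an assumption for illustration: the function name `build_length_preference_pairs`, the `min_gap` parameter, the whitespace word count used as a stand-in for token length, and the `prompt`/`chosen`/`rejected` record layout.

```python
from itertools import combinations


def build_length_preference_pairs(prompt, responses, min_gap=1):
    """Pair sampled responses so the shorter one is always 'chosen'.

    Hypothetical helper: correctness is deliberately ignored, mirroring the
    "prefer shorter regardless of correctness" setting in the abstract.
    Length is approximated by whitespace-separated word count (an assumption;
    a real pipeline would likely count model tokens).
    """
    pairs = []
    for a, b in combinations(responses, 2):
        len_a, len_b = len(a.split()), len(b.split())
        # Skip pairs whose lengths are too close to express a preference.
        if abs(len_a - len_b) < min_gap:
            continue
        chosen, rejected = (a, b) if len_a < len_b else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


if __name__ == "__main__":
    samples = [
        "First add 2 and 2, then double-check the sum, so the answer is 4.",
        "The answer is 4.",
    ]
    for pair in build_length_preference_pairs("What is 2 + 2?", samples):
        print(pair["chosen"], "<<preferred over>>", pair["rejected"])
```

Records of this shape could then be passed to a preference optimization trainer; which algorithm and hyperparameters the authors used is not stated in the abstract.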

An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint

Artificial Intelligence

Makes smart computer answers faster when time is short.

19 Apr 2025

92%

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Computation and Language

Makes smart computer programs think faster, not waste words.

20 Mar 2025

91%

Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Machine Learning (CS)

Makes AI think faster and smarter.

8 Jun 2025

Country of Origin

🇦🇪 🇺🇸 United States, United Arab Emirates

Page Count

21 pages