An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint
By: Yi Sun, Han Wang, Jiaqiang Li, and more
Potential Business Impact:
Makes AI give good answers faster when time is short.
Recent work has demonstrated the remarkable potential of Large Language Models (LLMs) in test-time scaling. By making models think before answering, they can achieve much higher accuracy at the cost of extra inference computation. However, in many real-world scenarios, models are used under time constraints, where an answer must be given within a certain output length. It is unclear whether and how the reasoning ability of different LLMs remains effective under such strict constraints. We take a first look at this problem by conducting an in-depth empirical study. Specifically, we test 30 LLMs on common reasoning datasets under a wide range of output length budgets, and we analyze the correlation between inference accuracy and various properties, including model type, model size, and prompt style. We also consider the mapping between token budgets and actual on-device latency budgets. The results reveal several interesting findings about budget-aware LLM reasoning that differ from the unconstrained setting, e.g., the optimal choice of model size or prompt style changes under different budgets. These findings offer a timely evaluation of this area and practical guidance for users deploying LLMs under real-world latency constraints.
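The abstract mentions mapping token budgets to on-device latency budgets. Below is a minimal sketch (not from the paper) of one such conversion, assuming a simple linear decode-time model; the prefill latency and decode throughput values are hypothetical placeholders, not measurements reported by the authors.

    # Sketch: convert a wall-clock latency budget on a target device
    # into an output token budget, assuming decoding proceeds at a
    # roughly constant tokens-per-second rate after prefill.

    def latency_to_token_budget(latency_budget_s: float,
                                prefill_latency_s: float,
                                decode_tokens_per_s: float) -> int:
        """Largest output token budget that fits within the latency budget."""
        decode_time_s = max(0.0, latency_budget_s - prefill_latency_s)
        return int(decode_time_s * decode_tokens_per_s)

    if __name__ == "__main__":
        # Hypothetical on-device profile: 0.4 s prefill, 25 tokens/s decode.
        budget = latency_to_token_budget(latency_budget_s=10.0,
                                         prefill_latency_s=0.4,
                                         decode_tokens_per_s=25.0)
        print(budget)  # 240 tokens available for reasoning plus the final answer

In practice, such a token budget would then be passed as the generation cap (e.g., a max-new-tokens parameter) of whatever serving stack was profiled to obtain the throughput numbers.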
Similar Papers
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs
Computation and Language
Makes AI give shorter, more accurate answers.
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty
Computation and Language
Makes AI think faster and smarter on tests.
Reasoning Capabilities and Invariability of Large Language Models
Computation and Language
Tests if computers can think logically.