Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards
By: Jinyan Su, Claire Cardie
Potential Business Impact:
Makes AI reason faster and cheaper with little loss in accuracy.
Large language models (LLMs) have demonstrated strong reasoning abilities in mathematical tasks, often enhanced through reinforcement learning (RL). However, RL-trained models frequently produce unnecessarily long reasoning traces -- even for simple queries -- leading to increased inference costs and latency. While recent approaches attempt to control verbosity by adding length penalties to the reward function, these methods rely on fixed penalty terms that are hard to tune and cannot adapt as the model's reasoning capability evolves, limiting their effectiveness. In this work, we propose an adaptive reward-shaping method that enables LLMs to "think fast and right" -- producing concise outputs without sacrificing correctness. Our method dynamically adjusts the reward trade-off between accuracy and response length based on model performance: when accuracy is high, the length penalty increases to encourage faster length reduction; when accuracy drops, the penalty is relaxed to preserve correctness. This adaptive reward accelerates early-stage length reduction while avoiding over-compression in later stages. Experiments across multiple datasets show that our approach consistently and dramatically reduces reasoning length while largely maintaining accuracy, offering a new direction for cost-efficient adaptive reasoning in large-scale language models.
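The key idea, an accuracy-dependent length penalty, can be sketched in a few lines. The Python snippet below is a minimal illustration, not the authors' exact reward: the function name, the linear penalty schedule, and parameters such as `alpha_min`, `alpha_max`, and `batch_accuracy` (e.g., a moving average over recent rollouts) are assumptions made for clarity.

```python
def adaptive_reward(is_correct: bool,
                    response_len: int,
                    max_len: int,
                    batch_accuracy: float,
                    alpha_min: float = 0.0,
                    alpha_max: float = 0.5) -> float:
    """Scalar reward trading off correctness against response length.

    The length-penalty weight `alpha` grows with current accuracy:
    high accuracy -> stronger pressure to shorten responses;
    low accuracy -> penalty relaxed to preserve correctness.
    (Illustrative sketch; the paper's exact shaping may differ.)
    """
    # Penalty weight scales with the model's current accuracy.
    alpha = alpha_min + (alpha_max - alpha_min) * batch_accuracy

    # Normalized length penalty in [0, 1].
    length_penalty = min(response_len / max_len, 1.0)

    accuracy_reward = 1.0 if is_correct else 0.0
    return accuracy_reward - alpha * length_penalty


# Example: at 90% accuracy a long correct answer is rewarded noticeably
# less than a short one; at 30% accuracy the gap shrinks.
print(adaptive_reward(True, 800, 1000, batch_accuracy=0.9))  # ~0.64
print(adaptive_reward(True, 200, 1000, batch_accuracy=0.9))  # ~0.91
print(adaptive_reward(True, 800, 1000, batch_accuracy=0.3))  # ~0.88
```

Under this kind of schedule, early training (when accuracy is still climbing) applies only mild length pressure, while later training (high accuracy) pushes harder toward concise outputs, matching the adaptive behavior the abstract describes.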
Similar Papers
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty
Computation and Language
Makes AI think faster and smarter on tests.
Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
Artificial Intelligence
Saves computer power by skipping easy problems.
Adaptive Deep Reasoning: Triggering Deep Thinking When Needed
Computation and Language
Smart AI picks short or long thinking for answers.