Learning to Reason Efficiently with Discounted Reinforcement Learning
By: Alex Ayoub, Kavosh Asadi, Dale Schuurmans, and more
Potential Business Impact:
Makes AI think less, respond faster, and stay just as smart.
Large reasoning models (LRMs) often consume excessive tokens, inflating computational cost and latency. We challenge the assumption that longer responses improve accuracy. By penalizing reasoning tokens using a discounted reinforcement learning setup (interpretable as a small token cost) and analyzing Blackwell optimality in restricted policy classes, we encourage concise yet accurate reasoning. Experiments confirm our theoretical results that this approach shortens chains of thought while preserving accuracy.
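To make the "small token cost" interpretation concrete, here is a minimal sketch, not the paper's implementation, of a terminal correctness reward discounted once per reasoning token; the function name `discounted_reward` and the value `gamma = 0.999` are hypothetical choices for illustration.

```python
def discounted_reward(is_correct: bool, num_reasoning_tokens: int, gamma: float = 0.999) -> float:
    """Terminal reward discounted per reasoning token: gamma**T * R.

    When (1 - gamma) * T is small, gamma**T is roughly 1 - (1 - gamma) * T,
    so the discount behaves like a small per-token cost subtracted from the
    correctness reward.
    """
    base_reward = 1.0 if is_correct else 0.0
    return (gamma ** num_reasoning_tokens) * base_reward


# Two correct answers of different lengths: the shorter one earns more reward.
print(discounted_reward(True, 2000))  # ~0.135 with gamma = 0.999
print(discounted_reward(True, 500))   # ~0.606 with gamma = 0.999
```

Because the discount compounds with every reasoning token, a policy trained to maximize this signal is pushed toward correct answers that reach the conclusion in fewer tokens, matching the abstract's framing of discounting as a small token cost.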
Similar Papers
Concise Reasoning via Reinforcement Learning
Computation and Language
Makes AI give shorter, smarter answers.
Mitigating Overthinking through Reasoning Shaping
Computation and Language
Makes smart computers think less, solve problems better.
Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
Artificial Intelligence
Saves computer power by skipping easy problems.