Learning to Reason Efficiently with Discounted Reinforcement Learning
By: Alex Ayoub, Kavosh Asadi, Dale Schuurmans, and more
Potential Business Impact:
Makes AI think less, respond faster, and stay just as smart.
Large reasoning models (LRMs) often consume excessive tokens, inflating computational cost and latency. We challenge the assumption that longer responses improve accuracy. By penalizing reasoning tokens using a discounted reinforcement learning setup (interpretable as a small token cost) and analyzing Blackwell optimality in restricted policy classes, we encourage concise yet accurate reasoning. Experiments confirm our theoretical results that this approach shortens chains of thought while preserving accuracy.
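To make the "small token cost" interpretation concrete, here is a minimal sketch, not the paper's implementation, of a terminal correctness reward discounted once per reasoning token; the function name `discounted_reward` and the value `gamma = 0.999` are hypothetical choices for illustration.

```python
def discounted_reward(is_correct: bool, num_reasoning_tokens: int, gamma: float = 0.999) -> float:
    """Terminal reward discounted per reasoning token: gamma**T * R.

    When (1 - gamma) * T is small, gamma**T is roughly 1 - (1 - gamma) * T,
    so the discount behaves like a small per-token cost subtracted from the
    correctness reward.
    """
    base_reward = 1.0 if is_correct else 0.0
    return (gamma ** num_reasoning_tokens) * base_reward


# Two correct answers of different lengths: the shorter one earns more reward.
print(discounted_reward(True, 2000))  # ~0.135 with gamma = 0.999
print(discounted_reward(True, 500))   # ~0.606 with gamma = 0.999
```

Because the discount compounds with every reasoning token, a policy trained to maximize this signal is pushed toward correct answers that reach the conclusion in fewer tokens, matching the abstract's framing of discounting as a small token cost.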
Similar Papers
Concise Reasoning via Reinforcement Learning
Computation and Language
Makes AI give shorter, smarter answers.
Mitigating Overthinking through Reasoning Shaping
Computation and Language
Makes smart computers think less, solve problems better.
Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
Artificial Intelligence
Saves computer power by skipping easy problems.