LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
By: Xingyu Wu, Yuchen Yan, Shangke Lyu, and more
Potential Business Impact:
Lets AI solve problems faster and cheaper without losing accuracy.
Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an external constraint into an intrinsic model capability. Unlike existing approaches that impose rigid limits or rely on post-hoc interventions, LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. In the first stage, models learn natural reasoning patterns by discovering the statistical distribution of successful solution lengths. The second stage leverages these patterns as meta-cognitive guidance, embedding them directly within the model's reasoning context to ensure inference-time flexibility. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%. Our analysis reveals that models trained with LAPO develop emergent abilities to allocate computational resources based on problem complexity, achieving efficient reasoning without sacrificing quality.
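To make the two-stage process concrete, here is a minimal Python sketch of the idea the abstract describes, under stated assumptions: the paper's actual reward design and RL algorithm are not given here, so the function names (rollout, is_correct), the median-based length budget, and the linear reward bonus are all illustrative placeholders, not LAPO's real formulation.

```python
# Hypothetical sketch of LAPO's two-stage idea; not the paper's actual method.
import statistics

def stage1_length_statistics(problems, rollout, is_correct, samples_per_problem=8):
    """Stage 1 (sketch): discover the distribution of successful solution lengths.

    For each problem, sample several chains of thought and record the token
    lengths of the correct ones. The median of those lengths then serves as a
    per-problem "natural" reasoning budget. `rollout` and `is_correct` are
    assumed callables: rollout(p) -> (answer, n_tokens), is_correct(p, a) -> bool.
    """
    budgets = {}
    for p in problems:
        ok_lengths = []
        for _ in range(samples_per_problem):
            answer, n_tokens = rollout(p)
            if is_correct(p, answer):
                ok_lengths.append(n_tokens)
        if ok_lengths:
            budgets[p] = int(statistics.median(ok_lengths))
    return budgets

def length_aware_reward(correct, n_tokens, budget, tolerance=0.2):
    """Reward correctness, with a bonus for staying near the learned budget.

    The linear decay is an assumed shape; the paper's reward may differ.
    """
    if not correct:
        return 0.0
    deviation = abs(n_tokens - budget) / max(budget, 1)
    return 1.0 + 0.5 * max(0.0, 1.0 - deviation / tolerance)

def stage2_prompt(problem, budget):
    """Stage 2 (sketch): surface the budget inside the reasoning context so the
    model can condition on it at inference time. Wording is illustrative."""
    return (f"{problem}\n"
            f"(Aim to solve this in roughly {budget} reasoning tokens.)")
```

In this reading, stage 1 turns observed successful-rollout lengths into per-problem budgets, and stage 2 feeds those budgets back into the prompt so length control becomes something the model conditions on rather than an external truncation rule.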
Similar Papers
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
Artificial Intelligence
Makes smart computers think less and answer faster.
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Computation and Language
Lets AI think just enough to be accurate.
Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning
Artificial Intelligence
Lets computers use calculators for math problems.