Score: 1

LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Published: July 21, 2025 | arXiv ID: 2507.15758v2

By: Xingyu Wu, Yuchen Yan, Shangke Lyu, and more

Potential Business Impact:

Teaches AI reasoning models to spend only as much computation as a problem needs, cutting inference costs while slightly improving accuracy.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an external constraint into an intrinsic model capability. Unlike existing approaches that impose rigid limits or rely on post-hoc interventions, LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. In the first stage, models learn natural reasoning patterns by discovering the statistical distribution of successful solution lengths. The second stage leverages these patterns as meta-cognitive guidance, embedding them directly within the model's reasoning context to ensure inference-time flexibility. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%. Our analysis reveals that models trained with LAPO develop emergent abilities to allocate computational resources based on problem complexity, achieving efficient reasoning without sacrificing quality.

Country of Origin
🇨🇳 China

Repos / Data Links

Page Count
17 pages

Category
Computer Science:
Artificial Intelligence