ReLaX: Reasoning with Latent Exploration for Large Reasoning Models
By: Shimin Zhang, Xianwei Chen, Yufan Shen, and more
Potential Business Impact:
Helps AI learn better by watching its inner thoughts.
Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated remarkable potential in enhancing the reasoning capabilities of Large Reasoning Models (LRMs). However, RLVR often leads to entropy collapse, resulting in premature policy convergence and performance saturation. While manipulating token-level entropy has proven effective for promoting policy exploration, we argue that the latent dynamics underlying token generation encode a far richer computational structure for steering policy optimization toward a more effective exploration-exploitation tradeoff. To enable tractable analysis of, and intervention in, the latent dynamics of LRMs, we leverage Koopman operator theory to obtain a linearized representation of their hidden-state dynamics. This enables us to introduce Dynamic Spectral Dispersion (DSD), a new metric that quantifies the heterogeneity of the model's latent dynamics and serves as a direct indicator of policy exploration. Building on these foundations, we propose Reasoning with Latent eXploration (ReLaX), a paradigm that explicitly incorporates latent dynamics to regulate exploration and exploitation during policy optimization. Comprehensive experiments across a wide range of multimodal and text-only reasoning benchmarks show that ReLaX significantly mitigates premature convergence and consistently achieves state-of-the-art performance.
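To make the Koopman step concrete, the sketch below fits a linear operator to successive hidden states via dynamic mode decomposition (a standard way to approximate the Koopman operator from snapshot data) and then scores the spread of its eigenvalue magnitudes. This is a minimal illustration under stated assumptions: the function names, the toy data, and the dispersion formula (standard deviation of eigenvalue magnitudes) are ours, not the paper's exact DSD definition.

```python
# Hypothetical sketch: approximate the Koopman operator over a hidden-state
# trajectory with dynamic mode decomposition (DMD), then score how dispersed
# its spectrum is. The dispersion formula here is an illustrative stand-in
# for DSD, not the paper's exact metric.
import numpy as np

def koopman_dmd(hidden_states: np.ndarray) -> np.ndarray:
    """Least-squares linear operator K with h_{t+1} ~ K h_t.

    hidden_states: (T, d) array of per-token hidden states.
    Returns the (d, d) Koopman approximation.
    """
    X = hidden_states[:-1].T  # (d, T-1) current states
    Y = hidden_states[1:].T   # (d, T-1) next states
    # K = Y X^+ via the Moore-Penrose pseudoinverse.
    return Y @ np.linalg.pinv(X)

def spectral_dispersion(K: np.ndarray) -> float:
    """One plausible heterogeneity score: the standard deviation of the
    Koopman eigenvalue magnitudes. A larger spread indicates more
    heterogeneous latent dynamics, which the paper links to exploration."""
    eigvals = np.linalg.eigvals(K)
    return float(np.std(np.abs(eigvals)))

# Toy usage: 64 hidden states of dimension 16, made temporally correlated
# by cumulatively summing white noise (a random walk).
rng = np.random.default_rng(0)
traj = rng.standard_normal((64, 16)).cumsum(axis=0)
K = koopman_dmd(traj)
print(f"dispersion = {spectral_dispersion(K):.4f}")
```

In practice such a score could be logged per rollout and used as a regularization signal during policy optimization, which is the role the abstract assigns to DSD; the exact intervention mechanism is detailed in the paper itself.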
Similar Papers
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Machine Learning (CS)
Helps AI learn math better by keeping unlikely ideas in play.
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
Artificial Intelligence
Makes AI smarter and better at solving problems.
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
Computation and Language
Teaches AI to learn better by watching its mistakes.