On Entropy Control in LLM-RL Algorithms
By: Han Shen
Potential Business Impact:
Helps AI learn math by trying different answers.
Appropriate entropy control is crucial to the effectiveness of RL algorithms. A commonly used method for controlling policy entropy is entropy regularization, which is adopted in popular RL algorithms including PPO, SAC, and A3C. Although entropy regularization has proven effective in conventional robotics and game RL, studies have found that it yields weak or no gains in LLM-RL training. In this work, we study the issues with the entropy bonus in the LLM-RL setting. Specifically, we first argue that conventional entropy regularization suffers from the LLM's extremely large response space and the sparsity of optimal outputs. As a remedy, we propose AEnt, an entropy control method that utilizes a new clamped entropy bonus with an automatically adjusted coefficient. The clamped entropy is evaluated with the policy re-normalized over a smaller token space, which encourages exploration within a more compact response set. In addition, the algorithm automatically adjusts the entropy coefficient according to the clamped entropy value, controlling the entropy-induced bias while still leveraging the entropy's benefits. AEnt is tested on math-reasoning tasks with different base models and datasets, where it consistently outperforms the baselines across multiple benchmarks.
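The abstract describes two ingredients: an entropy bonus computed from a policy re-normalized over a smaller token space, and a coefficient adjusted automatically from the measured entropy. Below is a minimal PyTorch sketch of that idea. The top-k truncation, the target-entropy update rule, and names such as top_k, target_entropy, and coef_lr are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def clamped_entropy(logits: torch.Tensor, top_k: int = 64) -> torch.Tensor:
    """Entropy of the policy re-normalized over a smaller token set.

    logits: (batch, seq_len, vocab_size) raw model outputs.
    Returns the mean per-token entropy over the top-k token space.
    """
    # Restrict to a compact token set, then re-normalize within it.
    topk_logits, _ = logits.topk(top_k, dim=-1)
    log_probs = F.log_softmax(topk_logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)  # per-token clamped entropy
    return entropy.mean()

def update_entropy_coef(coef: float, entropy: float,
                        target_entropy: float = 1.0,
                        coef_lr: float = 1e-3) -> float:
    """Raise the coefficient when entropy falls below a target, lower it otherwise (assumed rule)."""
    coef = coef + coef_lr * (target_entropy - entropy)
    return max(coef, 0.0)

In use, one would add coef * clamped_entropy(logits) to the policy objective at each update and then call update_entropy_coef with the measured entropy, so the bonus strength tracks how much exploration the policy is actually exhibiting.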
Similar Papers
Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
Machine Learning (CS)
Makes AI smarter by balancing learning and guessing.