Revisiting LLM Reasoning via Information Bottleneck
By: Shiye Lei, Zhihao Cheng, Kai Jia, and more
Potential Business Impact:
Helps computers get better at solving math problems.
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). By leveraging simple rule-based rewards, RL effectively incentivizes LLMs to produce extended chain-of-thought (CoT) reasoning trajectories, progressively guiding them toward correct answers. However, existing approaches remain largely heuristic and intuition-driven, limiting the development of principled methodologies. In this paper, we present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle, introducing IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable across diverse prompts. We derive a practical token-level surrogate objective and propose an efficient approximation, resulting in the lightweight IB regularization method. This technique integrates seamlessly into existing RL-based post-training frameworks without additional computational overhead, requiring only a one-line code modification. Empirically, we validate IB regularization across multiple mathematical reasoning benchmarks and RL algorithms, demonstrating consistent improvements in LLM reasoning performance.
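For context, the classical information bottleneck objective (Tishby, Pereira, and Bialek) seeks a representation T of an input X that stays predictive of a target Y while compressing away X-specific detail. A hedged sketch of how this maps onto the setup described above, reading X as the prompt, T as the CoT trajectory, and Y as the final answer (this correspondence is our gloss on the abstract, not the paper's exact formulation):

\[
\max_{p(t \mid x)} \; I(T; Y) \;-\; \beta \, I(X; T)
\]

Maximizing I(T; Y) pushes trajectories to be informative about the correct answer, while penalizing I(X; T) discourages prompt-specific detail, matching the "generalizable across diverse prompts" desideratum; beta trades off the two terms.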
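The abstract does not reproduce the token-level surrogate, so the following Python sketch is only a guess at what a "one-line" IB-style regularizer might look like inside a policy-gradient loss. The entropy form of the regularizer, the coefficient `beta`, and the function `pg_loss_with_ib_reg` are all illustrative assumptions, not the authors' released code:

```python
# Hypothetical sketch: assumes the IB surrogate reduces to a beta-weighted
# token-level entropy term added to a vanilla policy-gradient loss.
import torch

def pg_loss_with_ib_reg(logits, actions, advantages, mask, beta=1e-3):
    """Policy-gradient loss plus a hypothetical token-level IB regularizer.

    logits:     (B, T, V) per-token model outputs
    actions:    (B, T)    sampled token ids (long)
    advantages: (B, T)    per-token advantage estimates
    mask:       (B, T)    1.0 for response tokens, 0.0 elsewhere
    """
    log_probs = torch.log_softmax(logits, dim=-1)                       # (B, T, V)
    act_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)

    # Entropy of the policy's next-token distribution at each position.
    token_entropy = -(log_probs.exp() * log_probs).sum(dim=-1)          # (B, T)

    pg = -(act_logp * advantages * mask).sum() / mask.sum()
    # The assumed "one-line modification": a beta-weighted entropy term.
    return pg - beta * (token_entropy * mask).sum() / mask.sum()
```

In a real RLVR pipeline (PPO- or GRPO-style training), the analogous change would be appending a comparable regularization term to whatever loss the framework already computes; everything else in the sketch is standard policy-gradient bookkeeping.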
Similar Papers
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Artificial Intelligence
Makes computers learn new tricks, but not really.
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models
Machine Learning (CS)
Helps small AI learn to think better.
Rectifying LLM Thought from Lens of Optimization
Computation and Language
Teaches computers to think smarter, not longer.