Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models
By: Shaoning Sun, Mingzhu Cai, Huang He, and more
Potential Business Impact:
Makes AI better at learning and fixing mistakes.
Language model families exhibit a striking disparity in their capacity to benefit from reinforcement learning: under identical training, models such as Qwen achieve substantial gains, while others such as Llama yield limited improvements. Complementing data-centric approaches, we reveal that this disparity reflects a hidden structural property: distributional clarity in probability space. Through a three-stage analysis, from phenomenon to mechanism to interpretation, we find that RL-friendly models exhibit intra-class compactness and inter-class separation in the probabilities they assign to correct versus incorrect responses. We quantify this clarity with the Silhouette Coefficient ($S$) and demonstrate that (1) high $S$ correlates strongly with RL performance, and (2) low $S$ is associated with severe logic errors and reasoning instability. To validate this property, we introduce a Silhouette-Aware Reweighting strategy that prioritizes low-$S$ samples during training. Experiments across six mathematical benchmarks show consistent improvements across all model families, with gains of up to 5.9 points on AIME24. Our work establishes distributional clarity as a fundamental, trainable property underlying RL-friendliness.
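The abstract does not give the authors' exact formulation, but the Silhouette Coefficient has a standard clustering definition that can be applied directly to a model's probability assignments. Below is a minimal sketch, assuming each sampled response for a prompt carries a scalar probability and a correctness label, with correct and incorrect responses treated as the two clusters. The function name silhouette_clarity and the toy data are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: Silhouette Coefficient S over a model's probability assignments
# to sampled responses for one prompt (correct vs. incorrect clusters).
import numpy as np

def silhouette_clarity(probs, labels):
    """probs: 1-D array of per-response probabilities; labels: 1 = correct, 0 = incorrect."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for i in range(len(probs)):
        same = probs[labels == labels[i]]
        other = probs[labels != labels[i]]
        if len(same) < 2 or len(other) == 0:
            continue  # silhouette undefined for singleton or single-cluster cases
        a = np.abs(same - probs[i]).sum() / (len(same) - 1)  # intra-class compactness
        b = np.abs(other - probs[i]).mean()                   # inter-class separation
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores)) if scores else 0.0

# Toy usage: correct responses clustered at high probability, incorrect at low.
print(silhouette_clarity([0.9, 0.85, 0.8, 0.2, 0.15], [1, 1, 1, 0, 0]))  # near 1: clear, "RL-friendly"
print(silhouette_clarity([0.9, 0.3, 0.8, 0.7, 0.2], [1, 1, 0, 0, 0]))    # lower: blurred clusters
```

A high mean $S$ indicates the model already separates correct from incorrect responses in probability space; the reweighting strategy described above would then upweight prompts whose responses score low under such a measure.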
Similar Papers
Alignment as Distribution Learning: Your Preference Model is Explicitly a Language Model
Machine Learning (CS)
Makes AI better at following instructions.
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Machine Learning (CS)
Makes AI smarter without losing creative ideas.
Diversity or Precision? A Deep Dive into Next Token Prediction
Computation and Language
Makes AI smarter by teaching it to guess better.