VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models
By: Guochao Jiang, Wenfeng Feng, Guofeng Quan and more
Potential Business Impact:
Teaches AI math problems from easy to hard.
Policy-based reinforcement learning currently plays an important role in improving LLMs on mathematical reasoning tasks. However, existing rollout-based reinforcement learning methods (GRPO, DAPO, GSPO, etc.) do not explicitly account for how well an LLM can learn from samples of different difficulty levels, which runs contrary to the human cognitive process of working through mathematical reasoning tasks from easy to difficult. We observe that the variance of a rollout group's rewards in RLVR (reinforcement learning with verifiable rewards) partly reflects how difficult the current sample is for the LLM: samples that are too easy or too difficult have lower variance, while samples of moderate difficulty have higher variance. Based on this, we propose VCRL, a curriculum reinforcement learning framework that dynamically controls the difficulty of training samples based on the variance of group rewards. Experiments on five mathematical benchmarks and two models show the advantages of VCRL over current LLM RL baselines.
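The link between reward variance and difficulty can be illustrated with a small sketch. It assumes binary verifiable rewards (1 for a correct rollout, 0 otherwise), in which case a group with success rate p has variance p(1 - p), which is zero at p = 0 or p = 1 and largest at p = 0.5. The function names, the example groups, and the fixed threshold below are hypothetical and chosen for illustration only; the abstract does not specify how VCRL normalizes the variance or selects samples.

```python
from statistics import pvariance


def group_reward_variance(rewards):
    """Population variance of the rewards in one rollout group."""
    return pvariance(rewards)


def select_by_variance(groups, threshold):
    """Return the names of prompts whose group reward variance is at least threshold.

    The threshold is a hypothetical knob for this sketch, not a value from the paper.
    """
    return [
        name
        for name, rewards in groups.items()
        if group_reward_variance(rewards) >= threshold
    ]


# Hypothetical rollout groups: 8 rollouts per prompt, binary rewards.
groups = {
    "too_easy": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],     # p = 1
    "moderate": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],     # p = 0.5
    "mostly_hard": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # p = 0.125
    "too_hard": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],     # p = 0
}

for name, rewards in groups.items():
    print(f"{name}: variance = {group_reward_variance(rewards):.4f}")

print(select_by_variance(groups, threshold=0.2))
```

With these example groups, only the moderate-difficulty prompt clears the threshold of 0.2, while prompts the model always or never solves carry zero variance and would be filtered out.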
Similar Papers
Coupled Variational Reinforcement Learning for Language Model General Reasoning
Computation and Language
Makes AI think better to solve problems.
TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning
Machine Learning (CS)
Teaches robots many tasks much faster.
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
Machine Learning (CS)
Teaches robots new skills faster with smart advice.