Continual Policy Distillation from Distributed Reinforcement Learning Teachers
By: Yuxuan Li, Qijun He, Mingqi Yuan, and more
Potential Business Impact:
Teaches computers to learn many things without forgetting.
Continual Reinforcement Learning (CRL) aims to develop lifelong learning agents that continuously acquire knowledge across diverse tasks while mitigating catastrophic forgetting. This requires efficiently managing the stability-plasticity dilemma and leveraging prior experience to generalize rapidly to novel tasks. While various enhancement strategies for both aspects have been proposed, achieving scalable performance by directly applying RL to sequential task streams remains challenging. In this paper, we propose a novel teacher-student framework that decouples CRL into two independent processes: training single-task teacher models through distributed RL, and continually distilling them into a central generalist model. This design is motivated by the observation that RL excels at solving single tasks, while policy distillation -- a comparatively stable supervised learning process -- is well aligned with large foundation models and multi-task learning. Moreover, a mixture-of-experts (MoE) architecture and a replay-based approach are employed to enhance the plasticity and stability, respectively, of the continual policy distillation process. Extensive experiments on the Meta-World benchmark demonstrate that our framework enables efficient continual RL, recovering over 85% of teacher performance while constraining task-wise forgetting to within 10%.
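The decoupled loop described in the abstract -- independently trained per-task teachers distilled one at a time into a central student, with a replay buffer of earlier teachers' outputs to limit forgetting -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ReplayBuffer`, `distill_task_stream`, the integer "states", and the `replay_ratio` knob are all hypothetical names and simplifications.

```python
import random

class ReplayBuffer:
    """FIFO buffer of (state, teacher action distribution) pairs from past tasks."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []

    def add(self, sample):
        if len(self.data) >= self.capacity:
            self.data.pop(0)  # evict the oldest sample
        self.data.append(sample)

    def sample(self, k):
        k = min(k, len(self.data))
        return random.sample(self.data, k)

def distill_task_stream(teachers, student_update, states_per_task=32,
                        replay_ratio=0.5, buffer=None):
    """Distill each single-task teacher into the student, in sequence.

    teachers: list of callables, state -> action distribution (teacher policy)
    student_update: callable that fits the student on a batch of
        (state, target distribution) pairs (e.g. by minimizing a KL loss)
    """
    buffer = buffer or ReplayBuffer()
    for teacher in teachers:
        # Fresh supervision from the current task's teacher (plasticity).
        fresh = [(s, teacher(s)) for s in range(states_per_task)]
        for sample in fresh:
            buffer.add(sample)
        # Mix in replayed samples from the buffer (stability).
        replayed = buffer.sample(int(replay_ratio * states_per_task))
        student_update(fresh + replayed)
```

A toy run: with two stub teachers and `states_per_task=4`, each student update sees 4 fresh pairs plus 2 replayed ones. In the paper's setting the student would be an MoE policy and the teachers distributed RL agents; here both are reduced to plain callables to show only the data flow.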
Similar Papers
Demonstration-Guided Continual Reinforcement Learning in Dynamic Environments
Machine Learning (CS)
Teaches robots to learn new skills without forgetting.
Continual Reinforcement Learning for Cyber-Physical Systems: Lessons Learned and Open Challenges
Machine Learning (CS)
Teaches self-driving cars to learn new parking spots.
Continual Reinforcement Learning by Planning with Online World Models
Machine Learning (CS)
Keeps robots learning new tricks without forgetting old ones.