Rethinking the Foundations for Continual Reinforcement Learning
By: Esraa Elelimy, David Szepesvari, Martha White, and more
Potential Business Impact:
Teaches computers to learn forever, not just once.
In the traditional view of reinforcement learning, the agent's goal is to find an optimal policy that maximizes its expected sum of rewards. Once the agent finds this policy, the learning ends. This view contrasts with continual reinforcement learning, where learning does not end, and agents are expected to continually learn and adapt indefinitely. Despite the clear distinction between these two paradigms of learning, much of the progress in continual reinforcement learning has been shaped by foundations rooted in the traditional view of reinforcement learning. In this paper, we first examine whether the foundations of traditional reinforcement learning are suitable for the continual reinforcement learning paradigm. We identify four key pillars of the traditional reinforcement learning foundations that are antithetical to the goals of continual learning: the Markov decision process formalism, the focus on atemporal artifacts, the expected sum of rewards as an evaluation metric, and episodic benchmark environments that embrace the other three foundations. We then propose a new formalism that sheds the first and the third foundations and replaces them with the history process as a mathematical formalism and a new definition of deviation regret, adapted for continual learning, as an evaluation metric. Finally, we discuss possible approaches to shed the other two foundations.
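For context, here is a minimal sketch of the two kinds of evaluation metric contrasted in the abstract, written in standard (assumed) notation rather than the paper's own definitions. The traditional objective scores a policy \pi by its expected discounted return,

J(\pi) = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t R_{t+1} \right],

which is maximized once and then learning can stop, whereas a regret-style metric scores the agent's actual reward stream against a comparator class \Pi at every horizon T,

\mathrm{Regret}(T) = \sup_{\pi \in \Pi} \mathbb{E}\!\left[ \sum_{t=1}^{T} R^{\pi}_t \right] - \mathbb{E}\!\left[ \sum_{t=1}^{T} R_t \right],

so performance is judged throughout learning rather than only at its endpoint. The paper's deviation regret is a specific adaptation of this idea to history processes; the expressions above are generic illustrations, not the authors' definitions.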
Similar Papers
Ergodic Risk Measures: Towards a Risk-Aware Foundation for Continual Reinforcement Learning
Machine Learning (CS)
Helps robots learn new things without forgetting old ones.
The Future of Continual Learning in the Era of Foundation Models: Three Key Directions
Machine Learning (CS)
Keeps AI smart and updated with new information.
Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change
Machine Learning (CS)
Helps robots learn new tricks without forgetting old ones.