Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization
By: Abdullah Akgül, Gulcin Baykal, Manuel Haußmann, and more
Potential Business Impact:
Helps robots learn faster when things change.
Continuous control of non-stationary environments is a major challenge for deep reinforcement learning algorithms. The time dependency of the state transition dynamics aggravates the notorious stability problems of model-free deep actor-critic architectures. We posit that two properties play a key role in overcoming non-stationarity in transition dynamics: (i) preserving the plasticity of the critic network, and (ii) directed exploration for rapid adaptation to the changing dynamics. We show that performing on-policy reinforcement learning with an evidential critic provides both of these properties. The evidential design ensures a fast and sufficiently accurate approximation to the uncertainty around the state-value, which maintains the plasticity of the critic network by detecting the distributional shifts caused by the change in dynamics. The probabilistic critic also makes the actor training objective a random variable, enabling the use of directed exploration approaches as a by-product. We name the resulting algorithm $\textit{Evidential Proximal Policy Optimization (EPPO)}$ due to the integral role of evidential uncertainty quantification in both the policy evaluation and policy improvement stages. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that our algorithm outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.
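To make the idea of an evidential critic concrete, below is a minimal sketch assuming the critic follows deep evidential regression with a Normal-Inverse-Gamma head over the state-value, which is one standard way to obtain the uncertainty described in the abstract. The class and function names (`EvidentialValueHead`, `evidential_value_loss`), the network sizes, and the regularization coefficient are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EvidentialValueHead(nn.Module):
    """Critic head that outputs Normal-Inverse-Gamma parameters (gamma, nu, alpha, beta),
    so each state-value estimate carries its own uncertainty."""

    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        )
        self.out = nn.Linear(hidden_dim, 4)  # gamma, nu, alpha, beta

    def forward(self, obs):
        gamma, log_nu, log_alpha, log_beta = self.out(self.body(obs)).chunk(4, dim=-1)
        nu = F.softplus(log_nu)             # nu > 0
        alpha = F.softplus(log_alpha) + 1.0  # alpha > 1
        beta = F.softplus(log_beta)          # beta > 0
        return gamma, nu, alpha, beta


def evidential_value_loss(gamma, nu, alpha, beta, returns, coeff=1.0):
    """Negative log-likelihood of the Student-t predictive plus an evidence regularizer,
    in the style of deep evidential regression; `returns` are empirical value targets."""
    omega = 2.0 * beta * (1.0 + nu)
    nll = (
        0.5 * torch.log(torch.pi / nu)
        - alpha * torch.log(omega)
        + (alpha + 0.5) * torch.log(nu * (returns - gamma) ** 2 + omega)
        + torch.lgamma(alpha)
        - torch.lgamma(alpha + 0.5)
    )
    reg = torch.abs(returns - gamma) * (2.0 * nu + alpha)  # penalize confident errors
    return (nll + coeff * reg).mean()


# Usage sketch (hypothetical shapes): the predictive variance can signal distributional
# shift in the dynamics, and propagating it into the actor objective enables directed
# exploration, as the abstract describes.
if __name__ == "__main__":
    critic = EvidentialValueHead(obs_dim=8)
    obs = torch.randn(32, 8)
    returns = torch.randn(32, 1)
    gamma, nu, alpha, beta = critic(obs)
    loss = evidential_value_loss(gamma, nu, alpha, beta, returns)
    epistemic_var = beta / (nu * (alpha - 1.0))  # uncertainty around the state-value
    loss.backward()
```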
Similar Papers
A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks
Machine Learning (CS)
Teaches computers to make better decisions faster.
Evolutionary Policy Optimization
Machine Learning (CS)
Teaches robots to learn faster and better.
Exploration and Adaptation in Non-Stationary Tasks with Diffusion Policies
Artificial Intelligence
Teaches robots to learn new tasks quickly.