Safe Exploration via Policy Priors
By: Manuel Wendl, Yarden As, Manish Prajapat, and more
Potential Business Impact:
Lets robots learn safely without crashing.
Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online, beyond controlled (e.g. simulated) environments. In this work, we tackle this challenge by utilizing suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to explore optimistically, yet pessimistically falls back to the conservative policy prior when needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable and outperforms the state of the art, and validate our theoretical guarantees in practice.
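The abstract's core idea is an optimistic/pessimistic split: explore with the learned policy, but hand control to the conservative prior whenever a pessimistic safety estimate from the probabilistic dynamics model looks risky. The sketch below illustrates that idea only; the class and function names (EnsembleDynamics, select_action, cost_fn, cost_threshold, beta) are illustrative assumptions, not the authors' SOOPER implementation.

```python
import numpy as np


class EnsembleDynamics:
    """Probabilistic dynamics model: an ensemble returning mean/std of the next state."""

    def __init__(self, models):
        self.models = models  # list of callables: (state, action) -> next_state

    def predict(self, state, action):
        preds = np.stack([m(state, action) for m in self.models])
        return preds.mean(axis=0), preds.std(axis=0)


def select_action(state, learner_policy, prior_policy, dynamics,
                  cost_fn, cost_threshold, beta=1.0):
    """Act with the (optimistically trained) learned policy, but fall back to the
    conservative prior if the pessimistic cost estimate exceeds the threshold."""
    action = learner_policy(state)
    next_mean, next_std = dynamics.predict(state, action)

    # Pessimistic safety check: evaluate the cost at a worst-case next state,
    # inflated by beta standard deviations of the model's uncertainty.
    pessimistic_cost = cost_fn(next_mean + beta * next_std)
    if pessimistic_cost > cost_threshold:
        # The learned action cannot be certified safe: defer to the prior.
        return prior_policy(state)
    return action
```

A usage note on the design: shrinking the model's uncertainty over time makes the pessimistic check less conservative, so the agent relies on the prior less as it learns, which is the intuition behind the regret bound described in the abstract.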
Similar Papers
Safe Reinforcement Learning with Minimal Supervision
Machine Learning (CS)
Teaches robots to learn safely with less data.
Learning Safe Autonomous Driving Policies Using Predictive Safety Representations
Machine Learning (CS)
Helps self-driving cars learn to drive safely.