Off Policy Lyapunov Stability in Reinforcement Learning
By: Sarvan Gill, Daniela Constantinescu
Potential Business Impact:
Helps robots learn safely and more quickly.
Traditional reinforcement learning cannot provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, current self-learned Lyapunov functions are sample-inefficient because they are learned on-policy. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor-Critic and Proximal Policy Optimization algorithms to provide them with a data-efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
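To give a feel for the general idea of training a Lyapunov function from off-policy data, the sketch below trains a small Lyapunov critic on transitions sampled from a replay buffer, penalizing violations of a decrease condition. This is a minimal illustration assuming a PyTorch setup; the network architecture, the specific decrease condition, and names such as LyapunovCritic and lyapunov_loss are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: off-policy training of a Lyapunov critic from replay-buffer
# transitions. Loss terms and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class LyapunovCritic(nn.Module):
    """Maps a state to a non-negative scalar L(s), a candidate Lyapunov value."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Squaring the output enforces non-negativity of the candidate.
        return self.net(state).pow(2).squeeze(-1)

def lyapunov_loss(critic: LyapunovCritic,
                  state: torch.Tensor,
                  next_state: torch.Tensor,
                  alpha: float = 1e-3) -> torch.Tensor:
    """Penalize violations of the decrease condition L(s') - L(s) <= -alpha * L(s)
    on transitions drawn from a replay buffer (i.e., off-policy data)."""
    l_s = critic(state)
    l_next = critic(next_state)
    decrease_violation = torch.relu(l_next - l_s + alpha * l_s)
    return decrease_violation.mean()

if __name__ == "__main__":
    # Toy usage: random transitions stand in for a replay-buffer minibatch.
    critic = LyapunovCritic(state_dim=4)
    opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
    s, s_next = torch.randn(32, 4), torch.randn(32, 4)
    loss = lyapunov_loss(critic, s, s_next)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the transitions come from a replay buffer rather than fresh on-policy rollouts, the same data that trains the actor-critic updates can also train the Lyapunov critic, which is what makes the stability certificate data-efficient in spirit; the paper's actual formulation and integration with SAC and PPO may differ.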
Similar Papers
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Machine Learning (CS)
Makes smart robots safer by checking their moves.
Stability Enhancement in Reinforcement Learning via Adaptive Control Lyapunov Function
Machine Learning (CS)
Makes robots learn safely without breaking things.
A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions
Systems and Control
Keeps smart machines from making dangerous mistakes.