Provably Safe Reinforcement Learning using Entropy Regularizer
By: Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu
We consider the problem of learning the optimal policy for Markov decision processes with safety constraints. We formulate the problem in a reach-avoid setup. Our goal is to design online reinforcement learning algorithms that guarantee, with arbitrarily high probability, that the safety constraints are satisfied during the learning phase. To this end, we first propose an algorithm based on the optimism in the face of uncertainty (OFU) principle. Building on this first algorithm, we propose our main algorithm, which utilizes entropy regularization. We carry out a finite-sample analysis of both algorithms and derive their regret bounds. We demonstrate that the inclusion of entropy regularization improves the regret and drastically controls the episode-to-episode variability that is inherent in OFU-based safe RL algorithms.
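The abstract does not spell out the algorithms, but it names their two main ingredients: an OFU-style optimism bonus and an entropy regularizer. Below is a minimal sketch, assuming a standard tabular discounted MDP, of how an entropy-regularized (soft) Bellman backup can be combined with an optimistic bonus. The function name, the bonus shape, the temperature `tau`, and the toy MDP are illustrative assumptions, not the paper's actual method, and the reach-avoid safety machinery is omitted entirely.

```python
import numpy as np


def soft_optimistic_value_iteration(P_hat, R, bonus, gamma=0.95, tau=0.1, n_iters=1000):
    """Entropy-regularized value iteration with an OFU-style bonus (illustrative sketch).

    P_hat : (S, A, S) empirical transition estimates
    R     : (S, A)    rewards
    bonus : (S, A)    optimism bonus, e.g. derived from confidence-set widths
    tau   : entropy-regularization temperature
    Returns the soft value function V (S,) and a softmax policy pi (S, A).
    """
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        # Optimistic Q-values: reward + bonus + discounted expected value
        Q = R + bonus + gamma * (P_hat @ V)              # shape (S, A)
        # Soft (log-sum-exp) backup instead of the hard max used in plain OFU,
        # computed with the max-shift trick for numerical stability
        m = Q.max(axis=1)
        V_new = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
        if np.max(np.abs(V_new - V)) < 1e-8:
            V = V_new
            break
        V = V_new
    # Entropy-regularized greedy policy: softmax over optimistic Q-values
    Q = R + bonus + gamma * (P_hat @ V)
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi


# Toy 3-state, 2-action example with a uniform (hypothetical) bonus
rng = np.random.default_rng(0)
P_hat = rng.dirichlet(np.ones(3), size=(3, 2))           # (S, A, S)
R = rng.uniform(size=(3, 2))
bonus = 0.05 * np.ones((3, 2))
V, pi = soft_optimistic_value_iteration(P_hat, R, bonus)
print(V)
print(pi)
```

Intuitively, the softmax policy produced by the entropy term spreads probability over near-optimal actions rather than switching abruptly between them, which is one plausible reading of the abstract's claim that entropy regularization tames episode-to-episode variability in OFU-based methods.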
Similar Papers
Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization
Machine Learning (CS)
Makes robots safer when learning new tasks.
Online Optimization for Offline Safe Reinforcement Learning
Machine Learning (CS)
Teaches robots to do tasks safely and well.
Statistical analysis of Inverse Entropy-regularized Reinforcement Learning
Machine Learning (Stat)
Finds the *why* behind smart decisions.