Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality
By: Shaocong Ma, Ziyi Chen, Yi Zhou, and more
Potential Business Impact:
Makes robots learn safely, even when unsure.
The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does not generally hold in robust constrained RL, indicating that traditional primal-dual methods may fail to find optimal feasible policies. To overcome this limitation, we propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO), which operates directly on the primal problem without relying on dual formulations. We provide theoretical convergence guarantees under mild regularity assumptions, showing convergence to an approximately optimal feasible policy with iteration complexity matching the best-known lower bound when the diameter of the uncertainty set is suitably controlled. Empirical results in a grid-world environment validate the effectiveness of our approach: RRPO achieves robust and safe performance under model uncertainty, whereas a non-robust baseline can violate the safety constraints under worst-case model perturbations.
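As a rough illustration of the primal-only idea described in the abstract, the sketch below evaluates a tabular policy's worst-case reward return and worst-case cost return over a finite uncertainty set of transition models, and combines them into a single "rectified" objective that penalizes constraint violation. The rectified penalty form max(0, cost - budget), the penalty weight lam, and the helper names worst_case_return / rectified_objective are assumptions made for illustration only, not the paper's exact formulation.

```python
import numpy as np

def worst_case_return(policy, models, reward, gamma=0.99, iters=200):
    """Evaluate a stochastic policy under each transition model in the
    uncertainty set and return the minimum (worst-case) discounted return
    from a fixed start state (state 0).

    policy: array of shape (S, A), rows sum to 1.
    models: iterable of transition tensors P with shape (S, A, S).
    reward: array of shape (S, A).
    """
    returns = []
    for P in models:
        # Policy-induced transition matrix and expected per-state reward.
        P_pi = np.einsum("sa,sat->st", policy, P)
        r_pi = np.einsum("sa,sa->s", policy, reward)
        # Approximate policy evaluation by iterating the Bellman operator.
        v = np.zeros(policy.shape[0])
        for _ in range(iters):
            v = r_pi + gamma * P_pi @ v
        returns.append(v[0])
    return min(returns)

def rectified_objective(policy, models, reward, cost, budget, lam=10.0):
    """Reward objective minus a rectified penalty on worst-case cost overrun
    (an illustrative primal-only surrogate, not the authors' exact objective)."""
    j_r = worst_case_return(policy, models, reward)
    # Worst case for the cost is the model that *maximizes* cost, i.e. the
    # minimum of the negated cost return, negated back.
    j_c = -worst_case_return(policy, models, -cost)
    return j_r - lam * max(0.0, j_c - budget)
```

A primal-only method in this spirit would then improve the policy directly on this rectified objective (for example by policy gradient on a differentiable version of it), without maintaining dual variables for the constraint.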
Similar Papers
Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation
Machine Learning (CS)
Teaches robots to learn safely in new places.
Deep Gaussian Process Proximal Policy Optimization
Machine Learning (CS)
Helps robots learn safely and explore better.
RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Sound
Makes computer voices sound more real and emotional.