On Corruption-Robustness in Performative Reinforcement Learning
By: Vasilis Pollatos, Debmalya Mandal, Goran Radanovic
Potential Business Impact:
Lets AI keep learning reliably even when some of its training data is corrupted.
In performative reinforcement learning (RL), an agent faces a policy-dependent environment: the reward and transition functions depend on the agent's policy. Prior work on performative RL has studied the convergence of repeated retraining approaches to a performatively stable policy. In the finite-sample regime, these approaches repeatedly solve for a saddle point of a convex-concave objective that estimates the Lagrangian of a regularized version of the RL problem. In this paper, we extend such repeated retraining approaches so that they can operate on corrupted data. More specifically, we consider Huber's $\epsilon$-contamination model, in which an $\epsilon$ fraction of the data points is corrupted by arbitrary adversarial noise. We propose a repeated retraining approach based on convex-concave optimization with corrupted gradients, together with a novel problem-specific robust mean estimator for the gradients. We prove that our approach exhibits last-iterate convergence to an approximately stable policy, with an approximation error proportional to $\sqrt{\epsilon}$. We experimentally demonstrate the importance of accounting for corruption in performative RL.
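The robustness ingredient in the abstract is a robust mean estimator applied to gradients under Huber's $\epsilon$-contamination model. The paper's estimator is problem-specific and is not reproduced here; the sketch below is only a generic illustration (an assumption, not the authors' construction) using a coordinate-wise trimmed mean to show how an $\epsilon$ fraction of adversarially corrupted per-sample gradients can be kept from dominating the aggregated gradient used in saddle-point updates.

```python
import numpy as np

def trimmed_mean(gradients: np.ndarray, eps: float) -> np.ndarray:
    """Coordinate-wise trimmed mean of per-sample gradients.

    gradients: array of shape (n, d), one gradient estimate per data point.
    eps: assumed contamination fraction (Huber model); eps * n points are
         trimmed from each tail in every coordinate before averaging.
    Note: this is a generic robust estimator for illustration, not the
    problem-specific estimator proposed in the paper.
    """
    n, _ = gradients.shape
    k = int(np.ceil(eps * n))              # points to trim per tail
    sorted_grads = np.sort(gradients, axis=0)
    kept = sorted_grads[k:n - k] if n - 2 * k > 0 else sorted_grads
    return kept.mean(axis=0)

# Toy demo: clean per-sample gradients around a known mean, with an eps
# fraction replaced by arbitrary adversarial values.
rng = np.random.default_rng(0)
n, d, eps = 1000, 5, 0.05
true_mean = np.ones(d)
grads = true_mean + 0.1 * rng.standard_normal((n, d))
grads[: int(eps * n)] = 1e3                # adversarial corruption

print("naive mean error:  ", np.linalg.norm(grads.mean(axis=0) - true_mean))
print("trimmed mean error:", np.linalg.norm(trimmed_mean(grads, eps) - true_mean))
```

In this toy run the naive average is dragged far from the true gradient by the corrupted samples, while the trimmed estimate stays close, which is the qualitative effect the paper's robust estimator is designed to guarantee within the repeated retraining loop.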
Similar Papers
Independent Learning in Performative Markov Potential Games
Machine Learning (CS)
Makes AI agents learn better when they change the game.
Provably Sample-Efficient Robust Reinforcement Learning with Average Reward
Machine Learning (CS)
Helps computers learn better with less data.
Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
Machine Learning (CS)
Protects learning robots from bad information.