Provably Sample-Efficient Robust Reinforcement Learning with Average Reward
By: Zachary Roch , Chi Zhang , George Atia and more
Potential Business Impact:
Helps computers learn better with less data.
Robust reinforcement learning (RL) under the average-reward criterion is essential for long-term decision-making, particularly when the environment may differ from its specification. However, a significant gap exists in understanding the finite-sample complexity of these methods, as most existing work provides only asymptotic guarantees. This limitation hinders their principled understanding and practical deployment, especially in data-limited scenarios. We close this gap by proposing \textbf{Robust Halpern Iteration (RHI)}, a new algorithm designed for robust Markov Decision Processes (MDPs) with transition uncertainty characterized by $\ell_p$-norm and contamination models. Our approach offers three key advantages over previous methods: (1). Weaker Structural Assumptions: RHI only requires the underlying robust MDP to be communicating, a less restrictive condition than the commonly assumed ergodicity or irreducibility; (2). No Prior Knowledge: Our algorithm operates without requiring any prior knowledge of the robust MDP; (3). State-of-the-Art Sample Complexity: To learn an $\epsilon$-optimal robust policy, RHI achieves a sample complexity of $\tilde{\mathcal O}\left(\frac{SA\mathcal H^{2}}{\epsilon^{2}}\right)$, where $S$ and $A$ denote the numbers of states and actions, and $\mathcal H$ is the robust optimal bias span. This result represents the tightest known bound. Our work hence provides essential theoretical understanding of sample efficiency of robust average reward RL.
Similar Papers
Sample Complexity of Distributionally Robust Average-Reward Reinforcement Learning
Machine Learning (CS)
Helps robots learn tasks better and faster.
Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning
Machine Learning (CS)
Teaches computers to make good choices even with bad info.
On Corruption-Robustness in Performative Reinforcement Learning
Machine Learning (CS)
Makes AI learn safely even with bad information.