Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation
By: Debamita Ghosh, George K. Atia, Yue Wang
The deployment of reinforcement learning (RL) agents in real-world applications is often hindered by performance degradation caused by mismatches between training and deployment environments. Distributionally robust RL (DR-RL) addresses this issue by optimizing worst-case performance over an uncertainty set of transition dynamics. However, existing work typically relies on substantial prior knowledge, such as access to a generative model or a large offline dataset, and largely focuses on tabular methods that do not scale to complex domains. We overcome these limitations by proposing an online DR-RL algorithm with general function approximation that learns an optimal robust policy purely through interaction with the environment, without requiring prior models or offline data, enabling deployment in high-dimensional tasks. We further provide a theoretical analysis establishing a near-optimal sublinear regret bound under a total variation uncertainty set, demonstrating the sample efficiency and effectiveness of our method.
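For context, the abstract's objective can be sketched with the standard DR-RL formulation under a total-variation uncertainty set; the symbols below (radius sigma, nominal kernel P^0, episode count K) are illustrative notation, not taken from the paper:

% Uncertainty set: all transition kernels within TV distance sigma of the nominal P^0.
\[
\mathcal{P}^{\sigma}(s,a) \;=\; \bigl\{\, P(\cdot \mid s,a) \;:\; D_{\mathrm{TV}}\bigl(P(\cdot \mid s,a),\, P^{0}(\cdot \mid s,a)\bigr) \le \sigma \,\bigr\}
\]
% Robust value of a policy: expected return under the worst-case kernel in the set,
% and the robust-optimal policy maximizes this worst-case value.
\[
V^{\pi}_{\mathrm{rob}}(s) \;=\; \inf_{P \in \mathcal{P}^{\sigma}} \; \mathbb{E}_{P,\pi}\!\Bigl[\, \textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t,a_t) \;\Big|\; s_0 = s \Bigr],
\qquad
\pi^{\star} \in \arg\max_{\pi} V^{\pi}_{\mathrm{rob}}
\]
% Online regret over K episodes with initial states s_1^k; a sublinear bound means
% Regret(K)/K -> 0, i.e., the learned policies approach robust optimality.
\[
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V^{\pi^{\star}}_{\mathrm{rob}}(s_1^{k}) - V^{\pi_k}_{\mathrm{rob}}(s_1^{k}) \Bigr)
\]

In this reading, the paper's contribution is to achieve a near-optimal sublinear bound on the quantity above purely from online interaction, with the robust value represented by a general function class rather than a tabular lookup.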
Similar Papers
Provably Near-Optimal Distributionally Robust Reinforcement Learning in Online Settings
Machine Learning (CS)
Teaches robots to work safely in new places.
Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation
Machine Learning (CS)
Teaches robots to learn safely in new places.
Data-regularized Reinforcement Learning for Diffusion Models at Scale
Machine Learning (CS)
Makes AI create better videos that people like.