Scaling Online Distributionally Robust Reinforcement Learning: Sample-Efficient Guarantees with General Function Approximation
By: Debamita Ghosh, George K. Atia, Yue Wang
The deployment of reinforcement learning (RL) agents in real-world applications is often hindered by performance degradation caused by mismatches between training and deployment environments. Distributionally robust RL (DR-RL) addresses this issue by optimizing worst-case performance over an uncertainty set of transition dynamics. However, existing work typically relies on substantial prior knowledge, such as access to a generative model or a large offline dataset, and largely focuses on tabular methods that do not scale to complex domains. We overcome these limitations by proposing an online DR-RL algorithm with general function approximation that learns an optimal robust policy purely through interaction with the environment, without requiring prior models or offline data, enabling deployment in high-dimensional tasks. We further provide a theoretical analysis establishing a near-optimal sublinear regret bound under a total variation uncertainty set, demonstrating the sample efficiency and effectiveness of our method.
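For context, the abstract's objective can be sketched with the standard DR-RL formulation under a total-variation uncertainty set; the symbols below (radius sigma, nominal kernel P^0, episode count K) are illustrative notation, not taken from the paper:

% Uncertainty set: all transition kernels within TV distance sigma of the nominal P^0.
\[
\mathcal{P}^{\sigma}(s,a) \;=\; \bigl\{\, P(\cdot \mid s,a) \;:\; D_{\mathrm{TV}}\bigl(P(\cdot \mid s,a),\, P^{0}(\cdot \mid s,a)\bigr) \le \sigma \,\bigr\}
\]
% Robust value of a policy: expected return under the worst-case kernel in the set,
% and the robust-optimal policy maximizes this worst-case value.
\[
V^{\pi}_{\mathrm{rob}}(s) \;=\; \inf_{P \in \mathcal{P}^{\sigma}} \; \mathbb{E}_{P,\pi}\!\Bigl[\, \textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t,a_t) \;\Big|\; s_0 = s \Bigr],
\qquad
\pi^{\star} \in \arg\max_{\pi} V^{\pi}_{\mathrm{rob}}
\]
% Online regret over K episodes with initial states s_1^k; a sublinear bound means
% Regret(K)/K -> 0, i.e., the learned policies approach robust optimality.
\[
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K} \Bigl( V^{\pi^{\star}}_{\mathrm{rob}}(s_1^{k}) - V^{\pi_k}_{\mathrm{rob}}(s_1^{k}) \Bigr)
\]

In this reading, the paper's contribution is to achieve a near-optimal sublinear bound on the quantity above purely from online interaction, with the robust value represented by a general function class rather than a tabular lookup.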
Similar Papers
Provably Near-Optimal Distributionally Robust Reinforcement Learning in Online Settings
Machine Learning (CS)
Teaches robots to work safely in new places.
Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation
Machine Learning (CS)
Teaches robots to learn safely in new places.
Data-regularized Reinforcement Learning for Diffusion Models at Scale
Machine Learning (CS)
Makes AI create better videos that people like.