Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
By: Xuyang Chen, Keyu Yan, Lin Zhao
Potential Business Impact:
Helps robots and other decision-making systems learn better policies from previously collected data, avoiding costly or risky trial and error.
Offline reinforcement learning (RL) aims to learn decision-making policies from fixed datasets without online interaction, offering a practical solution when online data collection is expensive or risky. However, offline RL often suffers from distribution shift, which leads to inaccurate value evaluation and substantial overestimation of out-of-distribution (OOD) actions. To address this, existing approaches incorporate conservatism by indiscriminately discouraging all OOD actions, which hinders the agent's ability to generalize and to exploit the beneficial ones. In this paper, we propose Advantage-based Diffusion Actor-Critic (ADAC), a novel method that systematically evaluates OOD actions using the batch-optimal value function. Based on this evaluation, ADAC defines an advantage function that modulates the Q-function update, enabling a more precise assessment of OOD action quality. We design a custom PointMaze environment and collect datasets on it to visually demonstrate that advantage modulation can effectively identify and select superior OOD actions. Extensive experiments show that ADAC achieves state-of-the-art performance on almost all tasks in the D4RL benchmark, with particularly clear margins on the more challenging ones.
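The abstract describes the update only at a high level, so the sketch below is an illustration rather than the paper's algorithm: it shows one way an advantage signal A(s', a') = Q(s', a') - V*(s'), computed against an approximate batch-optimal value function, could modulate the Q-target for possibly out-of-distribution candidate actions. The network shapes, the sigmoid weighting, and names such as q_update and v_net are assumptions introduced here, not taken from the paper.

# Hedged sketch: advantage-modulated Q-update for offline RL.
# Assumptions (not from the paper): networks, hyperparameters, and the exact
# modulation rule below are illustrative; ADAC's actual formulation may differ.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

obs_dim, act_dim, gamma = 17, 6, 0.99

q_net = MLP(obs_dim + act_dim, 1)   # Q(s, a)
v_net = MLP(obs_dim, 1)             # stand-in for a batch-optimal value estimate V*(s)
q_opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

def q_update(batch, policy_actions):
    """One advantage-modulated critic step.

    batch: dict of tensors (s, a, r, s2, done) sampled from the offline dataset.
    policy_actions: candidate next actions a' ~ pi(.|s2) (e.g. from a diffusion
    actor), which may be out-of-distribution w.r.t. the dataset.
    """
    s, a, r, s2, done = (batch[k] for k in ("s", "a", "r", "s2", "done"))

    with torch.no_grad():
        # Advantage of the candidate next action relative to the (approximate)
        # batch-optimal value function: A(s', a') = Q(s', a') - V*(s').
        # A target Q-network would normally be used here; omitted for brevity.
        q_next = q_net(torch.cat([s2, policy_actions], dim=-1))
        adv = q_next - v_net(s2)
        # Modulation weight: trust the candidate action more when its advantage
        # is high, less when it looks worse than the baseline V*(s').
        # (Sigmoid weighting is an illustrative choice, not the paper's.)
        w = torch.sigmoid(adv)
        target = r + gamma * (1.0 - done) * (w * q_next + (1.0 - w) * v_net(s2))

    q_pred = q_net(torch.cat([s, a], dim=-1))
    loss = ((q_pred - target) ** 2).mean()
    q_opt.zero_grad()
    loss.backward()
    q_opt.step()
    return loss.item()

# Toy usage with random data (shapes only; no training signal intended).
B = 32
batch = {"s": torch.randn(B, obs_dim), "a": torch.randn(B, act_dim),
         "r": torch.randn(B, 1), "s2": torch.randn(B, obs_dim),
         "done": torch.zeros(B, 1)}
print(q_update(batch, torch.randn(B, act_dim)))

The intuition this sketch tries to capture, following the abstract, is that the bootstrapped target leans toward a candidate action's Q-value only when that action looks better than the batch-optimal baseline, i.e. OOD actions are evaluated rather than uniformly penalized.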
Similar Papers
Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn from mistakes safely.
Imagination-Limited Q-Learning for Offline Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn from past mistakes.
Diffusion Policies with Value-Conditional Optimization for Offline Reinforcement Learning
Machine Learning (CS)
Helps robots learn better from past actions.