Variational OOD State Correction for Offline Reinforcement Learning
By: Ke Jiang, Wen Jiang, Xiaoyang Tan
Potential Business Impact:
Teaches robots to stay in safe areas.
The performance of offline reinforcement learning is significantly impacted by the issue of state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to addressing this problem. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, thereby promoting its operation within, or return to, in-distribution (safe) regions. To achieve this, we optimize the objective within a variational framework that concurrently considers both the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of our proposed method through extensive experimental evaluations on the offline MuJoCo and AntMaze suites.
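The abstract does not spell out implementation details, but the core idea it describes, steering the policy toward actions whose predicted outcomes lie in high-density regions of the dataset, with density estimated in a variational framework, can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' DASP implementation: the `OutcomeVAE`, `dynamics` model, `q_net`, and `penalty_weight` names are assumptions introduced here for illustration only.

```python
# Illustrative sketch (not the paper's code): a VAE over dataset next-states serves
# as a variational density proxy, and the actor loss is penalized when the predicted
# outcome of the chosen action has low estimated density.
import torch
import torch.nn as nn

class OutcomeVAE(nn.Module):
    """VAE over next states; its ELBO acts as a proxy for log data density."""
    def __init__(self, state_dim, latent_dim=16, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 2 * latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, state_dim))

    def elbo(self, s_next):
        mu, log_var = self.encoder(s_next).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()    # reparameterization trick
        recon = self.decoder(z)
        recon_ll = -((recon - s_next) ** 2).sum(-1)              # Gaussian log-likelihood up to a constant
        kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(-1)
        return recon_ll - kl                                     # higher = more in-distribution

def density_aware_actor_loss(policy, dynamics, vae, q_net, states, penalty_weight=1.0):
    """Standard actor objective plus a penalty on low-density predicted outcomes."""
    actions = policy(states)
    predicted_next = dynamics(states, actions)   # learned outcome (next-state) model
    density_score = vae.elbo(predicted_next)     # variational density proxy of the outcome
    q_value = q_net(states, actions)
    return (-q_value - penalty_weight * density_score).mean()
```

The point this sketch tries to capture is that the density term is evaluated on the predicted outcomes of the agent's actions rather than on the actions themselves, which is what distinguishes OOD state correction from the more common OOD-action constraints in offline RL.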
Similar Papers
Guaranteeing Out-Of-Distribution Detection in Deep RL via Transition Estimation
Machine Learning (CS)
Helps robots know when they are lost.
Beyond Non-Expert Demonstrations: Outcome-Driven Action Constraint for Offline Reinforcement Learning
Machine Learning (CS)
Teaches robots to learn from mistakes safely.
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Machine Learning (CS)
Helps robots learn better from past mistakes.