GCHR: Goal-Conditioned Hindsight Regularization for Sample-Efficient Reinforcement Learning
By: Xing Lei, Wenyan Yang, Kaiqiang Ke, and more
Potential Business Impact:
Teaches robots to learn faster from mistakes.
Goal-conditioned reinforcement learning (GCRL) with sparse rewards remains a fundamental challenge in reinforcement learning. While hindsight experience replay (HER) has shown promise by relabeling collected trajectories with achieved goals, we argue that trajectory relabeling alone does not fully exploit the available experience in off-policy GCRL methods, resulting in limited sample efficiency. In this paper, we propose Hindsight Goal-conditioned Regularization (HGR), a technique that generates action regularization priors based on hindsight goals. When combined with hindsight self-imitation regularization (HSR), our approach enables off-policy RL algorithms to maximize experience utilization. Compared to existing GCRL methods that employ HER and self-imitation techniques, our hindsight regularizations achieve substantially more efficient sample reuse and the best overall performance, as we demonstrate empirically on a suite of navigation and manipulation tasks.
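To make the idea in the abstract concrete, here is a minimal sketch, not taken from the paper: it assumes a DDPG-style off-policy backbone, and the class names, the achieved-goal relabeling rule, the squared-error imitation term, and the weight `lam` are illustrative assumptions rather than the authors' exact formulation. It only shows the general pattern of HER-style relabeling combined with a self-imitation regularizer that pulls the policy toward actions that were optimal in hindsight.

```python
import torch
import torch.nn as nn


class GoalConditionedActor(nn.Module):
    """Deterministic actor pi(s, g) -> a for a DDPG-style GCRL agent."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))


class GoalConditionedCritic(nn.Module):
    """Q(s, g, a) estimator used for the usual off-policy actor update."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, goal, act):
        return self.net(torch.cat([obs, goal, act], dim=-1))


def relabel_with_achieved_goals(batch):
    """HER-style relabeling: swap the commanded goal for the goal actually
    achieved, so the stored action becomes (near-)optimal in hindsight."""
    relabeled = dict(batch)
    relabeled["goal"] = batch["achieved_goal"]
    return relabeled


def actor_loss_with_hindsight_regularization(actor, critic, batch, lam=0.5):
    """Usual policy-improvement loss plus a hindsight self-imitation term
    that pulls pi(s, g') toward the action that reached the relabeled g'."""
    relabeled = relabel_with_achieved_goals(batch)
    obs, goal, act = relabeled["obs"], relabeled["goal"], relabeled["action"]

    pi_action = actor(obs, goal)
    # Standard off-policy improvement: maximize Q under the current policy.
    q_loss = -critic(obs, goal, pi_action).mean()
    # Hindsight regularization prior: behavior-clone toward the action that,
    # in hindsight, achieved the relabeled goal.
    bc_loss = ((pi_action - act) ** 2).mean()

    return q_loss + lam * bc_loss
```

In this sketch the same replay batch feeds both the TD-style critic update (not shown) and the regularized actor update, which is the sense in which hindsight regularization reuses experience beyond plain relabeling; the relative weight `lam` of the imitation term is a free hyperparameter here.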
Similar Papers
Adaptable Hindsight Experience Replay for Search-Based Learning
Machine Learning (CS)
Finds math answers by trying and learning.
Learning to explore when mistakes are not allowed
Machine Learning (CS)
Teaches robots to try new things safely.
Goal-conditioned Hierarchical Reinforcement Learning for Sample-efficient and Safe Autonomous Driving at Intersections
Robotics
Teaches self-driving cars to avoid crashes.