Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL
By: Mahsa Bastankhah, Grace Liu, Dilip Arumugam, and more
Potential Business Impact:
Teaches robots to explore safely without being told how.
In this work, we take a first step toward elucidating the mechanisms behind emergent exploration in unsupervised reinforcement learning. We study Single-Goal Contrastive Reinforcement Learning (SGCRL), a self-supervised algorithm capable of solving challenging long-horizon goal-reaching tasks without external rewards or curricula. We combine theoretical analysis of the algorithm's objective function with controlled experiments to understand what drives its exploration. We show that SGCRL maximizes implicit rewards shaped by its learned representations. These representations automatically modify the reward landscape to promote exploration before reaching the goal and exploitation thereafter. Our experiments also demonstrate that these exploration dynamics arise from learning low-rank representations of the state space rather than from neural network function approximation. Our improved understanding enables us to adapt SGCRL to perform safety-aware exploration.
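To make the mechanism concrete, here is a minimal sketch of the implicit-reward structure the abstract describes. This is not the authors' implementation; it assumes the standard contrastive RL parameterization, in which a critic is the inner product of a state-action embedding and a goal embedding trained with an NCE-style classification loss, and the critic value toward the single fixed goal serves as the implicit reward the policy maximizes. All names (Encoder, embed_dim, contrastive_loss) are illustrative.

```python
# Illustrative sketch only, not the SGCRL reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps states (or goals) to a low-rank embedding space; embed_dim
    bounds the rank of the induced critic, echoing the paper's finding
    that low-rank representations drive the exploration dynamics."""
    def __init__(self, in_dim: int, embed_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def contrastive_loss(phi_sa: torch.Tensor, psi_g: torch.Tensor) -> torch.Tensor:
    """NCE-style loss: row i of phi_sa is a state-action embedding whose
    positive is the goal embedding in row i (a future state from the same
    trajectory); the other rows in the batch act as negatives."""
    logits = phi_sa @ psi_g.T                       # [B, B] inner-product critic
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)

def implicit_reward(phi: Encoder, psi: Encoder,
                    sa: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
    """The learned critic value toward the single fixed goal plays the
    role of a reward: the paper's analysis shows this landscape favors
    exploration before the goal is reached and exploitation afterward."""
    return (phi(sa) * psi(goal)).sum(dim=-1)
```

Because the critic is an inner product of learned embeddings, updating the representations reshapes this implicit reward everywhere at once, which is the sense in which the representations "automatically modify the reward landscape."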
Similar Papers
Self-Supervised Goal-Reaching Results in Multi-Agent Cooperation and Exploration
Machine Learning (CS)
Robots learn to work together to reach goals.
Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
Machine Learning (CS)
Helps robots learn from mistakes, not just wins.
Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control
Machine Learning (CS)
Teaches robots to reach any goal.