DreamerV3-XP: Optimizing exploration through uncertainty estimation
By: Lukas Bierling, Davide Pasero, Jan-Henrik Bertrand, and more
Potential Business Impact:
Teaches robots to learn new skills much faster.
We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. It adds (i) a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and (ii) an intrinsic reward based on disagreement among environment-reward predictions from an ensemble of world models. DreamerV3-XP is evaluated on a subset of the Atari100k and DeepMind Control Visual Benchmark tasks, confirming the original DreamerV3 results and showing that our extensions yield faster learning and lower dynamics-model loss, particularly in sparse-reward settings.
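The two extensions described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the priority weights, the linear combination of scoring terms, and the use of the ensemble standard deviation as the disagreement signal are all assumptions made for clarity.

```python
import numpy as np

def trajectory_priority(ret, recon_loss, value_error,
                        w_ret=1.0, w_rec=1.0, w_val=1.0):
    """Hypothetical replay priority: a weighted sum of a trajectory's
    return, world-model reconstruction loss, and value error.
    (The paper's exact combination rule may differ.)"""
    return w_ret * ret + w_rec * recon_loss + w_val * value_error

def intrinsic_reward(ensemble_reward_preds):
    """Disagreement-based intrinsic reward: per-timestep standard
    deviation of reward predictions across an ensemble of world models.
    Input shape: (n_models, T); output shape: (T,)."""
    preds = np.asarray(ensemble_reward_preds, dtype=float)
    return preds.std(axis=0)

def sample_trajectory(priorities, rng):
    """Sample a trajectory index proportionally to its priority score."""
    p = np.asarray(priorities, dtype=float)
    p = p / p.sum()
    return int(rng.choice(len(p), p=p))

# Toy usage: two trajectories, a three-model reward ensemble.
priorities = trajectory_priority(
    ret=np.array([1.0, 2.0]),
    recon_loss=np.array([0.5, 0.5]),
    value_error=np.array([0.0, 1.0]),
)
r_int = intrinsic_reward([[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]])
idx = sample_trajectory(priorities, np.random.default_rng(0))
```

When the ensemble agrees (identical predictions), the intrinsic reward is zero, so the bonus vanishes in well-modeled regions and concentrates exploration where the world models disagree.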
Similar Papers
DreamerV3 for Traffic Signal Control: Hyperparameter Tuning and Performance
Machine Learning (CS)
Teaches computers to learn faster by imagining.
ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay
CV and Pattern Recognition
Helps robots learn to explore new places faster.
Meta-Reinforcement Learning with Discrete World Models for Adaptive Load Balancing
Machine Learning (CS)
Makes computers share work better and faster.