DreamerV3-XP: Optimizing exploration through uncertainty estimation

Published: October 24, 2025 | arXiv ID: 2510.21418v1

By: Lukas Bierling, Davide Pasero, Jan-Henrik Bertrand and more

Potential Business Impact:

Helps reinforcement learning agents learn new tasks faster, especially when rewards are sparse.

Business Areas:
A/B Testing Data and Analytics

We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. The extension adds (i) a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and (ii) an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. We evaluate DreamerV3-XP on a subset of Atari100k and DeepMind Control Visual Benchmark tasks. The results confirm the original DreamerV3 findings and show that our extensions lead to faster learning and lower dynamics model loss, particularly in sparse-reward settings.
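
The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the three replay signals are combined or how disagreement is measured, so the weighted sum, the clamp to a small positive floor, the use of variance as the disagreement measure, and every function and parameter name below are assumptions made for illustration.

```python
import random
import statistics


def disagreement_reward(ensemble_reward_preds):
    """Intrinsic reward as the variance of reward predictions across the ensemble.

    Assumption: variance is used as the disagreement measure; the abstract
    only says the reward is "based on disagreement".
    """
    return statistics.pvariance(ensemble_reward_preds)


def trajectory_priority(ret, recon_loss, value_error,
                        weights=(1.0, 1.0, 1.0), floor=1e-6):
    """Score a trajectory from its return, reconstruction loss, and value error.

    Assumption: a weighted sum, clamped to a small positive floor so that
    every trajectory keeps a nonzero chance of being replayed.
    """
    w_ret, w_rec, w_val = weights
    score = w_ret * ret + w_rec * recon_loss + w_val * abs(value_error)
    return max(score, floor)


def sample_indices(priorities, k, rng=random):
    """Draw k trajectory indices with probability proportional to priority."""
    return rng.choices(range(len(priorities)), weights=priorities, k=k)


if __name__ == "__main__":
    # Three hypothetical trajectories: (return, reconstruction loss, value error).
    stats = [(0.0, 0.1, 0.05), (1.0, 0.5, -0.25), (0.0, 2.0, 1.0)]
    priorities = [trajectory_priority(*s) for s in stats]
    print("priorities:", priorities)
    print("sampled:", sample_indices(priorities, k=5, rng=random.Random(0)))
    # Five hypothetical ensemble members predicting the reward for one state.
    print("intrinsic reward:", disagreement_reward([0.0, 0.1, 0.9, 0.2, 0.0]))
```

In this sketch, an ensemble whose members all predict the same reward yields zero intrinsic reward, so the exploration bonus appears only where the models disagree.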

Country of Origin
🇳🇱 Netherlands

Page Count
5 pages

Category
Computer Science:
Machine Learning (CS)