Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes
By: Carlos Vélez García, Miguel Cazorla, Jorge Pomares
We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or an explicit planner, PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Planning is realized as gradient-based refinement in this energy landscape, using identical computation during training and inference to reduce the train-test mismatch common in decoupled modeling pipelines. PaD is trained via self-supervised hindsight goal relabeling, which shapes the energy landscape around the planning dynamics. At inference, multiple trajectory candidates are refined under different temporal hypotheses, and low-energy plans balancing feasibility and efficiency are selected. We evaluate PaD on OGBench cube manipulation tasks. When trained on narrow expert demonstrations, PaD achieves a state-of-the-art 95% success rate, strongly outperforming prior methods that peak at 68%. Remarkably, training on noisy, suboptimal data further improves success rates and plan efficiency, highlighting the benefits of verification-driven planning. Our results suggest that learning to evaluate and refine trajectories provides a robust alternative to direct policy learning for offline, reward-free planning.
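To make the planning loop described above concrete, the sketch below (in JAX) refines several latent trajectory candidates of different lengths by gradient descent on a goal-conditioned energy and keeps the lowest-energy plan. This is a minimal illustration under stated assumptions: the quadratic surrogate energy, the horizons, the step size, and the iteration count are placeholders, and the abstract does not specify PaD's actual energy architecture, latent encoder, or hyperparameters.

import jax
import jax.numpy as jnp

LATENT_DIM = 8
NUM_STEPS = 50    # number of refinement (descent) steps -- assumed value
STEP_SIZE = 0.1   # descent step size -- assumed value


def energy(traj, goal):
    """Toy goal-conditioned energy over an entire latent trajectory.

    Low energy ~ a smooth (feasibility surrogate) trajectory whose final
    latent matches the goal. Stand-in for PaD's learned energy network.
    """
    smoothness = jnp.sum((traj[1:] - traj[:-1]) ** 2)  # feasibility surrogate
    goal_term = jnp.sum((traj[-1] - goal) ** 2)        # goal consistency
    return smoothness + goal_term


def refine(traj, goal):
    """Gradient-based refinement: descend the energy landscape in trajectory space."""
    grad_fn = jax.grad(energy)  # gradient w.r.t. the trajectory (first argument)
    for _ in range(NUM_STEPS):
        traj = traj - STEP_SIZE * grad_fn(traj, goal)
    return traj


def plan(start, goal, horizons=(8, 16, 32), key=jax.random.PRNGKey(0)):
    """Refine one candidate per temporal hypothesis (horizon) and keep the
    lowest-scoring plan, trading off feasibility against efficiency."""
    best_traj, best_score = None, jnp.inf
    for horizon in horizons:
        key, sub = jax.random.split(key)
        init = start + 0.01 * jax.random.normal(sub, (horizon, LATENT_DIM))
        traj = refine(init, goal)
        # Slight penalty on longer plans so efficiency matters, not only feasibility.
        score = energy(traj, goal) + 0.01 * horizon
        if score < best_score:
            best_traj, best_score = traj, score
    return best_traj


if __name__ == "__main__":
    start = jnp.zeros(LATENT_DIM)
    goal = jnp.ones(LATENT_DIM)
    plan_latents = plan(start, goal)
    print("selected plan length:", plan_latents.shape[0])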