Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion
By: Jebeom Chae, Junwoo Chang, Seungho Yeom, and more
Diffusion models have recently emerged as powerful tools for robot motion planning because they capture the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Moreover, current diffusion-based approaches incur high computational cost at inference time and generalize poorly, since they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-Conditioned Heat-Inspired Diffusion (LCHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LCHD integrates CLIP-based semantic priors with a collision-avoiding diffusion kernel that serves as a physical inductive bias, enabling the planner to interpret language commands strictly within the reachable workspace. This naturally handles scenarios that are out-of-distribution in terms of reachability: the planner guides robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, together with real-robot experiments, show that LCHD consistently outperforms prior diffusion-based planners in success rate while reducing planning latency.
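At its core, the approach described above is a denoising diffusion sampler over robot trajectories, conditioned on a language embedding. The sketch below is a minimal, hypothetical illustration of that idea only: it uses a standard DDPM ancestral-sampling loop with an isotropic Gaussian kernel, whereas LCHD replaces that kernel with its collision-avoiding, heat-inspired one. All class names, dimensions, the noise schedule, and the placeholder text embedding are assumptions for illustration, not the paper's implementation; in practice a frozen CLIP text encoder would supply the conditioning vector.

```python
# Hypothetical sketch of language-conditioned trajectory diffusion.
# Shapes, schedule, and architecture are illustrative assumptions;
# LCHD's collision-avoiding kernel is NOT implemented here.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Predicts the noise on a flattened multi-robot trajectory,
    conditioned on a diffusion timestep and a language embedding."""
    def __init__(self, traj_dim: int, cond_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, x, t, text_emb):
        # Timestep fed as a scalar feature; real models use sinusoidal embeddings.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x, t_feat, text_emb], dim=-1))

@torch.no_grad()
def sample_trajectory(model, text_emb, traj_dim, steps=50):
    """Standard DDPM ancestral sampling; LCHD would swap the isotropic
    Gaussian transition kernel for its collision-avoiding diffusion kernel."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, traj_dim)  # start from pure noise
    for t in reversed(range(steps)):
        eps = model(x, torch.tensor([t]), text_emb)
        # Posterior mean of the reverse Gaussian step.
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # flattened waypoints for all robots

# Usage: a frozen CLIP text encoder would supply text_emb, e.g.
#   text_emb = clip_model.encode_text(clip.tokenize(["go to the red shelf"])).float()
model = ConditionalDenoiser(traj_dim=2 * 10 * 4)  # e.g. 4 robots x 10 waypoints x (x, y)
text_emb = torch.randn(1, 512)                    # placeholder for a CLIP embedding
plan = sample_trajectory(model, text_emb, traj_dim=2 * 10 * 4)
```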
Similar Papers
EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation
Robotics
Robots follow spoken instructions to do tasks.
Accelerated Multi-Modal Motion Planning Using Context-Conditioned Diffusion Models
Robotics
Robots learn new paths without retraining.
dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning
CV and Pattern Recognition
Helps self-driving cars handle tricky situations.