Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion
By: Jebeom Chae, Junwoo Chang, Seungho Yeom, and more
Diffusion models have recently emerged as powerful tools for robot motion planning because they capture the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Moreover, current diffusion-based approaches incur high computational cost at inference time and generalize poorly, since they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-Conditioned Heat-Inspired Diffusion (LCHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LCHD integrates CLIP-based semantic priors with a collision-avoiding diffusion kernel that serves as a physical inductive bias, enabling the planner to interpret language commands strictly within the reachable workspace. This naturally handles scenarios that are out-of-distribution in terms of reachability: the planner guides robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, together with real-robot experiments, show that LCHD consistently outperforms prior diffusion-based planners in success rate while reducing planning latency.
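At its core, the approach described above is a denoising diffusion sampler over robot trajectories, conditioned on a language embedding. The sketch below is a minimal, hypothetical illustration of that idea only: it uses a standard DDPM ancestral-sampling loop with an isotropic Gaussian kernel, whereas LCHD replaces that kernel with its collision-avoiding, heat-inspired one. All class names, dimensions, the noise schedule, and the placeholder text embedding are assumptions for illustration, not the paper's implementation; in practice a frozen CLIP text encoder would supply the conditioning vector.

```python
# Hypothetical sketch of language-conditioned trajectory diffusion.
# Shapes, schedule, and architecture are illustrative assumptions;
# LCHD's collision-avoiding kernel is NOT implemented here.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Predicts the noise on a flattened multi-robot trajectory,
    conditioned on a diffusion timestep and a language embedding."""
    def __init__(self, traj_dim: int, cond_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + cond_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, x, t, text_emb):
        # Timestep fed as a scalar feature; real models use sinusoidal embeddings.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x, t_feat, text_emb], dim=-1))

@torch.no_grad()
def sample_trajectory(model, text_emb, traj_dim, steps=50):
    """Standard DDPM ancestral sampling; LCHD would swap the isotropic
    Gaussian transition kernel for its collision-avoiding diffusion kernel."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, traj_dim)  # start from pure noise
    for t in reversed(range(steps)):
        eps = model(x, torch.tensor([t]), text_emb)
        # Posterior mean of the reverse Gaussian step.
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # flattened waypoints for all robots

# Usage: a frozen CLIP text encoder would supply text_emb, e.g.
#   text_emb = clip_model.encode_text(clip.tokenize(["go to the red shelf"])).float()
model = ConditionalDenoiser(traj_dim=2 * 10 * 4)  # e.g. 4 robots x 10 waypoints x (x, y)
text_emb = torch.randn(1, 512)                    # placeholder for a CLIP embedding
plan = sample_trajectory(model, text_emb, traj_dim=2 * 10 * 4)
```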
Similar Papers
EL3DD: Extended Latent 3D Diffusion for Language Conditioned Multitask Manipulation
Robotics
Robots follow spoken instructions to do tasks.
Accelerated Multi-Modal Motion Planning Using Context-Conditioned Diffusion Models
Robotics
Robots learn new paths without retraining.
dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning
CV and Pattern Recognition
Helps self-driving cars handle tricky situations.