Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning
By: Haoming Ye, Yunxiao Xiao, Cewu Lu and more
Potential Business Impact:
Robots learn to do new jobs by watching videos.
Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pretrains a PDDL domain from robot manipulation demonstrations and applies it to online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3,137 operators, 2,875 predicates, and 16,481 causal edges. Given a target class of tasks, it retrieves relevant atomic domains from the unified domain and systematically fuses them into high-quality meta-domains to support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.
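The retrieve-then-fuse idea in the abstract can be illustrated with a toy sketch. This is not UniDomain's implementation: the `Operator` and `AtomicDomain` classes, the keyword-overlap retrieval, the union-based fusion, and the definition of a causal edge (an effect of one operator appearing in another's preconditions) are all assumptions made for illustration, and the four atomic domains are invented examples.

```python
from dataclasses import dataclass

# Hypothetical stopword list used only by this toy retrieval step.
STOPWORDS = {"the", "a", "in", "from", "up"}


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    effects: frozenset


@dataclass
class AtomicDomain:
    task: str        # natural-language description of one demonstration
    operators: list  # operators assumed to be extracted from that demo


def keywords(text):
    return set(text.lower().split()) - STOPWORDS


def retrieve(library, query, top_k=3):
    """Rank atomic domains by keyword overlap with the query (assumed metric)."""
    scored = [(len(keywords(query) & keywords(d.task)), d) for d in library]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [d for _, d in scored[:top_k]]


def fuse(domains):
    """Naive fusion: operators with the same name are merged by set union."""
    merged = {}
    for domain in domains:
        for op in domain.operators:
            prev = merged.get(op.name)
            if prev is None:
                merged[op.name] = op
            else:
                merged[op.name] = Operator(
                    op.name,
                    prev.preconditions | op.preconditions,
                    prev.effects | op.effects,
                )
    return merged


def causal_edges(operators):
    """Assumed definition: a -> b if an effect of a is a precondition of b."""
    ops = list(operators)
    return {
        (a.name, b.name)
        for a in ops
        for b in ops
        if a.name != b.name and a.effects & b.preconditions
    }


# Invented toy library standing in for the unified domain.
library = [
    AtomicDomain("pick up the cup from the table", [
        Operator("pick", frozenset({"hand_empty", "on_table"}), frozenset({"holding"})),
    ]),
    AtomicDomain("place the cup in the drawer", [
        Operator("place", frozenset({"holding", "drawer_open"}),
                 frozenset({"in_drawer", "hand_empty"})),
    ]),
    AtomicDomain("open the drawer", [
        Operator("open", frozenset({"hand_empty", "drawer_closed"}),
                 frozenset({"drawer_open"})),
    ]),
    AtomicDomain("wipe the window", [
        Operator("wipe", frozenset({"holding_cloth"}), frozenset({"window_clean"})),
    ]),
]

meta_domain = fuse(retrieve(library, "put the cup in the drawer"))
edges = causal_edges(meta_domain.values())
print(sorted(meta_domain))  # ['open', 'pick', 'place']
```

In this toy run the unrelated `wipe` operator is never retrieved, and the fused meta-domain links `pick` and `open` to `place` through shared predicates, which is the kind of compositional structure a symbolic planner could then search over.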
Similar Papers
Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching
Robotics
Turns pictures into robot instructions.
Using Large Language Models for Abstraction of Planning Domains - Extended Version
Artificial Intelligence
Helps AI plan better by simplifying complex tasks.
An End-to-end Planning Framework with Agentic LLMs and PDDL
Artificial Intelligence
Lets computers plan tasks from simple instructions.