RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks
By: Mingxuan Yan, Yuping Wang, Zechun Liu and more
Potential Business Impact:
Helps robots learn tasks by watching videos.
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLA) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic sub-tasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address this issue, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
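To make the retrieval-based idea concrete, here is a minimal sketch (not the authors' implementation) of how a demonstration might be split so that each interval's visual features best match a clip retrieved from the low-level policy's training data. It assumes frames and training sub-task clips are already encoded into fixed-size feature vectors by a frozen vision encoder; the interval pooling, cosine-similarity scoring, and dynamic-programming boundary search are illustrative assumptions.

```python
# Illustrative sketch of retrieval-based demonstration decomposition (assumptions:
# mean-pooled interval features, cosine similarity to the nearest training clip,
# dynamic programming over candidate split points). Not the paper's actual method.
import numpy as np

def encode_interval(frame_features: np.ndarray) -> np.ndarray:
    """Summarize an interval of per-frame features by mean pooling (assumption)."""
    return frame_features.mean(axis=0)

def retrieval_score(interval_feat: np.ndarray, library: np.ndarray) -> float:
    """Cosine similarity between an interval and its nearest training sub-task clip."""
    a = interval_feat / (np.linalg.norm(interval_feat) + 1e-8)
    b = library / (np.linalg.norm(library, axis=1, keepdims=True) + 1e-8)
    return float((b @ a).max())

def decompose(frames: np.ndarray, library: np.ndarray, n_subtasks: int,
              min_len: int = 5) -> list[tuple[int, int]]:
    """Split a demonstration of T frames into n_subtasks intervals whose pooled
    features best match clips retrieved from the policy's training data."""
    T = len(frames)
    NEG = -1e9
    # best[k][t]: best total score covering frames[:t] with k segments.
    best = np.full((n_subtasks + 1, T + 1), NEG)
    back = np.zeros((n_subtasks + 1, T + 1), dtype=int)
    best[0][0] = 0.0
    for k in range(1, n_subtasks + 1):
        for t in range(k * min_len, T + 1):
            for s in range((k - 1) * min_len, t - min_len + 1):
                if best[k - 1][s] == NEG:
                    continue
                score = best[k - 1][s] + retrieval_score(
                    encode_interval(frames[s:t]), library)
                if score > best[k][t]:
                    best[k][t], back[k][t] = score, s
    # Recover the segment boundaries by walking the backpointers.
    bounds, t = [], T
    for k in range(n_subtasks, 0, -1):
        s = back[k][t]
        bounds.append((s, t))
        t = s
    return bounds[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(60, 16))    # toy demonstration: 60 frames, 16-dim features
    clips = rng.normal(size=(100, 16))  # toy library of training sub-task clip features
    print(decompose(demo, clips, n_subtasks=3))
```

The key design point this sketch illustrates is that segment boundaries are scored against what the low-level policy was actually trained on, rather than fixed heuristics, so the resulting sub-tasks stay within the policy's competence.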
Similar Papers
Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills
Robotics
Robots learn complex tasks without human help.
Learning a Thousand Tasks in a Day
Robotics
Teaches robots new tasks with just one example.
Hierarchical Task Decomposition for Execution Monitoring and Error Recovery: Understanding the Rationale Behind Task Demonstrations
Robotics
Robots learn to do tricky jobs by watching.