Learning Semantic-Geometric Task Graph-Representations from Human Demonstrations

Published: January 16, 2026 | arXiv ID: 2601.11460v1

By: Franziska Herbert, Vignesh Prasad, Han Liu and more

Potential Business Impact:

Teaches robots to do complex jobs by watching humans.

Business Areas:

Motion Capture, Media and Entertainment, Video

Learning structured task representations from human demonstrations is essential for understanding long-horizon manipulation behaviors, particularly in bimanual settings where action ordering, object involvement, and interaction geometry can vary significantly. A key challenge lies in jointly capturing the discrete semantic structure of tasks and the temporal evolution of object-centric geometric relations in a form that supports reasoning over task progression. In this work, we introduce a semantic-geometric task graph-representation that encodes object identities, inter-object relations, and their temporal geometric evolution from human demonstrations. Building on this formulation, we propose a learning framework that combines a Message Passing Neural Network (MPNN) encoder with a Transformer-based decoder, decoupling scene representation learning from action-conditioned reasoning about task progression. The encoder operates solely on temporal scene graphs to learn structured representations, while the decoder conditions on action-context to predict future action sequences, associated objects, and object motions over extended time horizons. Through extensive evaluation on human demonstration datasets, we show that semantic-geometric task graph-representations are particularly beneficial for tasks with high action and object variability, where simpler sequence-based models struggle to capture task progression. Finally, we demonstrate that task graph representations can be transferred to a physical bimanual robot and used for online action selection, highlighting their potential as reusable task abstractions for downstream decision-making in manipulation systems.
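As a rough illustration of the kind of input the abstract describes, the sketch below builds a toy temporal scene graph: object identities as nodes, one pairwise relation per object pair, and the change of that relation over time. Everything here is an assumption made for illustration. The names `SceneGraph`, `relations`, and `relation_evolution` are hypothetical, and plain Euclidean distance stands in for whatever geometric relations the paper actually uses. It does not implement the MPNN encoder or the Transformer decoder.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]
Edge = Tuple[str, str]


@dataclass
class SceneGraph:
    """Hypothetical scene graph for one time step: nodes are named objects."""
    positions: Dict[str, Position]  # object identity -> 3D position

    def relations(self) -> Dict[Edge, float]:
        """One relation per unordered object pair (assumed: Euclidean distance)."""
        names = sorted(self.positions)
        return {
            (a, b): dist(self.positions[a], self.positions[b])
            for i, a in enumerate(names)
            for b in names[i + 1:]
        }


def relation_evolution(graphs: List[SceneGraph]) -> Dict[Edge, List[float]]:
    """Change in each pairwise distance between consecutive time steps."""
    deltas: Dict[Edge, List[float]] = {}
    for prev, curr in zip(graphs, graphs[1:]):
        before, after = prev.relations(), curr.relations()
        # Only track pairs that are present in both time steps.
        for edge in before.keys() & after.keys():
            deltas.setdefault(edge, []).append(after[edge] - before[edge])
    return deltas


# Toy demonstration: the bottle moves closer to the cup between two frames.
demo = [
    SceneGraph({"cup": (0.0, 0.0, 0.0), "bottle": (3.0, 4.0, 0.0)}),
    SceneGraph({"cup": (0.0, 0.0, 0.0), "bottle": (0.0, 4.0, 0.0)}),
]
evolution = relation_evolution(demo)
```

In a learned model, per-edge sequences like `evolution` would presumably be replaced by richer geometric features and fed to a graph encoder; the toy version only shows how semantic structure (which objects) and geometric evolution (how their relations change) can live in one representation.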

Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation

Robotics

Robots learn to do tasks from watching.

13 Jan 2026 2

88%

Learning a Thousand Tasks in a Day

Robotics

Teaches robots new tasks with just one example.

13 Nov 2025 1

88%

RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation

Robotics

Robots learn new tasks faster with less practice.

12 Nov 2025 2


Country of Origin

🇩🇪 Germany

Page Count

9 pages
