AOMGen: Photoreal, Physics-Consistent Demonstration Generation for Articulated Object Manipulation
By: Yulu Wu, Jiujun Cheng, Haowen Wang, and more
Recent advances in Vision-Language-Action (VLA) and world-model methods have improved generalization in tasks such as robotic manipulation and object interaction. However, successful execution of such tasks depends on large, costly collections of real demonstrations, especially for fine-grained manipulation of articulated objects. To address this, we present AOMGen, a scalable data generation framework for articulated-object manipulation that is instantiated from a single real scan, a single demonstration, and a library of readily available digital assets, yielding photoreal training data with verified physical states. The framework synthesizes synchronized multi-view RGB frames temporally aligned with action commands and with state annotations for joints and contacts, and it systematically varies camera viewpoints, object styles, and object poses to expand a single execution into a diverse corpus. Experimental results show that fine-tuning VLA policies on AOMGen data raises the success rate from 0% to 88.7% on unseen objects and layouts.
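The abstract describes the pipeline only at a high level, so the sketch below is a minimal, hypothetical illustration of the expansion idea: replay a single demonstration under randomized camera viewpoints, object styles, and object poses, and keep only variants whose replay matches the annotated joint states. Every name here (Demonstration, SceneVariant, sample_variant, replay_and_verify, expand_demo) is an assumption for illustration, not the authors' API, and the physics replay is stubbed out where a real simulator call would go.

```python
"""Hypothetical sketch of an AOMGen-style expansion loop (assumed structure,
not the authors' implementation)."""
from dataclasses import dataclass
import numpy as np


@dataclass
class Demonstration:
    actions: np.ndarray       # (T, action_dim) robot action commands
    joint_states: np.ndarray  # (T, num_joints) articulated-joint annotations


@dataclass
class SceneVariant:
    asset_id: str             # object style drawn from the digital-asset library
    object_pose: np.ndarray   # (4, 4) object placement in the scene
    camera_poses: list        # per-view (4, 4) camera extrinsics


def sample_variant(asset_ids, rng, num_views=3):
    """Randomize object style, planar pose, and camera viewpoints."""
    yaw = rng.uniform(-np.pi / 6, np.pi / 6)
    c, s = np.cos(yaw), np.sin(yaw)
    pose = np.eye(4)
    pose[:2, :2] = [[c, -s], [s, c]]
    pose[:2, 3] = rng.uniform(-0.05, 0.05, size=2)   # small tabletop jitter
    cams = [np.eye(4) for _ in range(num_views)]      # placeholder extrinsics
    return SceneVariant(str(rng.choice(asset_ids)), pose, cams)


def replay_and_verify(demo, variant, tol=0.02):
    """Placeholder for simulator replay: re-execute demo.actions in the variant
    scene and check the resulting joint trajectory against the annotations.
    A real pipeline would invoke a physics simulator and renderer here."""
    simulated = demo.joint_states + 0.0               # stand-in for a rollout
    return np.max(np.abs(simulated - demo.joint_states)) < tol


def expand_demo(demo, asset_ids, num_variants=100, seed=0):
    """Expand one execution into a corpus of physics-verified variants."""
    rng = np.random.default_rng(seed)
    corpus = []
    for _ in range(num_variants):
        variant = sample_variant(list(asset_ids), rng)
        if replay_and_verify(demo, variant):          # keep only verified states
            corpus.append((variant, demo.actions, demo.joint_states))
    return corpus


if __name__ == "__main__":
    demo = Demonstration(actions=np.zeros((50, 7)), joint_states=np.zeros((50, 1)))
    data = expand_demo(demo, asset_ids=["cabinet_a", "cabinet_b"], num_variants=10)
    print(f"kept {len(data)} verified variants")
```

The verification gate is the key design choice suggested by the abstract's "verified physical states": randomized variants that break the physical consistency of the original execution are discarded rather than added to the corpus.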
Similar Papers
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Robotics
Robots learn to grab and move things better.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Robotics
Robots learn new tasks by imagining future outcomes.
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Robotics
Teaches robots to do tasks with two hands.