
Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning

Published: December 5, 2025 | arXiv ID: 2512.05953v1

By: Yunhao Cao, Zubin Bhaumik, Jessie Jia and more

We introduce Correspondence-Oriented Imitation Learning (COIL), a conditional policy learning framework for visuomotor control with a flexible task representation in 3D. At the core of our approach, each task is defined by the intended motion of keypoints selected on objects in the scene. Instead of assuming a fixed number of keypoints or uniformly spaced time intervals, COIL supports task specifications with variable spatial and temporal granularity, adapting to different user intents and task requirements. To robustly ground this correspondence-oriented task representation into actions, we design a conditional policy with a spatio-temporal attention mechanism that effectively fuses information across multiple input modalities. The policy is trained via a scalable self-supervised pipeline using demonstrations collected in simulation, with correspondence labels automatically generated in hindsight. COIL generalizes across tasks, objects, and motion patterns, achieving superior performance compared to prior methods on real-world manipulation tasks under both sparse and dense specifications.
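
To make the task representation described in the abstract more concrete, here is a minimal sketch of how a specification with a variable number of keypoints and non-uniform time intervals might be stored. It is an illustration only and is not taken from the paper: the names `KeypointTrack`, `TaskSpec`, and `position_at` are hypothetical, and the linear interpolation between waypoints is an assumption made for the example, not COIL's grounding mechanism (the abstract describes a learned spatio-temporal attention policy for that).

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class KeypointTrack:
    """Intended 3D motion of one keypoint (hypothetical structure).

    Waypoints may sit at arbitrary, non-uniform timestamps, so one track
    can be sparse (start and goal only) and another dense.
    """
    timestamps: List[float]
    positions: List[Point3D]

    def __post_init__(self) -> None:
        if not self.timestamps:
            raise ValueError("a track needs at least one waypoint")
        if len(self.timestamps) != len(self.positions):
            raise ValueError("timestamps and positions must align")
        if any(b <= a for a, b in zip(self.timestamps, self.timestamps[1:])):
            raise ValueError("timestamps must be strictly increasing")

    def position_at(self, t: float) -> Point3D:
        """Linearly interpolate the intended position at time t.

        Assumption for illustration: values outside the specified range
        are clamped to the first or last waypoint.
        """
        ts, ps = self.timestamps, self.positions
        if t <= ts[0]:
            return ps[0]
        if t >= ts[-1]:
            return ps[-1]
        i = bisect.bisect_right(ts, t)  # ts[i - 1] <= t < ts[i]
        w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
        return tuple(a + w * (b - a) for a, b in zip(ps[i - 1], ps[i]))


@dataclass
class TaskSpec:
    """A task as a variable-length collection of keypoint tracks."""
    tracks: List[KeypointTrack]

    def targets_at(self, t: float) -> List[Point3D]:
        # One intended 3D position per keypoint at time t.
        return [track.position_at(t) for track in self.tracks]


# A sparse track (two waypoints) next to a denser one (four waypoints).
sparse = KeypointTrack([0.0, 2.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
dense = KeypointTrack(
    [0.0, 0.5, 1.0, 2.0],
    [(0.0, 1.0, 0.0), (0.0, 1.0, 0.5), (0.0, 1.0, 1.0), (0.0, 1.0, 1.0)],
)
spec = TaskSpec([sparse, dense])
targets = spec.targets_at(1.0)
```

Because each track carries its own timestamps, the same container holds both sparse and dense specifications without resampling them onto a shared time grid.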

Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning

Robotics

Robots learn to do tasks smoothly from watching humans.

18 Nov 2025

88%

Imitation Learning Based on Disentangled Representation Learning of Behavioral Characteristics

Robotics

Robots change how they move based on your words.

5 Sep 2025

88%

Adapting by Analogy: OOD Generalization of Visuomotor Policies via Functional Correspondence

Robotics

Robots learn to do new tasks with less training.

15 Jun 2025
