SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

Published: November 24, 2025 | arXiv ID: 2511.19319v1

By: Lingwei Dang, Zonghan Li, Juntong Li and more

Potential Business Impact:

Creates realistic 3D animations of people and objects.

Business Areas:

Motion Capture, Media and Entertainment, Video

Hand-Object Interaction (HOI) generation plays a critical role in advancing applications across animation and robotics. Current video-based methods are predominantly single-view, which impedes comprehensive 3D geometry perception and often results in geometric distortions or unrealistic motion patterns. While 3D HOI approaches can generate dynamically plausible motions, their dependence on high-quality 3D data captured in controlled laboratory settings severely limits their generalization to real-world scenarios. To overcome these limitations, we introduce SyncMV4D, the first model that jointly generates synchronized multi-view HOI videos and 4D motions by unifying visual prior, motion dynamics, and multi-view geometry. Our framework features two core innovations: (1) a Multi-view Joint Diffusion (MJD) model that co-generates HOI videos and intermediate motions, and (2) a Diffusion Points Aligner (DPA) that refines the coarse intermediate motion into globally aligned 4D metric point tracks. To tightly couple 2D appearance with 4D dynamics, we establish a closed-loop, mutually enhancing cycle. During the diffusion denoising process, the generated video conditions the refinement of the 4D motion, while the aligned 4D point tracks are reprojected to guide next-step joint generation. Experimentally, our method demonstrates superior performance to state-of-the-art alternatives in visual realism, motion plausibility, and multi-view consistency.
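The abstract states that the aligned 4D point tracks are reprojected into the views to guide the next joint generation step, but it does not spell out the camera model. As a minimal sketch only, assuming a standard pinhole camera per view, reprojecting one 3D point track into several views could look like the following. The class `PinholeCamera`, the function `reproject_track`, and all parameter names (`fx`, `fy`, `cx`, `cy`, `R`, `t`) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


class PinholeCamera:
    """Hypothetical pinhole camera (an assumption, not the paper's model)."""

    def __init__(self, fx: float, fy: float, cx: float, cy: float,
                 R: Sequence[Sequence[float]], t: Sequence[float]) -> None:
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.R = R  # 3x3 world-to-camera rotation
        self.t = t  # world-to-camera translation

    def project(self, p: Point3D) -> Pixel:
        # Transform the world point into the camera frame: p_cam = R @ p + t.
        cam = [sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i]
               for i in range(3)]
        x, y, z = cam
        if z <= 0:
            raise ValueError("point is behind the camera")
        # Perspective division followed by the intrinsics.
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


def reproject_track(track: List[Point3D],
                    cameras: List[PinholeCamera]) -> List[List[Pixel]]:
    """Project one 3D track (one point per frame) into every view.

    Returns a list indexed as [view][frame] -> (u, v) pixel coordinates.
    """
    return [[cam.project(p) for p in track] for cam in cameras]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two illustrative views: one at the world origin, one shifted along x.
cam_a = PinholeCamera(100.0, 100.0, 50.0, 50.0, IDENTITY, [0.0, 0.0, 0.0])
cam_b = PinholeCamera(100.0, 100.0, 50.0, 50.0, IDENTITY, [-1.0, 0.0, 0.0])

# A track of two frames: the point moves one metric unit along x.
track = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
tracks_2d = reproject_track(track, [cam_a, cam_b])
```

In this toy setup the same 3D motion appears at different pixel locations in each view, which is the kind of cross-view signal that reprojected tracks can supply; how the model actually consumes that signal during denoising is not described in the abstract.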

SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios

CV and Pattern Recognition

Makes robots move realistically with objects.

3 Jun 2025 1

90%

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

CV and Pattern Recognition

Makes one picture move and change like a video.

4 Dec 2025 1

90%

GenHOI: Generalizing Text-driven 4D Human-Object Interaction Synthesis for Unseen Objects

CV and Pattern Recognition

Creates realistic human-object actions for computers.

18 Jun 2025 1


Page Count

12 pages
