MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer
By: Penghui Liu, Jiangshan Wang, Yutong Shen, and more
Potential Business Impact:
Lets a video copy the movements of multiple objects from another video.
Multi-object video motion transfer poses significant challenges for Diffusion Transformer (DiT) architectures due to inherent motion entanglement and a lack of object-level control. We present MultiMotion, a novel unified framework that overcomes these limitations. Our core innovation is Mask-aware Attention Motion Flow (AMF), which uses SAM2 masks to explicitly disentangle and control motion features for multiple objects within the DiT pipeline. Furthermore, we introduce RectPC, a high-order predictor-corrector solver for efficient and accurate sampling, particularly beneficial for multi-entity generation. To facilitate rigorous evaluation, we construct the first benchmark dataset specifically for DiT-based multi-object motion transfer. MultiMotion demonstrably achieves precise, semantically aligned, and temporally coherent motion transfer for multiple distinct objects while maintaining DiT's high quality and scalability. The code is provided in the supplementary material.
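The abstract does not spell out how AMF uses SAM2 masks, but the stated goal, disentangling per-object motion so features do not leak between objects, can be illustrated with a minimal masked-attention sketch. This is an assumption-laden toy (the function name, shapes, and the use of integer object labels in place of real SAM2 masks are all hypothetical), not the paper's implementation: each token attends only to tokens carrying the same object label.

```python
import numpy as np

def mask_aware_attention(q, k, v, obj_ids):
    """Toy attention where each token attends only to tokens of the same object.

    q, k, v: (N, d) token features; obj_ids: (N,) integer object labels,
    standing in for per-object segmentation masks (e.g. from SAM2).
    Hypothetical sketch of motion disentanglement, not the paper's AMF.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (N, N) raw attention logits
    same_obj = obj_ids[:, None] == obj_ids[None, :]  # allow intra-object pairs only
    scores = np.where(same_obj, scores, -1e9)        # block cross-object attention
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

# Two objects, three tokens each: object 0's outputs depend only on object 0's values.
rng = np.random.default_rng(0)
N, d = 6, 4
q, k, v = (rng.normal(size=(N, d)) for _ in range(3))
obj_ids = np.array([0, 0, 0, 1, 1, 1])
out = mask_aware_attention(q, k, v, obj_ids)
```

Because cross-object pairs get a large negative logit, perturbing one object's value tokens leaves the other object's outputs unchanged, which is the disentanglement property the abstract claims.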
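RectPC's exact update rule is not given in the abstract, but the generic predict-then-correct pattern such higher-order diffusion samplers build on can be sketched with a Heun (second-order) step: an Euler predictor followed by a slope correction. Everything below is a standard ODE-solver illustration under that assumption, not the paper's solver.

```python
def predictor_corrector_step(x, t, dt, f):
    """One Heun (second-order) predictor-corrector step for dx/dt = f(x, t).

    Illustrative of the predict-then-correct pattern used by high-order
    diffusion samplers; RectPC's actual update is not specified here.
    """
    d1 = f(x, t)                      # predictor: Euler slope at current point
    x_pred = x + dt * d1              # provisional Euler step
    d2 = f(x_pred, t + dt)            # corrector: slope at predicted point
    return x + dt * 0.5 * (d1 + d2)   # average the two slopes

# Example: dx/dt = -x with x(0) = 1, whose exact solution is exp(-t).
x, t, dt = 1.0, 0.0, 0.1
for _ in range(10):
    x = predictor_corrector_step(x, t, dt, lambda x, t: -x)
    t += dt
```

After 10 steps the estimate tracks exp(-1) to about three decimal places, versus roughly 0.005 error for plain Euler at the same step size, which is why predictor-corrector schemes allow fewer, larger sampling steps.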
Similar Papers
DM$^3$T: Harmonizing Modalities via Diffusion for Multi-Object Tracking
CV and Pattern Recognition
Helps cars see better in fog and dark.
Multivariate Diffusion Transformer with Decoupled Attention for High-Fidelity Mask-Text Collaborative Facial Generation
CV and Pattern Recognition
Creates realistic faces from masks and words.
MultiCOIN: Multi-Modal COntrollable Video INbetweening
CV and Pattern Recognition
Makes videos move exactly how you want.