Score: 1

EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models

Published: March 25, 2025 | arXiv ID: 2503.19369v2

By: Yufei Cai , Hu Han , Yuxiang Wei and more

Potential Business Impact:

Copies dance moves from one video to another.

Business Areas:
Motion Capture Media and Entertainment, Video

The progress on generative models has led to significant advances on text-to-video (T2V) generation, yet the motion controllability of generated videos remains limited. Existing motion transfer methods explored the motion representations of reference videos to guide generation. Nevertheless, these methods typically rely on sample-specific optimization strategy, resulting in high computational burdens. In this paper, we propose EfficientMT, a novel and efficient end-to-end framework for video motion transfer. By leveraging a small set of synthetic paired motion transfer samples, EfficientMT effectively adapts a pretrained T2V model into a general motion transfer framework that can accurately capture and reproduce diverse motion patterns. Specifically, we repurpose the backbone of the T2V model to extract temporal information from reference videos, and further propose a scaler module to distill motion-related information. Subsequently, we introduce a temporal integration mechanism that seamlessly incorporates reference motion features into the video generation process. After training on our self-collected synthetic paired samples, EfficientMT enables general video motion transfer without requiring test-time optimization. Extensive experiments demonstrate that our EfficientMT outperforms existing methods in efficiency while maintaining flexible motion controllability. Our code will be available https://github.com/PrototypeNx/EfficientMT.

Repos / Data Links

Page Count
10 pages

Category
Computer Science:
CV and Pattern Recognition