Generative Video Motion Editing with 3D Point Tracks
By: Yao-Chih Lee, Zhoutong Zhang, Jiahui Huang, and more
Potential Business Impact:
Edits videos by changing how things move.
Camera and object motions are central to a video's narrative. However, precisely editing these captured motions remains a significant challenge, especially under complex object movements. Current motion-controlled image-to-video (I2V) approaches often lack full-scene context for consistent video editing, while video-to-video (V2V) methods provide viewpoint changes or basic object translation but offer limited control over fine-grained object motion. We present a track-conditioned V2V framework that enables joint editing of camera and object motion. We achieve this by conditioning a video generation model on a source video and paired 3D point tracks representing the source and target motions. These 3D tracks establish sparse correspondences that transfer rich context from the source video to the new motions while preserving spatiotemporal coherence. Crucially, compared to 2D tracks, 3D tracks provide explicit depth cues, allowing the model to resolve depth order and handle occlusions for precise motion editing. Trained in two stages on synthetic and real data, our model supports diverse motion edits, including joint camera/object manipulation, motion transfer, and non-rigid deformation, unlocking new creative potential in video editing.
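To make the depth-ordering argument concrete, here is a toy geometric sketch. It is not the authors' model, which conditions a video generator on the tracks rather than rasterizing them. It assumes a pinhole camera with focal length `f` and principal point `(cx, cy)`, a rigid object edit (rotation about the vertical axis plus translation), a translation-only camera edit, and a per-pixel z-buffer. All names (`edit_object_tracks`, `move_camera`, `visible_tracks`) are hypothetical. Source and target tracks stay paired by track id, and two points that land on the same pixel can only be ordered because each carries a depth.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical layout: track id -> 3D point in camera coordinates (x right, y down, z forward).
Point3 = Tuple[float, float, float]
Tracks = Dict[str, Point3]


def rotate_y(p: Point3, angle_rad: float) -> Point3:
    """Rotate a point about the vertical (y) axis through the origin."""
    x, y, z = p
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def edit_object_tracks(src: Tracks, object_ids: Iterable[str], pivot: Point3,
                       angle_rad: float, offset: Point3) -> Tracks:
    """Hypothetical object edit: rotate the selected tracks about `pivot`, then
    translate them by `offset`. Other tracks are copied unchanged, so source and
    target remain paired by track id."""
    ids = set(object_ids)
    tgt: Tracks = {}
    for tid, p in src.items():
        if tid not in ids:
            tgt[tid] = p
            continue
        local = (p[0] - pivot[0], p[1] - pivot[1], p[2] - pivot[2])
        r = rotate_y(local, angle_rad)
        tgt[tid] = (r[0] + pivot[0] + offset[0],
                    r[1] + pivot[1] + offset[1],
                    r[2] + pivot[2] + offset[2])
    return tgt


def move_camera(src: Tracks, cam_offset: Point3) -> Tracks:
    """Hypothetical camera edit: translating the camera by `cam_offset` (no rotation)
    shifts every point by -cam_offset in camera coordinates."""
    return {tid: (p[0] - cam_offset[0], p[1] - cam_offset[1], p[2] - cam_offset[2])
            for tid, p in src.items()}


def visible_tracks(tracks: Tracks, f: float, cx: float,
                   cy: float) -> Dict[Tuple[int, int], str]:
    """Project with a pinhole camera and keep, for each integer pixel, the id of
    the nearest track. This depth test is what 2D tracks alone cannot perform."""
    depth: Dict[Tuple[int, int], float] = {}
    owner: Dict[Tuple[int, int], str] = {}
    for tid, (x, y, z) in tracks.items():
        if z <= 0:  # behind the camera: not visible
            continue
        pixel = (round(f * x / z + cx), round(f * y / z + cy))
        if pixel not in depth or z < depth[pixel]:
            depth[pixel] = z
            owner[pixel] = tid
    return owner


if __name__ == "__main__":
    # Two tracks that project to the same pixel; only their depths differ.
    src = {"fg": (0.0, 0.0, 2.0), "bg": (0.0, 0.0, 4.0)}
    print(visible_tracks(src, 100.0, 64.0, 64.0))  # {(64, 64): 'fg'}
    # Push the foreground object back behind the background point.
    tgt = edit_object_tracks(src, ["fg"], pivot=(0.0, 0.0, 2.0),
                             angle_rad=0.0, offset=(0.0, 0.0, 3.0))
    print(visible_tracks(tgt, 100.0, 64.0, 64.0))  # {(64, 64): 'bg'}
```

In both frames the two tracks share one 2D position, so a purely 2D track pair would look identical before and after the edit. Only the z values reveal that the occlusion order flipped.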
Similar Papers
MotionV2V: Editing Motion in a Video
CV and Pattern Recognition
Changes how things move in videos.
Zero-shot 3D-Aware Trajectory-Guided image-to-video generation via Test-Time Training
CV and Pattern Recognition
Makes videos move exactly how you want.
Fast Multi-view Consistent 3D Editing with Video Priors
CV and Pattern Recognition
Changes 3D objects with simple text commands.