CamC2V: Context-aware Controllable Video Generation
By: Luis Denninger, Sina Mokhtarzadeh Azar, Juergen Gall
Potential Business Impact:
Generates videos from still pictures with controllable camera movement.
Recently, image-to-video (I2V) diffusion models have demonstrated impressive scene understanding and generative quality, incorporating image conditions to guide generation. However, these models primarily animate static images without extending beyond their provided context. Introducing additional constraints, such as camera trajectories, can enhance diversity but often degrades visual quality, limiting their applicability to tasks that require faithful scene representation. We propose CamC2V, a context-to-video (C2V) model that integrates multiple image conditions as context, together with 3D constraints and camera control, to enrich both global semantics and fine-grained visual details. This enables more coherent and context-aware video generation. Moreover, we motivate the necessity of temporal awareness for an effective context representation. Our comprehensive study on the RealEstate10K dataset demonstrates improvements in visual quality and camera controllability. We will publish our code upon acceptance.
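Since the paper's architecture is not detailed here, the snippet below is only a minimal, hypothetical sketch of the general recipe the abstract outlines: a video diffusion block that cross-attends to tokens from multiple context images and adds a per-frame camera-pose embedding for camera control. The class name, layer sizes, and the pose parameterization (flattened 3x4 extrinsics) are illustrative assumptions, not CamC2V's actual implementation.

```python
# Hypothetical sketch (not the authors' code): fusing multiple context-image
# tokens and per-frame camera poses into a video diffusion transformer block.
import torch
import torch.nn as nn


class ContextCameraBlock(nn.Module):
    """Toy block: camera poses are injected additively per frame,
    context images are injected via cross-attention."""

    def __init__(self, dim=320, n_heads=8, pose_dim=12):
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, dim)          # camera extrinsics -> feature
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, video_tokens, context_tokens, cam_poses):
        # video_tokens:   (B, F*HW, D)  noisy video latents, flattened over F frames
        # context_tokens: (B, K*HW, D)  encoded context images (K reference views)
        # cam_poses:      (B, F, pose_dim)  e.g. flattened 3x4 extrinsics per frame
        B, F, _ = cam_poses.shape
        hw = video_tokens.shape[1] // F
        pose_emb = self.pose_proj(cam_poses)               # (B, F, D)
        pose_emb = pose_emb.repeat_interleave(hw, dim=1)   # broadcast to spatial tokens
        x = video_tokens + pose_emb                        # camera control signal
        ctx = self.norm_kv(context_tokens)
        attn_out, _ = self.cross_attn(self.norm_q(x), ctx, ctx)
        x = x + attn_out                                   # context semantics and details
        return x + self.ff(x)


if __name__ == "__main__":
    B, F, K, HW, D = 1, 16, 2, 64, 320
    block = ContextCameraBlock(dim=D)
    out = block(torch.randn(B, F * HW, D),
                torch.randn(B, K * HW, D),
                torch.randn(B, F, 12))
    print(out.shape)  # torch.Size([1, 1024, 320])
```

In a real system such a block would sit inside a pretrained video diffusion backbone, and camera control is often parameterized with richer encodings (e.g. Plücker ray embeddings) rather than raw extrinsics; this sketch only illustrates where context and camera conditions could enter the computation.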
Similar Papers
I2V3D: Controllable image-to-video generation with 3D guidance
CV and Pattern Recognition
Turns still pictures into moving videos with control.
Generative Video Motion Editing with 3D Point Tracks
CV and Pattern Recognition
Edits videos by changing how things move.
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
CV and Pattern Recognition
Creates videos of moving scenes from any angle.