Beyond the Visible: Disocclusion-Aware Editing via Proxy Dynamic Graphs
By: Anran Qi, Changjian Li, Adrien Bousseau, and others
Potential Business Impact:
Turns still pictures into short videos, with user control over newly revealed areas.
We address image-to-video generation with explicit user control over the final frame's disoccluded regions. Current image-to-video pipelines produce plausible motion but struggle to generate predictable, articulated motions while enforcing user-specified content in newly revealed areas. Our key idea is to separate motion specification from appearance synthesis: we introduce a lightweight, user-editable Proxy Dynamic Graph (PDG) that deterministically yet approximately drives part motion, while a frozen diffusion prior synthesizes plausible appearance that follows that motion. In our training-free pipeline, the user loosely annotates and reposes a PDG, from which we compute a dense motion flow that lets the diffusion model act as a motion-guided shader. The user then edits the appearance of the disoccluded areas of the image, and we exploit the visibility information encoded by the PDG to perform a latent-space composite that reconciles the motion with user intent in these areas. This design yields controllable articulation and user control over disocclusions without fine-tuning. We demonstrate clear advantages over state-of-the-art alternatives at turning images into short videos of articulated objects, furniture, vehicles, and deformables. Our method combines generative control, in the form of loose pose and structure, with predictable control, in the form of appearance specification in the disoccluded regions of the final frame, unlocking a new image-to-video workflow. Code will be released upon acceptance. Project page: https://anranqi.github.io/beyondvisible.github.io/
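The two mechanisms the abstract names can be sketched in a few lines: turning per-part rigid motions from a proxy graph into a dense flow field, and compositing a user-edited latent into disoccluded regions. This is a minimal illustrative sketch under simplified assumptions, not the authors' implementation: the part representation (mask + 2D rotation and translation), the flow computation, and the mask-based latent blend are all hypothetical stand-ins, and the frozen diffusion prior is omitted entirely.

```python
import numpy as np

def part_flow(mask, angle, t, center):
    """Dense 2D flow for one rigidly moving part: rotate about `center`, then translate.

    Pixels outside the part's boolean `mask` get zero flow.
    """
    H, W = mask.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    c, s = np.cos(angle), np.sin(angle)
    dx, dy = xs - center[0], ys - center[1]
    nx = c * dx - s * dy + center[0] + t[0]
    ny = s * dx + c * dy + center[1] + t[1]
    flow = np.stack([nx - xs, ny - ys], axis=-1)  # (H, W, 2) displacement field
    flow[~mask] = 0.0
    return flow

def pdg_to_flow(parts, shape):
    """Accumulate per-part flows from a (hypothetical) proxy dynamic graph into one field."""
    flow = np.zeros(shape + (2,))
    for p in parts:
        flow += part_flow(p["mask"], p["angle"], p["t"], p["center"])
    return flow

def latent_composite(motion_latent, edit_latent, disoccluded):
    """Keep the motion-guided latent everywhere except disoccluded pixels,
    where the user-edited latent takes over."""
    m = disoccluded[..., None].astype(float)
    return m * edit_latent + (1.0 - m) * motion_latent
```

In the paper's terms, `pdg_to_flow` would play the role of the dense motion flow derived from the reposed PDG, and `latent_composite` the visibility-aware blend; the real system would run both through a diffusion model rather than operating on raw arrays.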
Similar Papers
MotionV2V: Editing Motion in a Video
CV and Pattern Recognition
Changes how things move in videos.
DisCo3D: Distilling Multi-View Consistency for 3D Scene Editing
CV and Pattern Recognition
Edits 3D scenes while keeping all views consistent.
Make Your MoVe: Make Your 3D Contents by Adapting Multi-View Diffusion Models to External Editing
CV and Pattern Recognition
Edits 3D objects without messing up their shape.