Drive Any Mesh: 4D Latent Diffusion for Mesh Deformation from Video
By: Yahao Shi, Yang Liu, Yanmin Wu, and more
Potential Business Impact:
Makes existing 3D models move the way objects move in a video.
We propose DriveAnyMesh, a method for driving meshes guided by monocular video. Current 4D generation techniques encounter challenges with modern rendering engines: implicit methods have low rendering efficiency and are poorly suited to rasterization-based engines, while skeletal methods demand significant manual effort and lack cross-category generalization. Animating existing 3D assets, rather than creating 4D assets from scratch, demands a deep understanding of the input's 3D structure. To tackle these challenges, we present a 4D diffusion model that denoises sequences of latent sets, which are then decoded to produce mesh animations from point cloud trajectory sequences. These latent sets leverage a transformer-based variational autoencoder, simultaneously capturing 3D shape and motion information. By employing a spatiotemporal, transformer-based diffusion model, information is exchanged across multiple latent frames, enhancing the efficiency and generalization of the generated results. Our experiments demonstrate that DriveAnyMesh can rapidly produce high-quality animations for complex motions and is compatible with modern rendering engines. This method holds potential for applications in the gaming and film industries.
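To make the architecture concrete, here is a minimal PyTorch sketch of the two components the abstract names: a transformer-based VAE that compresses each point-cloud frame into a small latent set, and a spatiotemporal transformer denoiser that alternates attention within each latent set (spatial) and across frames (temporal). This is not the authors' released code; all class names, layer counts, dimensions, and the toy noise schedule are illustrative assumptions, and the video conditioning and mesh decoder are omitted.

```python
# Illustrative sketch only; module names, sizes, and the noise schedule are assumptions.
import torch
import torch.nn as nn


class LatentSetVAE(nn.Module):
    """Transformer VAE: encodes one point-cloud frame into a latent set."""

    def __init__(self, num_latents=64, dim=256, point_dim=3):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_latents, dim))  # learned latent tokens
        self.point_embed = nn.Linear(point_dim, dim)
        # Latent tokens cross-attend to the embedded points.
        self.encoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
        self.to_mu = nn.Linear(dim, dim)
        self.to_logvar = nn.Linear(dim, dim)

    def encode(self, points):  # points: (B, N, 3)
        tokens = self.point_embed(points)
        q = self.queries.unsqueeze(0).expand(points.size(0), -1, -1)
        h = self.encoder(q, tokens)                      # (B, num_latents, dim)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization


class SpatioTemporalDenoiser(nn.Module):
    """Alternates attention within each latent set (space) and across frames (time)."""

    def __init__(self, dim=256, depth=4, nhead=8):
        super().__init__()
        self.spatial = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead, batch_first=True) for _ in range(depth))
        self.temporal = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead, batch_first=True) for _ in range(depth))
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, z, t):  # z: (B, F, L, D) noisy latent sets; t: (B,) diffusion step
        B, F, L, D = z.shape
        z = z + self.time_mlp(t.float().view(B, 1, 1, 1).expand(B, F, L, 1))
        for sp, tp in zip(self.spatial, self.temporal):
            z = sp(z.reshape(B * F, L, D)).reshape(B, F, L, D)   # within-frame attention
            z = tp(z.permute(0, 2, 1, 3).reshape(B * L, F, D))   # cross-frame attention
            z = z.reshape(B, L, F, D).permute(0, 2, 1, 3)
        return z  # predicted noise, same shape as the input


# One illustrative denoising training step (simplified DDPM-style noising).
B, F, N = 2, 8, 1024
vae, denoiser = LatentSetVAE(), SpatioTemporalDenoiser()
frames = torch.randn(B, F, N, 3)                       # point-cloud trajectory frames
with torch.no_grad():
    z0 = torch.stack([vae.encode(frames[:, f]) for f in range(F)], dim=1)  # (B, F, 64, 256)
t = torch.randint(0, 1000, (B,))
alpha_bar = 1.0 - t.float().view(B, 1, 1, 1) / 1000    # toy linear schedule, not the paper's
noise = torch.randn_like(z0)
zt = alpha_bar.sqrt() * z0 + (1 - alpha_bar).sqrt() * noise
loss = nn.functional.mse_loss(denoiser(zt, t), noise)  # predict the added noise
```

The factorized spatial/temporal attention keeps each attention call small (over L latent tokens or F frames rather than F x L jointly) while still letting information flow across the whole sequence, which matches the abstract's claim of exchanging information across multiple latent frames.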
Similar Papers
Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video
CV and Pattern Recognition
Creates 3D models of moving things from videos.
Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Graphics
Makes 3D characters move like real people.
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
CV and Pattern Recognition
Creates realistic 3D videos from few camera angles.