Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models
By: In Cho, Youngbeom Yoo, Subin Jeon, and more
Potential Business Impact:
Makes 3D models smaller and faster to create.
Constructing a compressed latent space through a variational autoencoder (VAE) is key to efficient 3D diffusion models. This paper introduces COD-VAE, which encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing quality. COD-VAE introduces a two-stage autoencoder scheme to improve compression and decoding efficiency. First, our encoder block progressively compresses point clouds into compact latent vectors via intermediate point patches. Second, our triplane-based decoder reconstructs dense triplanes from the latent vectors instead of directly decoding neural fields, significantly reducing the computational overhead of neural field decoding. Finally, we propose uncertainty-guided token pruning, which allocates resources adaptively by skipping computations in simpler regions, improving decoder efficiency. Experimental results demonstrate that COD-VAE achieves 16x compression compared to the baseline while maintaining quality. This enables a 20.8x speedup in generation, highlighting that a large number of latent vectors is not a prerequisite for high-quality reconstruction and generation. The code is available at https://github.com/join16/COD-VAE.
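For intuition, below is a minimal PyTorch-style sketch of the two-stage idea described in the abstract: an encoder that progressively compresses point tokens down to 64 latent vectors, and a decoder that turns those latents into dense triplanes queried by a small MLP. All class names, layer sizes, and the cross-attention-based compression here are illustrative assumptions, not the authors' released implementation; uncertainty-guided token pruning is omitted for brevity (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchCompressor(nn.Module):
    """One illustrative encoder block: a smaller set of learned patch tokens
    cross-attends to the current tokens, shrinking the set at each stage."""

    def __init__(self, dim: int, num_out_tokens: int, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_out_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N_in, dim) -> (B, num_out_tokens, dim)
        q = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        x, _ = self.attn(q, tokens, tokens)
        x = self.norm(x)
        return x + self.mlp(x)


class CompactEncoder(nn.Module):
    """Progressively compress a point cloud into 64 latent vectors."""

    def __init__(self, dim: int = 512, stages=(1024, 256, 64)):
        super().__init__()
        self.embed = nn.Linear(3, dim)  # lift xyz coordinates to token features
        self.blocks = nn.ModuleList([PatchCompressor(dim, n) for n in stages])

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> latents: (B, 64, dim)
        x = self.embed(points)
        for block in self.blocks:
            x = block(x)
        return x


class TriplaneDecoder(nn.Module):
    """Reconstruct three dense feature planes from the 64 latents, then query
    the field with a tiny MLP instead of a heavy per-point network."""

    def __init__(self, dim: int = 512, plane_res: int = 64, plane_dim: int = 32):
        super().__init__()
        self.plane_res, self.plane_dim = plane_res, plane_dim
        self.plane_queries = nn.Parameter(
            torch.randn(3 * plane_res * plane_res, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, 8, batch_first=True)
        self.to_plane = nn.Linear(dim, plane_dim)
        self.field_mlp = nn.Sequential(
            nn.Linear(3 * plane_dim, 64), nn.GELU(), nn.Linear(64, 1))

    def forward(self, latents: torch.Tensor, queries: torch.Tensor) -> torch.Tensor:
        # latents: (B, 64, dim), queries: (B, Q, 3) with coordinates in [-1, 1]
        B = latents.size(0)
        q = self.plane_queries.unsqueeze(0).expand(B, -1, -1)
        planes, _ = self.attn(q, latents, latents)
        planes = self.to_plane(planes).view(
            B, 3, self.plane_res, self.plane_res, self.plane_dim)
        planes = planes.permute(0, 1, 4, 2, 3)  # (B, 3, C, H, W)

        # Bilinearly sample each of the XY / XZ / YZ planes at the query points.
        feats = []
        for i, axes in enumerate(((0, 1), (0, 2), (1, 2))):
            grid = queries[..., list(axes)].unsqueeze(2)          # (B, Q, 1, 2)
            sampled = F.grid_sample(planes[:, i], grid, align_corners=True)
            feats.append(sampled.squeeze(-1).transpose(1, 2))     # (B, Q, C)
        return self.field_mlp(torch.cat(feats, dim=-1))           # field logits


# Toy usage: 2048 input points compressed to 64 latents, then 4096 field queries.
enc, dec = CompactEncoder(), TriplaneDecoder()
pts = torch.rand(2, 2048, 3) * 2 - 1
occ = dec(enc(pts), torch.rand(2, 4096, 3) * 2 - 1)  # (2, 4096, 1)
```

The sketch illustrates why decoding through triplanes is cheap: the expensive attention runs once per shape to produce the planes, and each of the many field queries only costs three bilinear lookups plus a tiny MLP.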
Similar Papers
Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling
Machine Learning (CS)
Designs new molecules for medicine and materials.
Geometry-Preserving Encoder/Decoder in Latent Generative Models
Numerical Analysis
Makes AI create better pictures by understanding shapes.
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
CV and Pattern Recognition
Makes video creation much faster and cheaper.