Text-to-3D Generation using Jensen-Shannon Score Distillation
By: Khoi Do, Binh-Son Hua
Potential Business Impact:
Creates better 3D pictures from words.
Score distillation sampling (SDS) is an effective technique for generating 3D models from text prompts, using pre-trained large-scale text-to-image diffusion models as guidance. However, the resulting 3D assets tend to be over-saturated and over-smoothed, with limited diversity. These issues stem from the reverse Kullback-Leibler (KL) divergence objective, which makes the optimization unstable and leads to mode-seeking behavior. In this paper, we derive a bounded score distillation objective based on the Jensen-Shannon divergence (JSD), which stabilizes the optimization process and produces high-quality 3D generation. JSD can closely match the generated and target distributions, thereby mitigating mode seeking. We provide a practical implementation of JSD by using the theory of generative adversarial networks to define an approximate objective function for the generator, assuming the discriminator is well trained. By further assuming that the discriminator follows a log-odds classifier, we propose a minority sampling algorithm to estimate the gradients of the proposed objective. We conduct both theoretical and empirical studies to validate our method. Experimental results on T3Bench demonstrate that our method can produce high-quality and diverse 3D assets.
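The abstract's argument rests on two standard facts: reverse KL is unbounded and tolerates dropped modes, while JSD is bounded by log 2; and, in the original GAN analysis, a generator trained against an optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_gen(x)) minimizes 2·JSD − log 4. The sketch below checks both numerically on small discrete distributions. It is a toy illustration under that assumption only: the function names (`kl`, `jsd`, `optimal_discriminator`, `gan_value`) are hypothetical, and this is not the authors' implementation, which works with diffusion-model scores rather than explicit densities.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Returns math.inf when p puts mass on an outcome where q has none, which is
    why a KL-based objective is unbounded.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def jsd(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m.

    Since m > 0 wherever p or q is, both KL terms stay finite and JSD <= log 2.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def optimal_discriminator(p_data, p_gen):
    """GAN-optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_gen(x)).

    Equivalently D*(x) = sigmoid(log p_data(x) - log p_gen(x)): its logit is the
    log density ratio. Outcomes with zero mass under both get 0.5.
    """
    return [pd / (pd + pg) if pd + pg > 0 else 0.5 for pd, pg in zip(p_data, p_gen)]


def gan_value(p_data, p_gen, d):
    """GAN value V(D, G) = E_{p_data}[log D(x)] + E_{p_gen}[log(1 - D(x))]."""
    v = 0.0
    for pd, pg, di in zip(p_data, p_gen, d):
        if pd > 0:
            v += pd * math.log(di)
        if pg > 0:
            v += pg * math.log(1.0 - di)
    return v


if __name__ == "__main__":
    target = [0.5, 0.5, 0.0]       # two equally likely modes
    collapsed = [1.0, 0.0, 0.0]    # generator covering only the first mode
    off_support = [0.0, 0.0, 1.0]  # generator with mass outside the target

    # Reverse KL(gen || target) charges only a finite cost for dropping a mode,
    # while forward KL(target || gen) would be infinite for the same generator.
    print(kl(collapsed, target))    # log 2 (about 0.693)
    print(kl(target, collapsed))    # inf
    # Reverse KL blows up when the generator leaves the target's support.
    print(kl(off_support, target))  # inf

    # JSD stays bounded by log 2 even in that worst case.
    print(jsd(off_support, target), math.log(2))

    # With the optimal discriminator, V(D*, G) = 2 * JSD - log 4.
    d_star = optimal_discriminator(target, collapsed)
    print(gan_value(target, collapsed, d_star), 2 * jsd(target, collapsed) - math.log(4))
```

This toy setting only illustrates the divergence properties the abstract relies on; it does not reproduce the paper's minority sampling algorithm or its score-based gradient estimator.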
Similar Papers
Rethinking Score Distilling Sampling for 3D Editing and Generation
CV and Pattern Recognition
Makes 3D models from text, and changes them.
Dive3D: Diverse Distillation-based Text-to-3D Generation via Score Implicit Matching
CV and Pattern Recognition
Creates more varied and realistic 3D objects from text.
Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting
CV and Pattern Recognition
Makes 3D pictures from words more real.