MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation
By: Zhiqi Li, Wenhuan Li, Tengfei Wang and more
Potential Business Impact:
Builds many 3D objects faster with more parts.
Compositionality is critical for 3D object and scene generation, but existing part-aware 3D generation methods scale poorly because the cost of global attention grows quadratically as the number of components increases. In this work, we present MoCA, a compositional 3D generative model with two key designs: (1) importance-based component routing, which selects the top-k relevant components for sparse global attention, and (2) unimportant-component compression, which preserves the contextual priors of unselected components while reducing the computational complexity of global attention. With these designs, MoCA enables efficient, fine-grained compositional 3D asset creation with a scalable number of components. Extensive experiments show that MoCA outperforms baselines on both compositional object and scene generation tasks. Project page: https://lizhiqi49.github.io/MoCA
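To make the two designs in the abstract concrete, here is a minimal pure-Python sketch of top-k component routing combined with compression of the unselected components. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how importance is scored or how components are compressed, so the dot product of mean-pooled tokens (scoring) and the single mean-pooled summary token (compression) are hypothetical stand-ins, as are all function names.

```python
import math

def mean_pool(tokens):
    """Average a list of equal-length vectors into one vector."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, context):
    """Plain softmax attention; context tokens act as both keys and values."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        logits = [dot(q, c) * scale for c in context]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        out.append([sum(w * c[d] for w, c in zip(weights, context)) / total
                    for d in range(len(q))])
    return out

def build_context(components, i, k):
    """Return (context tokens, selected indices) for query component i."""
    summaries = [mean_pool(c) for c in components]
    others = [j for j in range(len(components)) if j != i]
    # ASSUMPTION: importance = similarity of pooled summaries (hypothetical).
    ranked = sorted(others, key=lambda j: dot(summaries[i], summaries[j]),
                    reverse=True)
    selected = ranked[:k]
    context = list(components[i])  # a component always sees its own tokens
    for j in others:
        if j in selected:
            context.extend(components[j])  # routed: keep every token
        else:
            # ASSUMPTION: compress an unselected component to one summary token.
            context.append(summaries[j])
    return context, sorted(selected)

def sparse_global_attention(components, k):
    """Apply routed attention to every component in turn."""
    return [attend(c, build_context(components, i, k)[0])
            for i, c in enumerate(components)]
```

With n components of t tokens each, every query token in this sketch attends to t + k*t + (n - 1 - k) context tokens instead of the n*t tokens of dense global attention. For example, with n = 4, t = 3 and k = 1 the context shrinks from 12 tokens to 8.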
Similar Papers
MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention
CV and Pattern Recognition
Makes videos of people look real and stay the same.
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
Computation and Language
Helps AI understand mixed text and pictures better.
Inferring Compositional 4D Scenes without Ever Seeing One
CV and Pattern Recognition
Builds 3D worlds from videos, showing moving objects.