MMPart: Harnessing Multi-Modal Large Language Models for Part-Aware 3D Generation
By: Omid Bonakdar, Nasser Mozayani
Potential Business Impact:
Builds 3D objects from pictures, showing their parts.
Generative 3D modeling has advanced rapidly, driven by applications in VR/AR, the metaverse, and robotics. However, most methods represent the target object as a closed mesh devoid of any structural information, limiting editing, animation, and semantic understanding. Part-aware 3D generation addresses this problem by decomposing objects into meaningful components, but existing pipelines face challenges: the user has no control over which parts are separated or over how the model imagines the occluded regions during the isolation phase. In this paper, we introduce MMPart, an innovative framework for generating part-aware 3D models from a single image. We first use a VLM to generate a set of prompts based on the input image and user descriptions. In the next step, a generative model produces isolated images of each part, conditioned on the initial image and the previous step's prompts, which act as supervisors that control the pose and guide the model in imagining previously occluded areas. Each of these images then enters the multi-view generation stage, where a set of consistent images from different viewpoints is produced. Finally, a reconstruction model converts each set of multi-view images into a 3D model. A minimal sketch of this pipeline follows.
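To make the four-stage pipeline concrete, here is a minimal Python sketch of the data flow described in the abstract. Every function name, type, and signature below is a hypothetical placeholder (the paper does not describe an API); the sketch only illustrates how the stages chain together, one reconstructed 3D model per part.

```python
# Hypothetical sketch of the MMPart pipeline described above.
# All names are illustrative placeholders, not the authors' implementation.

from dataclasses import dataclass


@dataclass
class PartPrompt:
    """One prompt per part, written by the VLM from the image + user text."""
    part_name: str      # e.g. "seat", "left armrest"
    guidance: str       # controls pose and how occluded regions are imagined


def vlm_generate_prompts(image, user_description):
    """Stage 1: a vision-language model proposes prompts for each part."""
    # Placeholder: in practice this would query a multimodal LLM.
    return [PartPrompt("part_0", "isolated view, same pose as in the input image")]


def generate_isolated_image(image, prompt):
    """Stage 2: an image generator renders the part in isolation, with the
    prompt acting as a supervisor for pose and occluded-area completion."""
    return None  # placeholder for a generated image


def generate_multiview(isolated_image):
    """Stage 3: produce a set of view-consistent images of the part."""
    return []  # placeholder for multi-view images


def reconstruct_3d(views):
    """Stage 4: lift the multi-view images of one part to a 3D model."""
    return None  # placeholder for a mesh / 3D representation


def mmpart_pipeline(image, user_description):
    """Chain the four stages; returns one 3D model per separated part."""
    part_models = []
    for prompt in vlm_generate_prompts(image, user_description):
        isolated = generate_isolated_image(image, prompt)
        views = generate_multiview(isolated)
        part_models.append(reconstruct_3d(views))
    return part_models
```

The key design point the abstract emphasizes is that the VLM-written prompts give the user control over which parts are isolated and how occluded regions are completed, rather than leaving both decisions entirely to the generative model.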
Similar Papers
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
CV and Pattern Recognition
Lets computers build and change 3D objects with words.
3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs
CV and Pattern Recognition
Makes computers build 3D shapes from words.
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
Robotics
Robots build complex objects from simple text ideas.