3DFroMLLM: 3D Prototype Generation only from Pretrained Multimodal LLMs
By: Noor Ahmed, Cameron Braunstein, Steffen Eger, and more
Potential Business Impact:
Makes computers build 3D shapes from words.
Recent Multi-Modal Large Language Models (MLLMs) have demonstrated strong capabilities in learning joint representations from text and images. However, their spatial reasoning remains limited. We introduce 3DFroMLLM, a novel framework that generates 3D object prototypes, including geometry and part labels, directly from MLLMs. Our pipeline is agentic, comprising a designer, a coder, and a visual inspector operating in a refinement loop. Notably, our approach requires no additional training data or detailed user instructions. Building on prior work in 2D generation, we demonstrate that rendered images produced by our framework can be effectively used for image-classification pretraining, outperforming previous methods by 15%. As a compelling real-world use case, we show that the generated prototypes can improve fine-grained vision-language models: using the rendered, part-labeled prototypes to fine-tune CLIP for part segmentation yields a 55% accuracy improvement without relying on any additional human-labeled data.
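The abstract describes an agentic loop in which a designer drafts a part-based plan, a coder turns it into a renderable 3D program, and a visual inspector critiques the result. The sketch below is a minimal illustration of how such a refinement loop could be wired together; the helpers `call_mllm` and `render_scene`, the role prompts, and the approval check are hypothetical placeholders, not the paper's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    approved: bool
    notes: str


def call_mllm(role: str, prompt: str, image: bytes | None = None) -> str:
    """Hypothetical stand-in for a call to a pretrained multimodal LLM."""
    raise NotImplementedError("Plug in an MLLM backend here.")


def render_scene(scene_code: str) -> bytes:
    """Hypothetical renderer that executes the generated 3D scene program."""
    raise NotImplementedError("Plug in a renderer here.")


def generate_prototype(object_name: str, max_rounds: int = 5) -> str:
    # Designer: propose a part-based plan (geometry and part labels) from text alone.
    plan = call_mllm("designer", f"List the parts and rough geometry of a {object_name}.")
    scene_code = ""
    for _ in range(max_rounds):
        # Coder: translate the plan into executable 3D scene code.
        scene_code = call_mllm("coder", f"Write 3D scene code for this plan:\n{plan}")
        image = render_scene(scene_code)
        # Visual inspector: critique the rendered image against the plan.
        critique = call_mllm("inspector", f"Does this render match the plan?\n{plan}", image=image)
        feedback = Feedback(approved=critique.strip().startswith("OK"), notes=critique)
        if feedback.approved:
            break
        # Refinement: feed the critique back to the designer and try again.
        plan = call_mllm("designer", f"Revise the plan given this feedback:\n{feedback.notes}")
    return scene_code
```

The key design point this sketch reflects is that the loop needs no training data or detailed user instructions: each role is a prompted call to the same pretrained MLLM, and the rendered image closes the feedback loop.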
Similar Papers
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
CV and Pattern Recognition
Lets computers build and change 3D objects with words.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
CV and Pattern Recognition
Helps computers understand 3D spaces like humans.
MMPart: Harnessing Multi-Modal Large Language Models for Part-Aware 3D Generation
CV and Pattern Recognition
Builds 3D objects from pictures, showing their parts.