GACO-CAD: Geometry-Augmented and Conciseness-Optimized CAD Model Generation from Single Image
By: Yinghui Wang, Xinyu Zhang, Peng Du
Potential Business Impact:
Turns a single image into an editable 3D CAD model for industrial design.
Generating editable, parametric CAD models from a single image has great potential to lower the barrier to industrial concept design. However, current multi-modal large language models (MLLMs) still struggle to infer 3D geometry accurately from 2D images because of their limited spatial reasoning. We address this limitation by introducing GACO-CAD, a novel two-stage post-training framework. It is designed to achieve a joint objective: simultaneously improving the geometric accuracy of the generated CAD models and encouraging more concise modeling procedures. First, during supervised fine-tuning, we use depth and surface normal maps as dense geometric priors and combine them with the RGB image to form a multi-channel input. In single-view reconstruction, these priors provide complementary spatial cues that help the MLLM recover 3D geometry from 2D observations more reliably. Second, during reinforcement learning, we introduce a group length reward that promotes more compact, less redundant parametric modeling sequences while preserving high geometric fidelity. A simple dynamic weighting strategy is adopted to stabilize training. Experiments on the DeepCAD and Fusion360 datasets show that GACO-CAD achieves state-of-the-art performance under the same MLLM backbone, consistently outperforming existing methods in code validity, geometric accuracy, and modeling conciseness.
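The abstract does not give the formula for the group length reward or the dynamic weighting, so the following is only a minimal sketch of one way such a reward could behave, under stated assumptions. It assumes "group" means several CAD programs sampled for the same input image, that each sample already has a geometric-accuracy score in [0, 1] (for example an IoU against the ground-truth shape), and that only samples above a fidelity threshold compete on length. The names `Sample`, `group_length_rewards`, `total_rewards`, `fidelity_threshold`, and `max_weight`, as well as the choice to scale the length term by the group's mean geometric reward, are hypothetical illustrations, not GACO-CAD's actual formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    code_len: int      # length of the generated CAD program (tokens or operations)
    geo_reward: float  # assumed geometric-accuracy score in [0, 1], e.g. IoU


def group_length_rewards(group: List[Sample], fidelity_threshold: float = 0.8) -> List[float]:
    """Hypothetical group-relative length reward.

    Only samples whose geometric reward reaches `fidelity_threshold` are eligible.
    Among eligible samples, the shortest program scores 1.0 and the longest 0.0,
    with linear interpolation in between. Ineligible samples score 0.0, so a
    short but inaccurate program is never rewarded for its brevity.
    """
    eligible = [s.code_len for s in group if s.geo_reward >= fidelity_threshold]
    if not eligible:
        return [0.0] * len(group)
    shortest, longest = min(eligible), max(eligible)
    rewards = []
    for s in group:
        if s.geo_reward < fidelity_threshold:
            rewards.append(0.0)
        elif longest == shortest:
            rewards.append(1.0)  # all eligible samples have the same length
        else:
            rewards.append((longest - s.code_len) / (longest - shortest))
    return rewards


def total_rewards(group: List[Sample], fidelity_threshold: float = 0.8,
                  max_weight: float = 0.5) -> List[float]:
    """Combine geometric and length rewards with an assumed dynamic weight.

    The length term's weight grows with the group's mean geometric reward,
    so conciseness is emphasized only once the group is already accurate.
    """
    mean_geo = sum(s.geo_reward for s in group) / len(group)
    weight = max_weight * mean_geo
    length_r = group_length_rewards(group, fidelity_threshold)
    return [s.geo_reward + weight * lr for s, lr in zip(group, length_r)]


group = [Sample(120, 0.9), Sample(80, 0.85), Sample(60, 0.4), Sample(100, 0.95)]
print(group_length_rewards(group))  # [0.0, 1.0, 0.0, 0.5]
print(total_rewards(group))
```

In this toy group, the 60-token program is the shortest overall but receives no length bonus because its geometric score falls below the threshold; among the accurate samples, the 80-token program earns the largest bonus. Gating on fidelity is one simple way to keep a length incentive from trading accuracy for brevity, which matches the abstract's stated goal of preserving geometric fidelity.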