Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction
By: Yun Zhou, Yaoting Wang, Guangquan Jie, and more
Potential Business Impact:
Makes 3D models from text and one picture.
SAM3D has garnered widespread attention for its strong 3D object reconstruction capabilities. However, a key limitation remains: SAM3D cannot reconstruct specific objects referred to by textual descriptions, a capability that is essential for practical applications such as 3D editing, game development, and virtual environments. To address this gap, we introduce Ref-SAM3D, a simple yet effective extension to SAM3D that incorporates textual descriptions as a high-level prior, enabling text-guided 3D reconstruction from a single RGB image. Through extensive qualitative experiments, we show that Ref-SAM3D, guided only by natural language and a single 2D view, delivers competitive and high-fidelity zero-shot reconstruction performance. Our results demonstrate that Ref-SAM3D effectively bridges the gap between 2D visual cues and 3D geometric understanding, offering a more flexible and accessible paradigm for reference-guided 3D reconstruction. Code is available at: https://github.com/FudanCVL/Ref-SAM3D.
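The abstract describes a two-stage idea: use the textual description as a high-level prior to localize the referred object in the single RGB image, then reconstruct only that object in 3D. A minimal sketch of such a pipeline is below; note that ground_text_to_mask and reconstruct_object are hypothetical placeholders standing in for a referring-segmentation model and a SAM3D-style single-view reconstructor, not real Ref-SAM3D or SAM3D APIs.

    # Illustrative sketch only, under assumed interfaces; not the authors' code.
    from typing import Callable
    import numpy as np

    def text_guided_reconstruction(
        image: np.ndarray,   # H x W x 3 RGB image
        text: str,           # referring expression, e.g. "the red mug"
        ground_text_to_mask: Callable[[np.ndarray, str], np.ndarray],
        reconstruct_object: Callable[[np.ndarray, np.ndarray], object],
    ):
        """Compose text grounding with single-view 3D reconstruction.

        1. Use the text as a high-level prior to localize the referred
           object as a binary (H, W) mask over the image.
        2. Hand the image plus mask to a SAM3D-style reconstructor, which
           returns a 3D representation (e.g., a mesh) of that object only.
        """
        mask = ground_text_to_mask(image, text)
        if not mask.any():
            raise ValueError(f"No object matching {text!r} was found.")
        return reconstruct_object(image, mask)

In this reading, the text serves purely to select which object to reconstruct; the reconstruction itself remains zero-shot and single-view, as the abstract states.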
Similar Papers
SAM 3D: 3Dfy Anything in Images
CV and Pattern Recognition
Turns flat pictures into 3D objects.
Disc3D: Automatic Curation of High-Quality 3D Dialog Data via Discriminative Object Referring
CV and Pattern Recognition
Makes 3D computer worlds talk and understand questions.
LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View Consistency
CV and Pattern Recognition
Turns words into 3D objects from pictures.