VersaQ-3D: A Reconfigurable Accelerator Enabling Feed-Forward and Generalizable 3D Reconstruction via Versatile Quantization

Published: January 28, 2026 | arXiv ID: 2601.20317v1

By: Yipu Zhang, Jintao Cheng, Xingyu Liu, and more

Potential Business Impact:

Makes 3D pictures from photos on phones.

Business Areas:

3D Technology Hardware, Software

The Visual Geometry Grounded Transformer (VGGT) enables strong feed-forward 3D reconstruction without per-scene optimization. However, its billion-parameter scale creates high memory and compute demands, hindering on-device deployment. Existing LLM quantization methods fail on VGGT due to saturated activation channels and diverse 3D semantics, which cause unreliable calibration. Furthermore, VGGT presents hardware challenges regarding precision-sensitive nonlinear operators and memory-intensive global attention. To address this, we propose VersaQ-3D, an algorithm-architecture co-design framework. Algorithmically, we introduce the first calibration-free, scene-agnostic quantization for VGGT down to 4-bit, leveraging orthogonal transforms to decorrelate features and suppress outliers. Architecturally, we design a reconfigurable accelerator supporting BF16, INT8, and INT4. A unified systolic datapath handles both linear and nonlinear operators, reducing latency by 60%, while two-stage recomputation-based tiling alleviates memory pressure for long-sequence attention. Evaluations show VersaQ-3D preserves 98-99% accuracy at W4A8. At W4A4, it outperforms prior methods by 1.61x-2.39x across diverse scenes. The accelerator delivers 5.2x-10.8x speedup over edge GPUs with low power, enabling efficient instant 3D reconstruction.
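
The abstract's idea of using an orthogonal transform to suppress activation outliers before low-bit quantization can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say which orthogonal transform is used, so the Hadamard matrix here is an assumption, and the helper names `hadamard` and `quantize_sym`, the synthetic data, and the per-tensor INT4 scheme are all hypothetical choices made for illustration. Only the activations are quantized, to isolate the effect of the rotation.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix via the Sylvester construction (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_sym(x, bits):
    """Symmetric per-tensor fake quantization: snap to a signed integer grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
d = 256
X = rng.normal(size=(256, d))   # synthetic activations (assumption, not VGGT data)
X[:, 3] *= 50.0                 # one saturated outlier channel
W = rng.normal(size=(d, d))     # synthetic weights, kept in full precision here

Q = hadamard(d)
Y_ref = X @ W

# Because Q is orthogonal, (X Q)(Q^T W) equals X W, so the rotation is free in exact arithmetic.
W_rot = Q.T @ W

# 4-bit activations without rotation: the outlier channel dictates the scale,
# so most ordinary channels collapse to zero.
Y_plain = quantize_sym(X, 4) @ W

# 4-bit activations after rotation: the outlier's energy is spread over all channels,
# so the quantization step is much finer.
Y_rot = quantize_sym(X @ Q, 4) @ W_rot

err_plain = np.linalg.norm(Y_plain - Y_ref) / np.linalg.norm(Y_ref)
err_rot = np.linalg.norm(Y_rot - Y_ref) / np.linalg.norm(Y_ref)
print(f"relative error, plain INT4:   {err_plain:.3f}")
print(f"relative error, rotated INT4: {err_rot:.3f}")
```

In this toy setting the rotated path gives a lower relative error because no single channel dominates the quantization range. The rotation needs no calibration data, which is consistent with the calibration-free framing in the abstract, although the actual VersaQ-3D pipeline may differ in its transform, granularity, and weight handling.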

Quantized Visual Geometry Grounded Transformer

CV and Pattern Recognition

Makes 3D cameras faster and smaller.

25 Sep 2025 1

89%

SwiftVGGT: A Scalable Visual Geometry Grounded Transformer for Large-Scale Scenes

CV and Pattern Recognition

Builds detailed 3D maps much faster.

23 Nov 2025 1

89%

Building temporally coherent 3D maps with VGGT for memory-efficient Semantic SLAM

CV and Pattern Recognition

Helps robots see and understand moving things.

20 Nov 2025 0

Country of Origin

🇭🇰 🇨🇳 China, Hong Kong

Page Count

14 pages
