FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers
By: Yue Wu, Yufan Wu, Wen Li and more
Potential Business Impact:
Makes 3D avatars from photos in seconds.
Despite significant progress, 3D avatar reconstruction still faces challenges such as high time complexity, sensitivity to data quality, and low data utilization. We propose FastAvatar, a feedforward 3D avatar framework that flexibly leverages diverse everyday recordings (e.g., a single image, multi-view observations, or monocular video) to reconstruct a high-quality 3D Gaussian Splatting (3DGS) model within seconds, using a single unified model. At its core is a Large Gaussian Reconstruction Transformer with three key designs. First, a VGGT-style transformer variant aggregates multi-frame cues while injecting an initial 3D prompt to predict an aggregatable canonical 3DGS representation. Second, multi-granular guidance encoding (camera pose, FLAME expression, head pose) mitigates animation-induced misalignment for variable-length inputs. Third, incremental Gaussian aggregation is achieved via landmark tracking and sliced fusion losses. Together, these designs enable incremental reconstruction, i.e., quality improves as more observations are added, unlike prior work, which wastes available input data. This yields a quality-speed-tunable paradigm for highly usable avatar modeling. Extensive experiments show that FastAvatar achieves higher quality and highly competitive speed compared with existing methods.
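The idea of incremental reconstruction, where each extra observation refines a shared canonical model instead of being discarded, can be illustrated with a toy sketch. The abstract does not specify how the Gaussians are fused, so the confidence-weighted running average below is a swapped-in stand-in, not FastAvatar's landmark tracking or sliced fusion losses. The class `CanonicalGaussians`, its `empty` and `fuse` methods, and the flat per-Gaussian parameter vectors are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanonicalGaussians:
    """Running estimate of per-Gaussian parameters in a shared canonical space.

    Hypothetical container: each Gaussian is reduced to a flat parameter
    vector (e.g. position, scale, opacity) purely for illustration.
    """
    params: List[List[float]]
    weight: float = 0.0  # total confidence accumulated so far

    @classmethod
    def empty(cls, num_gaussians: int, dim: int) -> "CanonicalGaussians":
        """Start with all-zero parameters and no accumulated confidence."""
        return cls(params=[[0.0] * dim for _ in range(num_gaussians)])

    def fuse(self, new_params: List[List[float]], confidence: float = 1.0) -> None:
        """Fold one more per-frame prediction into the running estimate.

        Assumption: every prediction is already expressed in the same
        canonical layout, so Gaussian k in one frame matches Gaussian k
        in every other frame.
        """
        if confidence <= 0:
            raise ValueError("confidence must be positive")
        if len(new_params) != len(self.params):
            raise ValueError("predictions must share the canonical layout")
        total = self.weight + confidence
        for old, new in zip(self.params, new_params):
            for i, value in enumerate(new):
                # Weighted running mean: earlier frames keep their share.
                old[i] = (old[i] * self.weight + value * confidence) / total
        self.weight = total


# Usage: two made-up per-frame predictions for two Gaussians.
avatar = CanonicalGaussians.empty(num_gaussians=2, dim=3)
avatar.fuse([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
avatar.fuse([[3.0, 3.0, 3.0], [4.0, 4.0, 4.0]])
```

Because the estimate is updated in place, the model can be rendered after any number of frames, which is one simple way to picture the quality-speed trade-off the abstract describes: stop early for speed, or keep fusing for quality.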