PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control

Published: September 21, 2025 | arXiv ID: 2509.16922v1

By: Tianheng Zhu, Yinfeng Yu, Liejun Wang and more

Potential Business Impact:

Makes computer faces talk realistically with sound.

Business Areas:

Speech Recognition, Data and Analytics, Software

Audio-driven talking head generation is crucial for applications in virtual reality, digital avatars, and film production. While NeRF-based methods enable high-fidelity reconstruction, they suffer from low rendering efficiency and suboptimal audio-visual synchronization. This work presents PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS). To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere. Additionally, we introduce a lightweight Multimodal Gated Fusion Module to effectively fuse audio and spatial features, thereby improving the accuracy of Gaussian deformation prediction. Extensive experiments on public datasets demonstrate that PGSTalker outperforms existing NeRF- and 3DGS-based approaches in rendering quality, lip-sync precision, and inference speed. Our method exhibits strong generalization capabilities and practical potential for real-world deployment.
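
The abstract does not spell out how the Multimodal Gated Fusion Module is built, so the following is only a minimal sketch of generic gated fusion, not the authors' implementation. It assumes the audio and spatial features have already been projected to the same dimension, and that a single learned linear layer followed by a sigmoid produces a per-channel gate. The names `gated_fusion`, `gate_w`, and `gate_b` are hypothetical and chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, vec):
    """Compute W @ vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def gated_fusion(audio_feat, spatial_feat, gate_w, gate_b):
    """Blend two same-length feature vectors with a per-channel gate.

    Assumed form (generic gated fusion, not taken from the paper):
        gate  = sigmoid(W [audio; spatial] + b)
        fused = gate * audio + (1 - gate) * spatial
    """
    if len(audio_feat) != len(spatial_feat):
        raise ValueError("audio and spatial features must share a dimension")
    # The gate sees both modalities, concatenated into one vector.
    gate = [sigmoid(z)
            for z in linear(gate_w, gate_b, audio_feat + spatial_feat)]
    # Each channel is a convex combination of the two inputs.
    return [g * a + (1.0 - g) * s
            for g, a, s in zip(gate, audio_feat, spatial_feat)]


# With all-zero gate parameters the gate is 0.5 everywhere,
# so the output is the plain average of the two inputs.
zero_w = [[0.0] * 4 for _ in range(2)]
fused = gated_fusion([1.0, 3.0], [3.0, 1.0], zero_w, [0.0, 0.0])
```

In a trained model the gate parameters would be learned, letting the network lean on audio features in some channels and spatial features in others before predicting Gaussian deformations.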

EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation

Sound

Makes talking avatars move their lips to sound.

3 Oct 2025 2

91%

GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting

CV and Pattern Recognition

Creates real-time talking avatars from sound.

11 Dec 2025 1

89%

GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting

CV and Pattern Recognition

Makes one computer program talk like many people.

3 May 2025 1

Country of Origin

🇨🇳 China

Page Count

15 pages
