
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer

Published: November 27, 2025 | arXiv ID: 2511.22167v1

By: Bo Chen, Tao Liu, Qi Chen and more

Potential Business Impact:

Makes faces talk realistically from pictures.

Business Areas:

Motion Capture, Media and Entertainment, Video

Talking face generation aims to synthesize realistic speaking portraits from a single image, yet existing methods often rely on explicit optical flow and local warping, which fail to model complex global motions and cause identity drift. We present IMTalker, a novel framework that achieves efficient and high-fidelity talking face generation through implicit motion transfer. The core idea is to replace traditional flow-based warping with a cross-attention mechanism that implicitly models motion discrepancy and identity alignment within a unified latent space, enabling robust global motion rendering. To further preserve speaker identity during cross-identity reenactment, we introduce an identity-adaptive module that projects motion latents into personalized spaces, ensuring clear disentanglement between motion and identity. In addition, a lightweight flow-matching motion generator produces vivid and controllable implicit motion vectors from audio, pose, and gaze cues. Extensive experiments demonstrate that IMTalker surpasses prior methods in motion accuracy, identity preservation, and audio-lip synchronization, achieving state-of-the-art quality with superior efficiency, operating at 40 FPS for video-driven and 42 FPS for audio-driven generation on an RTX 4090 GPU. We will release our code and pre-trained models to facilitate applications and future research.
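The abstract's central idea, replacing flow-based warping with cross-attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cross_attention_transfer`, the token shapes, the random projection matrices, and the choice to take queries from motion tokens and keys/values from source-image tokens are all assumptions made for illustration. Pure Python is used to keep the example dependency-free; a real model would use a tensor library and learned weights.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_transfer(motion_tokens, source_tokens, w_q, w_k, w_v):
    """Hypothetical implicit motion transfer step.

    motion_tokens: (M, d) tokens derived from the driving motion latent (assumed).
    source_tokens: (N, d) appearance tokens from the reference image (assumed).
    Each output token is a weighted mix of ALL source tokens, so the
    rearrangement is global rather than a local per-pixel flow warp.
    """
    q = matmul(motion_tokens, w_q)              # (M, d) queries
    k = matmul(source_tokens, w_k)              # (N, d) keys
    v = matmul(source_tokens, w_v)              # (N, d) values
    scale = math.sqrt(len(q[0]))                # standard 1/sqrt(d) scaling
    k_t = [list(col) for col in zip(*k)]        # transpose keys to (d, N)
    scores = matmul(q, k_t)                     # (M, N) similarity scores
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn                # (M, d) output, (M, N) weights


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or encoder outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 8
source_tokens = random_matrix(6, d, rng)   # 6 appearance tokens
motion_tokens = random_matrix(4, d, rng)   # 4 motion-conditioned tokens
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))

out, attn = cross_attention_transfer(motion_tokens, source_tokens, w_q, w_k, w_v)
print(len(out), len(out[0]))  # number of output tokens and their width
```

Because every attention row is a probability distribution over the source tokens, each output token can draw on any region of the reference image, which is one plausible reading of how attention could handle global motions that local warping struggles with.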

HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis

CV and Pattern Recognition

Makes computer faces talk smoothly and clearly.

14 Aug 2025 1

91%

MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control

CV and Pattern Recognition

Makes videos of people talking from one picture.

26 Oct 2025 1

91%

FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

CV and Pattern Recognition

Makes still pictures talk and move like real people.

7 Apr 2025 1


Country of Origin

🇨🇳 China

Page Count

11 pages
