KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
By: Antoni Bigata, Rodrigo Mira, Stella Bounareli, and more
Potential Business Impact:
Makes lip movements in existing videos match new audio.
Lip synchronization, the task of aligning lip movements in an existing video with new input audio, is typically framed as a simpler variant of audio-driven facial animation. However, beyond the usual issues in talking head generation (e.g., temporal consistency), lip synchronization presents significant new challenges, such as expression leakage from the input video and facial occlusions, which can severely impact real-world applications like automated dubbing but are often neglected in existing works. To address these shortcomings, we present KeySync, a two-stage framework that solves the issue of temporal consistency while also incorporating solutions for leakage and occlusions via a carefully designed masking strategy. We show that KeySync achieves state-of-the-art results in lip reconstruction and cross-synchronization, improving visual quality and reducing expression leakage according to LipLeak, our novel leakage metric. Furthermore, we demonstrate the effectiveness of our new masking approach in handling occlusions and validate our architectural choices through several ablation studies. Code and model weights can be found at https://antonibigata.github.io/KeySync.
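The abstract does not spell out the masking strategy, so as a purely illustrative sketch (not the authors' actual method), occlusion-aware masking for inpainting-style lip sync can be thought of as marking the lower face as editable while excluding occluded pixels. The function name, box convention, and the simple "lower half of the face box" heuristic below are all assumptions for illustration:

```python
import numpy as np

def lower_face_mask(h, w, face_box, occlusion=None):
    """Build a binary mask over the lower half of a detected face box.

    face_box: (x0, y0, x1, y1) pixel coordinates of the face.
    occlusion: optional boolean (h, w) array marking occluding pixels
               (e.g. hands, microphones) that must stay untouched.
    Returns a boolean (h, w) array: True = region the model may repaint.
    """
    x0, y0, x1, y1 = face_box
    mask = np.zeros((h, w), dtype=bool)
    mid = (y0 + y1) // 2         # split the face box roughly at nose level
    mask[mid:y1, x0:x1] = True   # lower half: mouth and jaw
    if occlusion is not None:
        mask &= ~occlusion       # never repaint occluded pixels
    return mask

# Example: 256x256 frame, face box in the centre, a hand covering part of it
occ = np.zeros((256, 256), dtype=bool)
occ[150:200, 60:120] = True
m = lower_face_mask(256, 256, (64, 48, 192, 224), occ)
print(m.sum())  # → 8464 editable pixels (lower face minus the occluded patch)
```

In a diffusion-based pipeline, a mask like this would restrict generation to the mouth region, which is one plausible way to limit expression leakage from the rest of the input frame.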
Similar Papers
SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the Wild
CV and Pattern Recognition
Makes videos speak any language perfectly.
OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
CV and Pattern Recognition
Makes talking videos match the sound perfectly.
SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation
CV and Pattern Recognition
Makes talking avatars move their faces and bodies.