KeyframeFace: From Text to Expressive Facial Keyframes

Published: December 12, 2025 | arXiv ID: 2512.11321v1

By: Jingchao Wu, Zejian Kang, Haibo Liu, and more

Potential Business Impact:

Enables computer-generated 3D faces to act out emotions described in plain text.

Business Areas:
Motion Capture, Media and Entertainment, Video

Generating dynamic 3D facial animation from natural language requires understanding both temporally structured semantics and fine-grained expression changes. Existing datasets and methods mainly focus on speech-driven animation or unstructured expression sequences, and therefore lack the semantic grounding and temporal structure needed for expressive human performance generation. In this work, we introduce KeyframeFace, a large-scale multimodal dataset designed for text-to-animation research through keyframe-level supervision. KeyframeFace provides 2,100 expressive scripts paired with monocular videos, per-frame ARKit coefficients, contextual backgrounds, complex emotions, manually defined keyframes, and multi-perspective annotations derived from ARKit coefficients and images via Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Beyond the dataset, we propose the first text-to-animation framework that explicitly leverages LLM priors for interpretable facial motion synthesis. This design aligns the semantic understanding capabilities of LLMs with the interpretable structure of ARKit coefficients, enabling high-fidelity expressive animation. Together, KeyframeFace and our LLM-based framework establish a new foundation for interpretable, keyframe-guided, and context-aware text-to-animation. Code and data are available at https://github.com/wjc12345123/KeyframeFace.
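For intuition, here is a minimal Python sketch of what keyframe-level supervision over ARKit coefficients can look like: sparse keyframes of blendshape coefficients (each in [0, 1]) are expanded into a dense per-frame animation by linear interpolation. The blendshape subset, timings, and interpolation scheme are illustrative assumptions, not the paper's method.

```python
import numpy as np

# A small subset of ARKit's blendshape names; the full set has 52 coefficients.
BLENDSHAPES = [
    "browInnerUp", "eyeBlinkLeft", "eyeBlinkRight",
    "jawOpen", "mouthSmileLeft", "mouthSmileRight",
]

def to_vector(coeffs):
    """Pack a {blendshape: value} dict into a fixed-order coefficient vector."""
    return np.array([coeffs.get(name, 0.0) for name in BLENDSHAPES])

def interpolate_keyframes(keyframes, fps=30):
    """Expand sparse (time_sec, coeff_dict) keyframes into dense per-frame
    coefficient vectors via linear interpolation."""
    times = np.array([t for t, _ in keyframes])
    values = np.stack([to_vector(c) for _, c in keyframes])
    frame_times = np.arange(times[0], times[-1], 1.0 / fps)
    frames = np.empty((len(frame_times), len(BLENDSHAPES)))
    for j in range(len(BLENDSHAPES)):
        frames[:, j] = np.interp(frame_times, times, values[:, j])
    return frame_times, frames

# Hypothetical keyframes for "a growing smile that turns into surprise".
keyframes = [
    (0.0, {}),                                               # neutral
    (1.0, {"mouthSmileLeft": 0.8, "mouthSmileRight": 0.8}),  # smile
    (1.5, {"browInnerUp": 0.9, "jawOpen": 0.6}),             # surprise
]
frame_times, frames = interpolate_keyframes(keyframes)
print(frames.shape)  # (45, 6): 45 frames x 6 blendshape coefficients
```

In the paper's setting, the keyframes would be predicted from text by the LLM-based framework rather than hand-authored; the sketch only illustrates why keyframe-level ARKit coefficients make a convenient, interpretable intermediate representation between text and per-frame animation.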

Country of Origin
🇨🇳 China

Repos / Data Links
https://github.com/wjc12345123/KeyframeFace

Page Count
17 pages

Category
Computer Science:
Computer Vision and Pattern Recognition