Ges-QA: A Multidimensional Quality Assessment Dataset for Audio-to-3D Gesture Generation

Published: August 16, 2025 | arXiv ID: 2508.12020v1

By: Zhilin Gao, Yunhao Li, Sijing Wu, and others

Potential Business Impact:

Enables automated quality scoring of AI-generated 3D gestures, helping virtual-avatar and VR applications produce gestures that better match speech audio and its emotion.

The Audio-to-3D-Gesture (A2G) task has enormous potential for applications in virtual reality, computer graphics, and related fields. However, current evaluation metrics, such as Fréchet Gesture Distance or Beat Constancy, fail to reflect human preference for the generated 3D gestures. To address this problem, it is increasingly important to study human preference and to develop an objective quality assessment metric for AI-generated 3D human gestures. In this paper, we introduce the Ges-QA dataset, which includes 1,400 samples with multidimensional scores for gesture quality and audio-gesture consistency. Moreover, we collect binary classification labels indicating whether the generated gestures match the emotion of the audio. Equipped with our Ges-QA dataset, we propose Ges-QAer, a multi-modal transformer-based neural network with three branches for the video, audio, and 3D-skeleton modalities, which can score A2G content along multiple dimensions. Comparative experimental results and ablation studies demonstrate that Ges-QAer yields state-of-the-art performance on our dataset.
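The abstract describes a three-branch transformer that encodes video, audio, and 3D-skeleton features separately and fuses them to regress multidimensional quality scores. The paper does not give architectural details, so the following is only a minimal PyTorch sketch of that idea; all dimensions, layer counts, and names (e.g. `GesQAerSketch`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GesQAerSketch(nn.Module):
    """Hypothetical three-branch scorer; dims and depth are assumptions."""

    def __init__(self, d_model=64, n_heads=4, n_scores=2):
        super().__init__()

        # One lightweight transformer encoder per modality branch.
        def branch():
            layer = nn.TransformerEncoderLayer(
                d_model, n_heads, dim_feedforward=128, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=1)

        self.video_branch = branch()
        self.audio_branch = branch()
        self.skeleton_branch = branch()
        # Fuse pooled branch features, regress multidimensional scores
        # (e.g. gesture quality and audio-gesture consistency).
        self.head = nn.Linear(3 * d_model, n_scores)

    def forward(self, video, audio, skeleton):
        # Each input: (batch, frames, d_model); mean-pool over time.
        feats = [branch(x).mean(dim=1) for branch, x in [
            (self.video_branch, video),
            (self.audio_branch, audio),
            (self.skeleton_branch, skeleton),
        ]]
        return self.head(torch.cat(feats, dim=-1))

model = GesQAerSketch()
v = torch.randn(2, 10, 64)   # per-frame video features
a = torch.randn(2, 10, 64)   # per-frame audio features
s = torch.randn(2, 10, 64)   # per-frame skeleton features
scores = model(v, a, s)      # shape: (batch, n_scores)
```

Late fusion by concatenating pooled per-branch features is one common design for multi-modal quality assessment; the actual Ges-QAer fusion scheme may differ.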

Country of Origin
🇨🇳 China

Page Count
5 pages

Category
Computer Science:
Multimedia