The AudioMOS Challenge 2025
By: Wen-Chin Huang, Hui Wang, Cheng Liu, and more
Potential Business Impact:
Makes computers judge fake sounds as good or bad.
This is the summary paper for the AudioMOS Challenge 2025, the very first challenge on automatic subjective quality prediction for synthetic audio. The challenge consists of three tracks. The first track aims to assess text-to-music samples in terms of overall quality and textual alignment. The second track is based on the four evaluation dimensions of Meta Audiobox Aesthetics, and its test set consists of text-to-speech, text-to-audio, and text-to-music samples. The third track focuses on synthetic speech quality assessment at different sampling rates. The challenge attracted 24 unique teams from both academia and industry, and improvements over the baselines were confirmed. The outcome of this challenge is expected to facilitate development and progress in the field of automatic evaluation for audio generation systems.
Similar Papers
Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings
Audio and Speech Processing
Rates how good computer-made sounds are.
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
Sound
Helps computers judge how good spoken words sound.
Synthetic Audio Forensics Evaluation (SAFE) Challenge
Sound
Finds fake voices in recordings.