SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant

Published: June 3, 2025 | arXiv ID: 2506.02457v1

By: Yixuan Hou, Heyang Liu, Yuhao Wang and more

Potential Business Impact:

Tests how well AI talks like a real person.

Business Areas:
Speech Recognition, Data and Analytics, Software

Thanks to steady progress in large language models (LLMs), speech encoding algorithms, and vocoder structures, recent systems can generate a speech response directly from a user instruction. However, benchmarking the quality of the generated speech has been a neglected but critical issue, given the shift from pursuing semantic accuracy to producing vivid and spontaneous speech flow. Previous evaluations focused on speech understanding ability and lacked a quantification of acoustic quality. In this paper, we propose the Speech cOnversational Voice Assistant Benchmark (SOVA-Bench), which provides a comprehensive comparison of available speech LLMs in terms of general knowledge, speech recognition and understanding, and both semantic and acoustic generative ability. To the best of our knowledge, SOVA-Bench is one of the most systematic evaluation frameworks for speech LLMs, and it may inspire the direction of voice interaction systems.

Country of Origin
🇨🇳 China

Page Count
5 pages

Category
Computer Science:
Sound