SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
By: Yixuan Hou, Heyang Liu, Yuhao Wang, and more
Potential Business Impact:
Tests how well AI talks like a real person.
Thanks to steady progress in large language models (LLMs), speech encoding algorithms, and vocoder architectures, recent systems can generate a speech response directly from a user instruction. However, benchmarking the quality of the generated speech has been a neglected but critical issue, especially given the shift from pursuing semantic accuracy to producing vivid and spontaneous speech. Previous evaluations focused on speech-understanding ability and lacked a quantification of acoustic quality. In this paper, we propose the Speech cOnversational Voice Assistant Benchmark (SOVA-Bench), which provides a comprehensive comparison of general knowledge, speech recognition and understanding, and both semantic and acoustic generative ability across available speech LLMs. To the best of our knowledge, SOVA-Bench is one of the most systematic evaluation frameworks for speech LLMs, offering guidance for the development of voice interaction systems.
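As a rough illustration of the kind of multi-dimensional scoring such a benchmark performs, the sketch below walks a speech LLM over an evaluation set and aggregates one score per axis: word error rate for recognition, exact-match accuracy for semantic generation, and a predicted mean opinion score (MOS) for acoustic quality. The model interface (`respond`), the dataset fields, and the specific metrics here are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch of a multi-axis evaluation loop in the spirit of SOVA-Bench.
# The model API, dataset schema, and acoustic scorer are hypothetical
# placeholders; the paper defines its own metrics and data.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution,
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def evaluate(model, dataset):
    """Aggregate per-axis scores: lower WER is better; higher
    semantic accuracy and acoustic MOS estimates are better."""
    scores = {"asr_wer": [], "semantic_acc": [], "acoustic_mos": []}
    for example in dataset:
        out = model.respond(example["audio"])  # hypothetical model API
        scores["asr_wer"].append(
            word_error_rate(example["transcript"], out["transcript"]))
        scores["semantic_acc"].append(
            float(out["text_answer"] == example["answer"]))  # exact match
        scores["acoustic_mos"].append(out["mos_estimate"])  # e.g. a MOS predictor
    return {k: sum(v) / len(v) for k, v in scores.items()}
```

Separating the axes this way matters because a system can score well on one while failing another, e.g., producing a semantically correct answer in flat, unnatural speech, which a purely text-based evaluation would never catch.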
Similar Papers
VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
Sound
Tests how well AI understands spoken Chinese.
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
Computation and Language
Tests AI assistants on hearing, talking, and seeing.
SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations
Computation and Language
Tests how well AI understands people talking.