SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents

Published: August 4, 2025 | arXiv ID: 2508.02013v3

By: Changhao Jiang, Jiajun Sun, Yifei Cao and more

BigTech Affiliations: ByteDance

Potential Business Impact:

Makes AI talk like different people for better chats.

Recently, role-playing agents have emerged as a promising paradigm for achieving personalized interaction and emotional resonance. Existing research primarily focuses on the textual modality, neglecting the critical dimension of speech in realistic interactive scenarios. In particular, there is a lack of systematic evaluation for Speech Role-Playing Agents (SRPAs). To address this gap, we construct SpeechRole-Data, a large-scale, high-quality dataset that comprises 98 diverse roles and 112k speech-based single-turn and multi-turn conversations. Each role demonstrates distinct vocal characteristics, including timbre and prosody, thereby enabling more sophisticated speech role-playing. Furthermore, we propose SpeechRole-Eval, a multidimensional evaluation benchmark that systematically assesses SRPA performance in key aspects such as fundamental interaction ability, speech expressiveness, and role-playing fidelity. Experimental results reveal the advantages and challenges of both cascaded and end-to-end speech role-playing agents in maintaining vocal style consistency and role coherence. We release all data, code, and baseline models to provide a solid foundation for speech-driven multimodal role-playing research and to foster further developments in this field.

SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents

Computation and Language

Makes talking robots sound like real people.

4 Aug 2025 2

92%

VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents

Computation and Language

Makes talking robots sound like real people.

4 Sep 2025 1

90%

AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models

Sound

Makes AI voices sound like TV show characters.

27 Sep 2025 1

Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

27 pages
