DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue
By: Xiang Li, Duyi Pan, Hongru Xiao, and others
Potential Business Impact:
Enables more natural-sounding synthetic voices for conversational applications.
Speech synthesis is crucial for human-computer interaction, enabling natural and intuitive communication. However, existing datasets involve high construction costs due to manual annotation and suffer from limited character diversity, contextual scenarios, and emotional expressiveness. To address these issues, we propose DialogueAgents, a novel hybrid agent-based speech synthesis framework, which integrates three specialized agents -- a script writer, a speech synthesizer, and a dialogue critic -- to collaboratively generate dialogues. Grounded in a diverse character pool, the framework iteratively refines dialogue scripts and synthesizes speech based on speech review, boosting the emotional expressiveness and paralinguistic features of the synthesized dialogues. Using DialogueAgents, we contribute MultiTalk, a bilingual, multi-party, multi-turn speech dialogue dataset covering diverse topics. Extensive experiments demonstrate the effectiveness of our framework and the high quality of the MultiTalk dataset. We release the dataset and code at https://github.com/uirlx/DialogueAgents to facilitate future research on advanced speech synthesis models and customized data generation.
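The iterative loop the abstract describes -- a script writer drafting dialogue, a synthesizer producing speech, and a critic feeding review back into the next draft -- can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: all function names, the `Character` type, and the stubbed synthesis/critique logic are assumptions for the sake of the example.

```python
# Hypothetical sketch of the three-agent refinement loop from DialogueAgents.
# Agent interfaces and behaviors here are illustrative stand-ins, not the
# paper's real API; real agents would wrap LLM and TTS model calls.
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    persona: str


def write_script(characters, topic, feedback=None):
    # Script writer agent: drafts a multi-party dialogue, or revises the
    # previous draft when the critic has supplied feedback.
    lines = [f"{c.name} ({c.persona}): talks about {topic}" for c in characters]
    if feedback:
        lines.append(f"[revised per critic feedback: {feedback}]")
    return lines


def synthesize(script):
    # Speech synthesizer agent: would invoke a TTS model per line;
    # stubbed here as placeholder audio tokens.
    return [f"<audio: {line}>" for line in script]


def critique(script, audio):
    # Dialogue critic agent: reviews the synthesized dialogue and returns
    # feedback, or None once the dialogue passes review. Stubbed to pass
    # after one round of revision.
    if any("revised" in line for line in script):
        return None
    return "add emotional and paralinguistic cues"


def generate_dialogue(characters, topic, max_iters=3):
    # Iteratively refine: write -> synthesize -> critique, until the
    # critic approves or the iteration budget runs out.
    feedback = None
    for _ in range(max_iters):
        script = write_script(characters, topic, feedback)
        audio = synthesize(script)
        feedback = critique(script, audio)
        if feedback is None:
            break
    return script, audio


# Example: a two-character slice of the diverse character pool.
pool = [Character("Alice", "curious student"), Character("Bob", "patient teacher")]
script, audio = generate_dialogue(pool, "speech synthesis")
```

Here the loop terminates after one revision round; in the framework described above, the critic's speech review would drive script changes that improve emotional expressiveness across iterations.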
Similar Papers
Social Agent: Mastering Dyadic Nonverbal Behavior Generation via Conversational LLM Agents
Graphics
Makes computer characters act and move naturally together.
PodAgent: A Comprehensive Framework for Podcast Generation
Sound
Creates realistic podcasts with smart voices.
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
Computation and Language
Makes AI better at convincing people to buy things.