SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications
By: Jionghao Han, Jiatong Shi, Masao Someki, and more
Potential Business Impact:
Makes computer characters sing their answers.
With recent advances in automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) technologies, spoken dialogue systems (SDS) have become widely accessible. However, most existing SDS are limited to conventional spoken responses. We present SingingSDS, a cascaded SDS that responds through singing rather than speaking, fostering more affective, memorable, and pleasurable interactions in character-based roleplay and interactive entertainment scenarios. SingingSDS employs a modular ASR-LLM-SVS pipeline and supports a wide range of configurations across character personas, ASR and LLM backends, SVS models, melody sources, and voice profiles, tailored to different needs in terms of latency, quality, and musical style. SingingSDS is available as a plug-and-play web demo, featuring modular, open-source code that supports customization and extension. Demo: https://huggingface.co/spaces/espnet/SingingSDS. Code: https://github.com/SingingSDS/SingingSDS.
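The modular ASR-LLM-SVS cascade described above can be illustrated with a minimal sketch. All class and function names below are hypothetical stand-ins for illustration, not the project's actual API; the toy components simply show how swappable backends (ASR, LLM, SVS) and configuration options (persona, melody) compose into one response path.

```python
# Hypothetical sketch of a cascaded ASR -> LLM -> SVS dialogue pipeline,
# loosely modeled on the architecture described in the abstract.
from dataclasses import dataclass
from typing import Callable


@dataclass
class SingingPipeline:
    """Each stage is an injectable callable, mirroring the swappable
    ASR/LLM/SVS backends and persona/melody configuration options."""
    asr: Callable[[bytes], str]        # user audio -> transcript
    llm: Callable[[str, str], str]     # (persona, transcript) -> reply text
    svs: Callable[[str, str], bytes]   # (reply text, melody id) -> sung audio
    persona: str = "cheerful bard"
    melody: str = "default"

    def respond(self, user_audio: bytes) -> bytes:
        text = self.asr(user_audio)                 # 1. recognize speech
        reply = self.llm(self.persona, text)        # 2. generate in-character reply
        return self.svs(reply, self.melody)         # 3. sing the reply


# Toy stand-ins so the cascade can be exercised without real models.
pipeline = SingingPipeline(
    asr=lambda audio: audio.decode(),
    llm=lambda persona, text: f"[{persona}] echo: {text}",
    svs=lambda reply, melody: f"<sung:{melody}> {reply}".encode(),
)
out = pipeline.respond(b"hello")
```

In an actual deployment, each lambda would be replaced by a real backend (e.g. an ASR model, an LLM client, and an SVS model), which is the latency/quality trade-off the configuration options address.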
Similar Papers
EmoNews: A Spoken Dialogue System for Expressive News Conversations
Computation and Language
Makes talking computers sound more caring.
DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment
Sound
Makes AI sing songs with real-sounding voices.
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Sound
Makes computers sing any song with any words.