Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization
By: Yen-Ju Lu, Kunxiao Gao, Mingrui Liang, and more
Potential Business Impact:
Helps computers understand feelings in spoken words.
Recent audio language models can follow long conversations, but research on emotion-aware and spoken dialogue summarization is constrained by the lack of data linking speech, summaries, and paralinguistic cues. We introduce Spoken DialogSum, the first corpus that aligns raw conversational audio with factual summaries, emotion-rich summaries, and utterance-level labels for speaker age, gender, and emotion. The dataset is built in two stages: first, an LLM rewrites DialogSum scripts with Switchboard-style fillers and back-channels and tags each utterance with emotion, pitch, and speaking rate; second, an expressive TTS engine synthesizes speech from the tagged scripts, producing audio aligned with the paralinguistic labels. Spoken DialogSum comprises 13,460 emotion-diverse dialogues, each paired with both a factual and an emotion-focused summary. The dataset is available online at https://fatfat-emosum.github.io/EmoDialog-Sum-Audio-Samples/. Baseline experiments show that an Audio-LLM raises emotional-summary ROUGE-L by 28% relative to a cascaded ASR-LLM system, confirming the value of end-to-end speech modeling.
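To make the two-stage construction concrete, here is a minimal sketch of such a pipeline. Every function below (rewrite_with_fillers, tag_utterance, expressive_tts) is a hypothetical stub standing in for the authors' LLM and TTS components, not their actual implementation; only the stage ordering (rewrite, tag, then synthesize) comes from the abstract.

```python
# Hypothetical sketch of the two-stage Spoken DialogSum pipeline:
# Stage 1: LLM rewrites scripts with fillers/back-channels and tags utterances.
# Stage 2: expressive TTS synthesizes audio conditioned on those tags.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Utterance:
    speaker: str
    text: str
    emotion: str = "neutral"  # utterance-level emotion label
    pitch: str = "medium"     # pitch tag passed to the TTS engine
    rate: str = "medium"      # speaking-rate tag passed to the TTS engine


def rewrite_with_fillers(script: list[Utterance]) -> list[Utterance]:
    """Stage 1a (stub): an LLM would rewrite each DialogSum turn with
    Switchboard-style fillers and back-channels; here we just prepend one."""
    return [replace(u, text=f"uh, {u.text}") for u in script]


def tag_utterance(u: Utterance) -> Utterance:
    """Stage 1b (stub): the LLM would tag emotion, pitch, and speaking rate;
    a toy keyword heuristic stands in for that call."""
    if "!" in u.text:
        return replace(u, emotion="excited", pitch="high", rate="fast")
    return u


def expressive_tts(u: Utterance) -> bytes:
    """Stage 2 (stub): an expressive TTS engine would synthesize audio
    conditioned on the tags, keeping the waveform aligned with the labels."""
    return b""  # placeholder for synthesized audio


def build_spoken_dialogue(script: list[Utterance]) -> list[tuple[Utterance, bytes]]:
    """Full pipeline: rewrite -> tag -> synthesize, per utterance."""
    tagged = [tag_utterance(u) for u in rewrite_with_fillers(script)]
    return [(u, expressive_tts(u)) for u in tagged]


if __name__ == "__main__":
    demo = [Utterance("A", "Did you finish the report?"),
            Utterance("B", "Yes, and it went great!")]
    for utt, audio in build_spoken_dialogue(demo):
        print(utt.speaker, utt.emotion, utt.pitch, utt.rate, "-", utt.text)
```

Tagging before synthesis is the key design choice: because the TTS engine is conditioned on the emotion, pitch, and rate tags, every audio clip is aligned with its paralinguistic labels by construction rather than by post-hoc annotation.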
Similar Papers
OSUM-EChat: Enhancing End-to-End Empathetic Spoken Chatbot via Understanding-Driven Spoken Dialogue
Sound
Makes computers understand and show feelings in talking.
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
Computation and Language
Helps computers understand feelings in talking.