The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era
By: Zhixian Zhao, Shuiyuan Wang, Guojian Li, and more
Potential Business Impact:
Makes computers understand and talk like people.
Driven by the rapid advancement of Large Language Models (LLMs), particularly Audio-LLMs and Omni-models, spoken dialogue systems have evolved significantly, progressively narrowing the gap between human-machine and human-human interactions. Achieving truly "human-like" communication necessitates a dual capability: emotional intelligence to perceive and resonate with users' emotional states, and robust interaction mechanisms to navigate the dynamic, natural flow of conversation, such as real-time turn-taking. Therefore, we launched the first Human-like Spoken Dialogue Systems Challenge (HumDial) at ICASSP 2026 to benchmark these dual capabilities. Anchored by a sizable dataset derived from authentic human conversations, this initiative establishes a fair evaluation platform across two tracks: (1) Emotional Intelligence, targeting long-term emotion understanding and empathetic generation; and (2) Full-Duplex Interaction, systematically evaluating real-time decision-making under "listening-while-speaking" conditions. This paper summarizes the dataset, track configurations, and the final results.
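To make Track 2's "listening-while-speaking" setting concrete: a full-duplex system must decide, frame by frame, whether to keep talking, yield to a user barge-in, or start a reply. The sketch below is a minimal, hypothetical illustration of such a decision loop; the names (FrameFeatures, decide_action, the Action labels) and the fixed VAD threshold are assumptions for illustration, not the challenge's actual protocol, and real systems would use learned models rather than a hand-set threshold.

```python
# Minimal sketch (assumed, not from the paper) of a per-frame
# full-duplex turn-taking decision under "listening-while-speaking".
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    CONTINUE = auto()  # keep speaking (or keep listening)
    PAUSE = auto()     # stop TTS output: the user barged in
    RESPOND = auto()   # the user's turn ended; start a reply


@dataclass
class FrameFeatures:
    user_speech_prob: float  # VAD posterior for the incoming audio frame
    is_backchannel: bool     # e.g. "mm-hmm" uttered while the system speaks
    system_speaking: bool    # whether TTS output is currently playing


def decide_action(f: FrameFeatures, threshold: float = 0.5) -> Action:
    """Per-frame turn-taking decision while the system may be speaking."""
    if f.system_speaking:
        # Barge-in handling: genuine user speech (not a backchannel)
        # should interrupt the system's ongoing utterance.
        if f.user_speech_prob > threshold and not f.is_backchannel:
            return Action.PAUSE
        return Action.CONTINUE
    # System is silent: respond once the user's turn appears finished.
    if f.user_speech_prob < threshold:
        return Action.RESPOND
    return Action.CONTINUE  # user still talking; keep listening
```

A benchmark like Track 2 can then score how quickly and how accurately such decisions track the ground-truth turn structure of real conversations.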
Similar Papers
DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
Audio and Speech Processing
Makes computer voices have real conversations.
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Computation and Language
Helps voice assistants understand talking better.