From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models
By: Yuxuan Chen, Haoyuan Yu
Potential Business Impact:
Lets AI talk and listen at the same time.
True Full-Duplex (TFD) voice communication, which enables simultaneous listening and speaking with natural turn-taking, overlapping speech, and interruptions, represents a critical milestone toward human-like AI interaction. This survey comprehensively reviews Full-Duplex Spoken Language Models (FD-SLMs) in the LLM era. We establish a taxonomy distinguishing Engineered Synchronization (modular architectures) from Learned Synchronization (end-to-end architectures), and unify fragmented evaluation approaches into a framework encompassing Temporal Dynamics, Behavioral Arbitration, Semantic Coherence, and Acoustic Performance. Through comparative analysis of mainstream FD-SLMs, we identify three fundamental challenges (synchronous data scarcity, architectural divergence, and evaluation gaps) and provide a roadmap for advancing human-AI communication.
Similar Papers
Think Before You Talk: Enhancing Meaningful Dialogue Generation in Full-Duplex Speech Language Models with Planning-Inspired Text Guidance
Computation and Language
Makes talking computers understand interruptions better.
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Computation and Language
Lets AI talk and listen at the same time.
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
Sound
Lets computers talk and listen at once.