Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech
By: Shuchang Pan, Siddharth Banerjee, Dhruv Hebbar and more
Potential Business Impact:
Helps computers understand conversations like people.
Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this causal pathway is key to building natural full-duplex interactive systems. We introduce a framework that enables reasoning over conversational behaviors by modeling this process as causal inference within a Graph-of-Thoughts (GoT). Our approach formalizes the intent-to-action pathway with a hierarchical labeling scheme that predicts high-level communicative intents and low-level speech acts in order to learn their causal and temporal dependencies. To train this system, we develop a hybrid corpus that pairs controllable, event-rich simulations with human-annotated rationales and real conversational speech. The GoT framework structures streaming predictions as an evolving graph, enabling a multimodal transformer to forecast the next speech act, generate concise justifications for its decisions, and dynamically refine its reasoning. Experiments on both synthetic and real duplex dialogues show that the framework delivers robust behavior detection, produces interpretable reasoning chains, and establishes a foundation for benchmarking conversational reasoning in full-duplex spoken dialogue systems.
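As a rough illustration of the evolving graph described in the abstract, the sketch below stores each streaming prediction as an intent node and a speech-act node joined by a causal edge, with temporal edges linking successive speech acts. It is only a minimal data-structure sketch under stated assumptions: the class and method names (`ThoughtGraph`, `add_prediction`, `reasoning_chain`) and the labels (`yield_floor`, `backchannel`, and so on) are hypothetical and do not come from the paper, and the multimodal transformer that would produce the predictions is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    level: str        # "intent" (high-level) or "act" (low-level speech act)
    label: str        # hypothetical label, not taken from the paper
    time_s: float     # timestamp of the prediction in seconds
    rationale: str = ""


@dataclass
class ThoughtGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_id, dst_id, kind)

    def add_node(self, level, label, time_s, rationale=""):
        node = Node(len(self.nodes), level, label, time_s, rationale)
        self.nodes.append(node)
        return node.node_id

    def add_prediction(self, intent, act, time_s, rationale=""):
        """Append one streaming prediction: intent -> act, plus a temporal link."""
        prev_acts = [n.node_id for n in self.nodes if n.level == "act"]
        intent_id = self.add_node("intent", intent, time_s)
        act_id = self.add_node("act", act, time_s, rationale)
        self.edges.append((intent_id, act_id, "causal"))
        if prev_acts:
            self.edges.append((prev_acts[-1], act_id, "temporal"))
        return act_id

    def parents(self, node_id, kind):
        return [s for s, d, k in self.edges if d == node_id and k == kind]

    def reasoning_chain(self, act_id):
        """Walk temporal edges backwards and return (intent, act) pairs in order."""
        chain = []
        current = act_id
        while current is not None:
            intents = self.parents(current, "causal")
            chain.append((self.nodes[intents[0]].label, self.nodes[current].label))
            prev = self.parents(current, "temporal")
            current = prev[0] if prev else None
        return list(reversed(chain))


# Hand-fed predictions stand in for model output in this sketch.
g = ThoughtGraph()
g.add_prediction("yield_floor", "backchannel", 1.2, "speaker paused briefly")
last = g.add_prediction("take_floor", "turn_start", 2.5, "long silence after question")
chain = g.reasoning_chain(last)
print(chain)
```

Keeping the two edge kinds separate lets the same structure answer both "why was this act chosen" (causal parent) and "what preceded it" (temporal parent), which is one plausible way to read the intent-to-action pathway the abstract describes.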