A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction

Published: January 8, 2026 | arXiv ID: 2601.04960v1

By: Qing Wang, Zehan Li, Yaodong Song, and others

Potential Business Impact:

Helps voice assistants recognize how users feel and why, and respond with empathy.

Business Areas:
Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.
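The abstract describes IEAT only at a high level. One plausible reading, sketched here under stated assumptions, is that each training target is prefixed with a reasoning segment that names the user's emotion and its cause. The model then learns to reason about emotion before replying instead of predicting a separate emotion label. The `<think>` delimiters, the `AnnotatedTurn` fields, and the wording of the thinking template are all hypothetical, since the paper's actual data format is not given. In the real system the input would also be speech rather than a text transcript.

```python
from dataclasses import dataclass

# Hypothetical delimiters for the reasoning segment; the paper does not specify a format.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


@dataclass
class AnnotatedTurn:
    user_text: str  # transcript of the user's spoken turn (stand-in for audio)
    emotion: str    # annotated emotional state of the user
    cause: str      # annotated reason for that emotion
    response: str   # target empathetic reply


def build_ieat_example(turn: AnnotatedTurn) -> dict:
    """Build one training pair with emotional attribution injected into the thinking segment."""
    thinking = (
        f"The user sounds {turn.emotion} because {turn.cause}. "
        "Acknowledge this feeling before giving practical help."
    )
    # The emotion and its cause appear only inside the target's reasoning segment;
    # the input carries no separate emotion label.
    target = f"{THINK_OPEN}{thinking}{THINK_CLOSE}{turn.response}"
    return {"input": turn.user_text, "target": target}


example = build_ieat_example(
    AnnotatedTurn(
        user_text="My train got cancelled again and I might miss my interview.",
        emotion="anxious",
        cause="a cancelled train could make them late for an interview",
        response="That sounds really stressful. Is there a later train or a bus you could take?",
    )
)
print(example["target"])
```

Keeping the attribution inside the target sequence, rather than adding a classification head, matches the abstract's framing of emotion-aware reasoning as something internalized in the model's reasoning process.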

Page Count
3 pages

Category
Computer Science: Computation and Language