Learning What to Attend First: Modality-Importance-Guided Reasoning for Reliable Multimodal Emotion Understanding
By: Hyeongseop Rha , Jeong Hun Yeo , Junil Won and more
Potential Business Impact:
Helps AI understand feelings better from pictures and words.
In this paper, we present Modality-Importance-Guided Reasoning (MIGR), a framework designed to improve the reliability of reasoning-based multimodal emotion understanding in multimodal large language models. Although existing methods have advanced emotion understanding, they often suffer from reasoning drift: models gradually rely on their own generated text instead of multimodal evidence, and their explanations are overly shaped by visually initiated reasoning paths. To address these issues, we introduce Modality Importance (MI), a simple yet effective mechanism for identifying the emotion-dominant modality. Using MI, MIGR reorganizes reasoning sequences so that explanations begin from the modality most critical to the target emotion, preventing early reasoning from being misled by less informative cues. Our two-stage framework-comprising modality-aligned supervised fine-tuning and modality-aware reward optimization-encourages models to generate emotionally grounded, causally relevant, and coherence-preserving explanations. Experimental results on the DFEW benchmark show that MIGR substantially improves reasoning reliability, decreasing instances of correct predictions accompanied by emotionally inconsistent explanations from 18.10% to 7.37%. These results confirm the benefit of initiating reasoning from the emotion-dominant modality.
Similar Papers
Emotion-Driven Personalized Recommendation for AI-Generated Content Using Multi-Modal Sentiment and Intent Analysis
Information Retrieval
Recommends videos based on your feelings.
Multimodal Video Emotion Recognition with Reliable Reasoning Priors
CV and Pattern Recognition
Helps computers understand feelings better.
Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion Recognition
Sound
Helps computers understand music's feelings better.