LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition
By: Zhining He, Yang Xiao
Potential Business Impact:
Helps computers recognize emotions in conversations from speech and text.
Emotion recognition in conversations (ERC) aims to predict the emotional state of each utterance by using multiple input types, such as text and audio. While Transformer-based models have shown strong performance in this task, they often face two major issues: high computational cost and heavy dependence on speaker information. These problems reduce their ability to generalize in real-world conversations. To solve these challenges, we propose LPGNet, a Lightweight network with Parallel attention and Gated fusion for multimodal ERC. The main part of LPGNet is the Lightweight Parallel Interaction Attention (LPIA) module. This module replaces traditional stacked Transformer layers with parallel dot-product attention, which can model both within-modality and between-modality relationships more efficiently. To improve emotional feature learning, LPGNet also uses a dual-gated fusion method. This method filters and combines features from different input types in a flexible and dynamic way. In addition, LPGNet removes speaker embeddings completely, which allows the model to work independently of speaker identity. Experiments on the IEMOCAP dataset show that LPGNet reaches over 87% accuracy and F1-score in 4-class emotion classification. It outperforms strong baseline models while using fewer parameters and showing better generalization across speakers.
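The abstract names two mechanisms, parallel dot-product attention over within-modality and between-modality pairs, and gated fusion, without giving their equations. The sketch below is one plausible reading, not the authors' implementation. The function names (`attention`, `gated_sum`, `lpia_sketch`), the per-utterance scalar gate, the two-stage gate arrangement, and the random toy features are all assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, key, value):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(query[0]))
    out = []
    for q in query:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in key]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output column at a time.
        out.append([sum(w * v for w, v in zip(weights, col)) for col in zip(*value)])
    return out


def sigmoid(x):
    """Logistic function written with tanh so large |x| cannot overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def gated_sum(a, b, gate_weights):
    """Blend two feature sequences row by row: g * a + (1 - g) * b.

    Assumption: one scalar gate per utterance, g = sigmoid(w . [a; b]).
    """
    out = []
    for row_a, row_b in zip(a, b):
        g = sigmoid(sum(w * v for w, v in zip(gate_weights, row_a + row_b)))
        out.append([g * x + (1.0 - g) * y for x, y in zip(row_a, row_b)])
    return out


def lpia_sketch(text, audio, w_text, w_audio, w_fuse):
    """Hypothetical parallel-attention block followed by two gating stages."""
    # The four attention calls do not depend on each other, so they can be
    # computed side by side instead of as a stack of layers.
    text_intra = attention(text, text, text)      # within text
    audio_intra = attention(audio, audio, audio)  # within audio
    text_inter = attention(text, audio, audio)    # text attends to audio
    audio_inter = attention(audio, text, text)    # audio attends to text
    # Gate 1 (assumed): filter within- vs. between-modality features.
    text_feat = gated_sum(text_intra, text_inter, w_text)
    audio_feat = gated_sum(audio_intra, audio_inter, w_audio)
    # Gate 2 (assumed): combine the two modalities. No speaker embedding is used.
    return gated_sum(text_feat, audio_feat, w_fuse)


random.seed(0)
n_utterances, dim = 4, 8  # toy sizes, not from the paper
text = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_utterances)]
audio = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_utterances)]
w_text, w_audio, w_fuse = (
    [random.gauss(0, 0.1) for _ in range(2 * dim)] for _ in range(3)
)
fused = lpia_sketch(text, audio, w_text, w_audio, w_fuse)
print(len(fused), len(fused[0]))  # 4 8
```

In a trained model the gate weights would be learned and the fused features would feed a classifier over the four emotion classes. Here they are random, so only the data flow and output shape are meaningful.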
Similar Papers
PGF-Net: A Progressive Gated-Fusion Framework for Efficient Multimodal Sentiment Analysis
Machine Learning (CS)
Helps computers understand feelings from words, sound, and pictures.
GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints
Artificial Intelligence
Helps computers understand feelings from faces, voices, words.
Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware
Sound
Helps tiny computers understand feelings from voices.