LPGNet: A Lightweight Network with Parallel Attention and Gated Fusion for Multimodal Emotion Recognition

Published: August 12, 2025 | arXiv ID: 2508.08925v1

By: Zhining He, Yang Xiao

Potential Business Impact:

Helps computers understand feelings in voices.

Emotion recognition in conversations (ERC) aims to predict the emotional state of each utterance by using multiple input types, such as text and audio. While Transformer-based models have shown strong performance in this task, they often face two major issues: high computational cost and heavy dependence on speaker information. These problems reduce their ability to generalize in real-world conversations. To solve these challenges, we propose LPGNet, a Lightweight network with Parallel attention and Gated fusion for multimodal ERC. The main part of LPGNet is the Lightweight Parallel Interaction Attention (LPIA) module. This module replaces traditional stacked Transformer layers with parallel dot-product attention, which can model both within-modality and between-modality relationships more efficiently. To improve emotional feature learning, LPGNet also uses a dual-gated fusion method. This method filters and combines features from different input types in a flexible and dynamic way. In addition, LPGNet removes speaker embeddings completely, which allows the model to work independently of speaker identity. Experiments on the IEMOCAP dataset show that LPGNet reaches over 87% accuracy and F1-score in 4-class emotion classification. It outperforms strong baseline models while using fewer parameters and showing better generalization across speakers.
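The abstract names two mechanisms without giving equations: parallel dot-product attention over within-modality and between-modality pairs, and a dual-gated fusion step. The NumPy sketch below is one possible reading, not the authors' implementation. The function names, the summing of attention branches, the sigmoid gate forms, and the random weights are all assumptions made for illustration. In keeping with the abstract, no speaker embeddings are used.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention(query, key, value):
    # Scaled dot-product attention over utterances: (n, d) inputs -> (n, d) output.
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores) @ value


def parallel_interaction(text, audio):
    # Assumed layout: four attention branches computed side by side
    # (two within-modality, two between-modality) instead of a layer stack.
    text_out = attention(text, text, text) + attention(text, audio, audio)
    audio_out = attention(audio, audio, audio) + attention(audio, text, text)
    return text_out, audio_out


def dual_gated_fusion(text, audio, w_text, w_audio, w_mix):
    # Gate 1 (assumed): each modality filters its own features element-wise.
    text_f = sigmoid(text @ w_text) * text
    audio_f = sigmoid(audio @ w_audio) * audio
    # Gate 2 (assumed): a mixing gate weighs the two filtered modalities.
    mix = sigmoid(np.concatenate([text_f, audio_f], axis=-1) @ w_mix)
    return mix * text_f + (1.0 - mix) * audio_f


# Hypothetical toy inputs: 5 utterances, 8-dimensional features per modality.
rng = np.random.default_rng(0)
n_utterances, dim = 5, 8
text = rng.normal(size=(n_utterances, dim))
audio = rng.normal(size=(n_utterances, dim))
w_text = rng.normal(size=(dim, dim))
w_audio = rng.normal(size=(dim, dim))
w_mix = rng.normal(size=(2 * dim, dim))

text_out, audio_out = parallel_interaction(text, audio)
fused = dual_gated_fusion(text_out, audio_out, w_text, w_audio, w_mix)
print(fused.shape)  # (5, 8)
```

Because the four attention branches do not depend on one another, they can run concurrently, which is one plausible source of the efficiency the abstract claims over stacked Transformer layers. In a trained model the weight matrices would be learned, and a classifier head would map each fused utterance vector to one of the four emotion classes.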

PGF-Net: A Progressive Gated-Fusion Framework for Efficient Multimodal Sentiment Analysis

Machine Learning (CS)

Helps computers understand feelings from words, sound, and pictures.

20 Aug 2025 1

86%

GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints

Artificial Intelligence

Helps computers understand feelings from faces, voices, words.

1 Jun 2025 1

86%

Transformer Redesign for Late Fusion of Audio-Text Features on Ultra-Low-Power Edge Hardware

Sound

Helps tiny computers understand feelings from voices.

20 Oct 2025 0

Country of Origin

🇦🇺 Australia

Page Count

19 pages
