
Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space

Published: March 13, 2025 | arXiv ID: 2503.10104v1

By: Yuheng Liang, Zheyu Wang, Feng Liu, and more

Potential Business Impact:

Reads emotions from videos to help computers understand feelings.

Business Areas:

Image Recognition, Data and Analytics, Software

Continuous Emotion Recognition (CER) plays a crucial role in intelligent human-computer interaction, mental health monitoring, and autonomous driving. Emotion modeling based on the Valence-Arousal (VA) space enables a more nuanced representation of emotional states. However, existing methods still struggle to handle long-term dependencies and to capture complex temporal dynamics. To address these issues, this paper proposes a novel emotion recognition model, Mamba-VA, which leverages the Mamba architecture to efficiently model sequential emotional variations across video frames. First, the model employs a Masked Autoencoder (MAE) to extract deep visual features from video frames, enhancing the robustness of temporal information. Then, a Temporal Convolutional Network (TCN) captures local temporal dependencies. Subsequently, Mamba is applied for long-sequence modeling, enabling the model to learn global emotional trends. Finally, a fully connected (FC) layer performs regression to predict continuous valence and arousal values. Experimental results on the VA Estimation task of the 8th competition on Affective Behavior Analysis in-the-wild (ABAW) show that the proposed model achieves valence scores of 0.5362 on the validation set and 0.5036 on the test set, and arousal scores of 0.4310 on the validation set and 0.4119 on the test set, outperforming the baseline. The source code is available on GitHub: https://github.com/FreedomPuppy77/Charon.
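
The four stages described in the abstract (per-frame features, local temporal modeling, long-sequence modeling, regression) can be illustrated with a small standard-library Python sketch. This is not the authors' implementation. Every name, shape, and constant below is hypothetical: random vectors stand in for MAE features, a single causal dilated convolution stands in for the TCN, a fixed-parameter diagonal state-space recurrence stands in for Mamba (which uses input-dependent, selective parameters), and the tanh squashing to [-1, 1] on the output is an assumption about the label range.

```python
import math
import random


def causal_dilated_conv(seq, kernel, dilation=1):
    """TCN-style causal convolution: output at frame t only sees t, t-d, t-2d, ..."""
    num_frames, dim = len(seq), len(seq[0])
    out = []
    for t in range(num_frames):
        vec = [0.0] * dim
        for k, weight in enumerate(kernel):
            src = t - k * dilation
            if src >= 0:  # frames before the start act as zero padding
                for d in range(dim):
                    vec[d] += weight * seq[src][d]
        out.append([max(0.0, v) for v in vec])  # ReLU
    return out


def ssm_scan(seq, decay, in_gain):
    """Diagonal linear recurrence h_t = decay * h_{t-1} + in_gain * x_t.

    Simplified stand-in for Mamba: the parameters here are fixed,
    whereas a selective state-space model computes them from the input.
    """
    state = [0.0] * len(seq[0])
    out = []
    for x in seq:
        state = [a * h + b * xi for a, h, b, xi in zip(decay, state, in_gain, x)]
        out.append(list(state))
    return out


def fc_head(seq, weights, bias):
    """Per-frame linear layer with two outputs (valence, arousal), squashed by tanh."""
    out = []
    for h in seq:
        out.append(tuple(
            math.tanh(sum(w * v for w, v in zip(row, h)) + b)
            for row, b in zip(weights, bias)
        ))
    return out


def predict_va(features, seed=0):
    """Run features -> local conv -> recurrent scan -> regression head."""
    rng = random.Random(seed)
    dim = len(features[0])
    local = causal_dilated_conv(features, kernel=[0.5, 0.3, 0.2], dilation=2)
    long_range = ssm_scan(local, decay=[0.9] * dim, in_gain=[0.1] * dim)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(2)]
    return fc_head(long_range, weights, bias=[0.0, 0.0])


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for MAE features: 16 frames, 8 dimensions each.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
    for t, (valence, arousal) in enumerate(predict_va(frames)):
        print(f"frame {t:2d}: valence={valence:+.3f} arousal={arousal:+.3f}")
```

The split mirrors the division of labor in the abstract: the convolution mixes only a few neighboring frames, while the recurrent state carries information across the whole sequence in a single linear-time pass. The weights are random and untrained, so the printed values are meaningless as emotion estimates.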

Mamba-CNN: A Hybrid Architecture for Efficient and Accurate Facial Beauty Prediction

CV and Pattern Recognition

Makes computers judge faces as pretty or not.

1 Sep 2025 1

89%

MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network

Machine Learning (CS)

Helps computers understand emotions from faces, voices, and words.

16 Mar 2025 1

88%

DA-Mamba: Dialogue-aware selective state-space model for multimodal engagement estimation

Artificial Intelligence

Helps computers understand how people feel in talks.

22 Sep 2025 2


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

6 pages
