Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction

Published: June 12, 2025 | arXiv ID: 2506.10930v1

By: Thanathai Lertpetchpun, Tiantian Feng, Dani Byrd, and others

Potential Business Impact:

Helps computers understand emotions in people's voices.

Business Areas:

Speech Recognition, Data and Analytics, Software

Speech emotion recognition (SER) in naturalistic conditions remains a significant challenge for the speech processing community. The difficulties include disagreement among annotators and imbalanced data distributions. This paper presents a reproducible framework that achieved the top-ranked performance in Task 2 of the Emotion Recognition in Naturalistic Conditions Challenge (IS25-SER Challenge), which is evaluated on the MSP-Podcast dataset. Our system tackles these challenges through multimodal learning, multi-task learning, and imbalanced data handling. Specifically, our best system is trained with added text embeddings, an auxiliary gender-prediction task, and "Other" (O) and "No Agreement" (X) samples included in the training set. Our systems took both first and second place in the IS25-SER Challenge, and the top result was achieved by a simple two-system ensemble.
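The abstract names four ingredients: text embeddings fused with audio, an auxiliary gender task, keeping O/X samples, and a two-system ensemble. The following is a minimal plain-Python sketch of how these pieces could fit together; it is not the authors' code. Several details are assumptions not stated in the abstract: Task 2 targets are taken to be arousal, valence, and dominance scored with the concordance correlation coefficient (CCC); fusion is plain concatenation; the attribute loss is the mean of 1 - CCC; gender uses binary cross-entropy weighted by a hypothetical `gender_weight`; O/X samples are assumed to still carry attribute scores; and the ensemble simply averages predictions. All function names are hypothetical.

```python
import math
from statistics import fmean

# Assumed Task 2 targets (not listed in the abstract).
ATTRIBUTES = ("arousal", "valence", "dominance")


def ccc(preds, labels):
    """Concordance correlation coefficient using population statistics."""
    mp, ml = fmean(preds), fmean(labels)
    vp = fmean((p - mp) ** 2 for p in preds)
    vl = fmean((y - ml) ** 2 for y in labels)
    cov = fmean((p - mp) * (y - ml) for p, y in zip(preds, labels))
    return 2.0 * cov / (vp + vl + (mp - ml) ** 2)


def select_training_samples(samples, keep_other_and_no_agreement=True):
    """Keep or drop samples whose categorical label is 'O' (Other) or 'X' (No Agreement).

    Assumption: these samples still have attribute scores, so keeping them
    enlarges the regression training set instead of discarding data.
    """
    dropped = set() if keep_other_and_no_agreement else {"O", "X"}
    return [s for s in samples if s["emotion"] not in dropped]


def fuse(audio_emb, text_emb):
    """Concatenate an audio embedding with a text embedding (assumed fusion rule)."""
    return list(audio_emb) + list(text_emb)


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy for one logit and a 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def multitask_loss(attr_preds, attr_labels, gender_logits, gender_labels,
                   gender_weight=0.1):  # hypothetical weight
    """Attribute loss (mean of 1 - CCC) plus a weighted auxiliary gender loss."""
    attr_loss = fmean(1.0 - ccc(attr_preds[k], attr_labels[k]) for k in attr_preds)
    gender_loss = fmean(
        bce_with_logits(z, y) for z, y in zip(gender_logits, gender_labels)
    )
    return attr_loss + gender_weight * gender_loss


def ensemble(pred_a, pred_b):
    """Average two systems' per-attribute predictions (assumed combination rule)."""
    return {k: [(a + b) / 2.0 for a, b in zip(pred_a[k], pred_b[k])] for k in pred_a}


if __name__ == "__main__":
    # Toy numbers for illustration only; not results from the paper.
    labels = {k: [1.0, 2.0, 3.0, 4.0] for k in ATTRIBUTES}
    sys_a = {k: [1.2, 1.9, 3.1, 3.6] for k in ATTRIBUTES}
    sys_b = {k: [0.8, 2.3, 2.7, 4.2] for k in ATTRIBUTES}
    combined = ensemble(sys_a, sys_b)
    for k in ATTRIBUTES:
        print(k, round(ccc(combined[k], labels[k]), 3))
```

Averaging is the simplest way to combine two regressors. In this toy example the two systems err in opposite directions, so their errors partly cancel. Whether the authors used plain or weighted averaging is not stated in the abstract.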

Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

5 pages