EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
By: Pengcheng Li, Botao Zhao, Zuheng Kang, and more
Potential Business Impact:
Helps computers understand emotions in voices better.
Although Large Audio-Language Models (LALMs) have exhibited outstanding performance in auditory understanding, their performance in affective computing scenarios, particularly emotion recognition, reasoning, and subtle sentiment differentiation, remains suboptimal. Recent advances in Reinforcement Learning (RL) have shown promise in improving LALMs' reasoning abilities. However, two critical challenges hinder the direct application of RL techniques to Speech Emotion Recognition (SER) tasks: (1) convergence instability caused by ambiguous emotional boundaries and (2) limited reasoning ability in relatively small models (e.g., 7B-parameter architectures). To overcome these limitations, we introduce EMO-RL, a novel reinforcement learning framework with two key innovations: Emotion Similarity-Weighted Reward (ESWR) and Explicit Structured Reasoning (ESR). Built upon pretrained LALMs, our method employs group-relative policy optimization with emotion constraints. Comprehensive experiments demonstrate that EMO-RL training significantly enhances the emotional reasoning capabilities of LALMs, attaining state-of-the-art results on both the MELD and IEMOCAP datasets, while cross-dataset experiments confirm its strong generalization.
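The abstract names the mechanisms but gives no formulas. Below is a minimal sketch of how an emotion similarity-weighted reward might combine with a group-relative (GRPO-style) advantage; the emotion set, similarity values, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical emotion set and hand-crafted similarity matrix (assumption:
# the paper does not publish its values). The idea: predicting "angry" when
# the gold label is "sad" earns partial credit, while "happy" earns almost none.
EMOTIONS = ["angry", "happy", "sad", "neutral"]
SIMILARITY = np.array([
    [1.0, 0.1, 0.4, 0.2],   # angry
    [0.1, 1.0, 0.1, 0.3],   # happy
    [0.4, 0.1, 1.0, 0.3],   # sad
    [0.2, 0.3, 0.3, 1.0],   # neutral
])

def eswr_reward(predicted: str, target: str) -> float:
    """Emotion Similarity-Weighted Reward: replace the binary 0/1 correctness
    reward with the similarity between predicted and gold emotion, so that
    near-miss predictions across ambiguous boundaries still receive credit."""
    return float(SIMILARITY[EMOTIONS.index(predicted), EMOTIONS.index(target)])

def group_relative_advantages(rewards: list[float]) -> np.ndarray:
    """GRPO-style advantage: normalize each sampled completion's reward by
    the mean and std of its own group, with no learned value critic."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Usage: one utterance, a group of 4 sampled model answers, gold label "sad".
preds = ["sad", "angry", "happy", "sad"]
rewards = [eswr_reward(p, "sad") for p in preds]
print(rewards)                          # [1.0, 0.4, 0.1, 1.0]
print(group_relative_advantages(rewards))
```

The motivation for the similarity weighting is to smooth the reward landscape across ambiguous emotional boundaries, which the abstract identifies as the source of convergence instability in standard RL training for SER.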
Similar Papers
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS
Sound
Makes AI voices show feelings and emphasis better.
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
Computation and Language
Helps AI learn many things better, faster.
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
Computation and Language
Helps computers understand feelings in voices.