EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
By: Lingxiao Kong, Cong Yang, Susanne Neufang, et al.
Potential Business Impact:
Helps AI learn many goals at once, faster and more cheaply.
Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise for multi-objective tasks but still face significant challenges, including balancing competing objectives, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models, each on an individual objective, and optimizes their aggregation after fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of the individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to score the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL over existing baselines: significantly lower and more stable training consumption ($17,529 \pm 1,650$ data points and $6,573 \pm 147.43$ seconds), improved scalability and explainability, and comparable performance across multiple objectives.
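The abstract names two mechanisms without spelling out the details: a weighted combination of the hidden states produced by the single-objective models, and a hierarchical grid search over the combination weights. Below is a minimal sketch of the first idea, assuming n fine-tuned copies of the same base LLM that all emit hidden states of the same shape; the function name, tensor layout, and objective labels are illustrative assumptions, not the paper's API.

```python
import torch

def aggregate_hidden_states(hidden_states, weights):
    """Linearly combine per-objective hidden states.

    hidden_states: list of n tensors, each (batch, seq_len, hidden_dim),
        taken from n single-objective fine-tuned copies of the same base LLM.
    weights: 1-D tensor of length n; entries are expected to sum to 1.
    Returns a (batch, seq_len, hidden_dim) tensor that a shared LM head
    could then decode into tokens.
    """
    stacked = torch.stack(hidden_states, dim=0)  # (n, batch, seq_len, hidden_dim)
    w = weights.view(-1, 1, 1, 1)                # broadcast over remaining dims
    return (w * stacked).sum(dim=0)

# Hypothetical usage with three illustrative objectives:
# h = aggregate_hidden_states([h_reflection, h_empathy, h_fluency],
#                             torch.tensor([0.5, 0.3, 0.2]))
```

For the hierarchical grid search, one plausible reading is a coarse-to-fine refinement: evaluate a coarse grid of weight vectors on the probability simplex, then recentre a shrunken grid on the best cell and repeat. The sketch below assumes three objectives and a generic score_fn (e.g., mean validation reward of generations produced with a given weight vector); it is an assumption about the procedure, not the paper's exact algorithm.

```python
import itertools
import numpy as np

def hierarchical_grid_search(score_fn, n_levels=3, grid=5):
    """Coarse-to-fine search for three aggregation weights on the simplex.

    score_fn(weights) -> float, where weights is a length-3 array summing to 1.
    """
    center = np.array([1.0 / 3, 1.0 / 3])  # free weights w1, w2; w3 = 1 - w1 - w2
    span = 1.0                              # width of the current search window
    best_w, best_score = None, -np.inf
    for _ in range(n_levels):
        offsets = np.linspace(-span / 2, span / 2, grid)
        for d1, d2 in itertools.product(offsets, offsets):
            w1, w2 = center[0] + d1, center[1] + d2
            w3 = 1.0 - w1 - w2
            if min(w1, w2, w3) < 0:         # stay on the probability simplex
                continue
            score = score_fn(np.array([w1, w2, w3]))
            if score > best_score:
                best_score, best_w = score, np.array([w1, w2, w3])
        center = best_w[:2]                 # recentre on the best cell so far
        span /= grid                        # refine: shrink the search window
    return best_w, best_score
```

Shrinking the window by the grid factor at each level keeps the cost at roughly n_levels × grid² evaluations of score_fn instead of one dense grid at the finest resolution, which is consistent with the abstract's emphasis on training efficiency.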
Similar Papers
Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
Computation and Language
Teaches AI to do many things well.
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS
Sound
Makes AI voices show feelings and emphasis better.
EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
Sound
Helps computers understand emotions in voices better.