MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models

Published: December 31, 2025 | arXiv ID: 2512.24693v1

By: Wenzhe Li, Shujian Zhang, Wenxuan Zhou, and others

Evaluating the quality of multi-turn conversations is crucial for developing capable Large Language Models (LLMs), yet it remains a significant challenge that often requires costly human evaluation. Multi-turn reward models (RMs) offer a scalable alternative and can provide valuable signals for guiding LLM training. While recent work has advanced multi-turn training techniques, effective automated evaluation designed specifically for multi-turn interactions lags behind. We observe that standard preference datasets, which typically contrast responses only at the final conversational turn, provide insufficient signal to capture the nuances of multi-turn interactions. Instead, we find that incorporating contrasts spanning multiple turns is critical for building robust multi-turn RMs. Motivated by this finding, we propose MUlti-Step Instruction Contrast (MUSIC), an unsupervised data augmentation strategy that synthesizes contrastive conversation pairs exhibiting differences across multiple turns. Applying MUSIC to the Skywork preference dataset, we train a multi-turn RM based on the Gemma-2-9B-Instruct model. Empirical results show that our MUSIC-augmented RM outperforms baseline methods, achieving higher alignment with judgments from advanced proprietary LLM judges on multi-turn conversations and, crucially, doing so without compromising performance on standard single-turn RM benchmarks.
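
The abstract does not say how MUSIC generates the lower-quality turns, so the sketch below only illustrates the structural difference it describes: a standard pair differs in the final assistant turn alone, while a multi-turn pair diverges from some earlier turn onward. All names here (`final_turn_contrast`, `multi_turn_contrast`, `start_turn`, `make_rejected`) are hypothetical, and `make_rejected` is a placeholder for whatever degradation procedure the paper actually uses. The Bradley-Terry pairwise loss is the common objective for training RMs on preference pairs; it is included as an assumption, not as the paper's confirmed training objective.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A conversation is a list of (user_instruction, assistant_response) turns.
Turn = Tuple[str, str]
Conversation = List[Turn]


@dataclass
class PreferencePair:
    chosen: Conversation
    rejected: Conversation


def final_turn_contrast(conv: Conversation, worse_response: str) -> PreferencePair:
    """Standard-style pair: chosen and rejected differ only in the last assistant turn."""
    rejected = conv[:-1] + [(conv[-1][0], worse_response)]
    return PreferencePair(chosen=conv, rejected=rejected)


def multi_turn_contrast(
    conv: Conversation,
    start_turn: int,
    make_rejected: Callable[[Conversation, str], str],
) -> PreferencePair:
    """Hypothetical multi-turn pair: every assistant turn from `start_turn` onward
    is replaced, so the two conversations differ across several turns.

    `make_rejected(history, instruction)` is an assumed placeholder that returns a
    lower-quality response given the (already diverged) rejected history.
    """
    rejected: Conversation = list(conv[:start_turn])
    for instruction, _ in conv[start_turn:]:
        # Condition on the rejected history so later turns build on earlier divergence.
        rejected.append((instruction, make_rejected(rejected, instruction)))
    return PreferencePair(chosen=conv, rejected=rejected)


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise RM loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    margin = r_chosen - r_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite log(1 + e^{-m}) as -m + log(1 + e^{m}) to avoid overflow.
    return -margin + math.log1p(math.exp(margin))
```

For a three-turn conversation, `multi_turn_contrast(conv, start_turn=1, make_rejected=...)` yields a rejected conversation that shares only the first turn with the chosen one, whereas `final_turn_contrast` changes just the last turn. A scalar RM would then score each full conversation and be trained to minimize `bradley_terry_loss` over such pairs.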

Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

Computation and Language

Makes chatbots remember conversations better.

7 Apr 2025

ReviewInstruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models

Computation and Language

Makes AI chatbots better at talking back and forth.

16 May 2025

Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation

Information Retrieval

Helps music apps pick songs you'll love.

20 Nov 2025