Limitations of Current Evaluation Practices for Conversational Recommender Systems and the Potential of User Simulation

Published: October 7, 2025 | arXiv ID: 2510.05624v1

By: Nolwenn Bernard, Krisztian Balog

Potential Business Impact:

Makes chatbots recommend things better.

Business Areas:

Simulation Software

Research and development on conversational recommender systems (CRSs) critically depends on sound and reliable evaluation methodologies. However, the interactive nature of these systems poses significant challenges for automatic evaluation. This paper critically examines current evaluation practices and identifies two key limitations: the over-reliance on static test collections and the inadequacy of existing evaluation metrics. To substantiate this critique, we analyze real user interactions with nine existing CRSs and demonstrate a striking disconnect between self-reported user satisfaction and performance scores reported in prior literature. To address these limitations, this work explores the potential of user simulation to generate dynamic interaction data, offering a departure from static datasets. Furthermore, we propose novel evaluation metrics, based on a general reward/cost framework, designed to better align with real user satisfaction. Our analysis of different simulation approaches provides valuable insights into their effectiveness and reveals promising initial results, showing improved correlation with system rankings compared to human evaluation. While these findings indicate a significant step forward in CRS evaluation, we also identify areas for future research and refinement in both simulation techniques and evaluation metrics.
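The abstract judges an evaluation metric by how well the ranking of CRSs it produces agrees with the ranking obtained from real users' self-reported satisfaction. The abstract does not name the correlation coefficient used. The sketch below assumes Kendall's tau (tau-a), a common choice for comparing system rankings. The function name `kendall_tau`, the system names, and all scores are hypothetical placeholders for illustration, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Kendall's tau-a between two score assignments over the same systems.

    A pair of systems is concordant if both assignments order it the same
    way and discordant if they order it oppositely. Tied pairs count as
    neither. Returns a value in [-1, 1].
    """
    systems = sorted(scores_a)
    if set(systems) != set(scores_b):
        raise ValueError("both score dicts must cover the same systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # The sign of the product tells whether both assignments agree on the pair.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: mean self-reported satisfaction per system and the score
# an automatic metric assigns to the same systems.
human_satisfaction = {"crs_a": 3.9, "crs_b": 3.1, "crs_c": 2.4, "crs_d": 2.8}
metric_scores = {"crs_a": 0.71, "crs_b": 0.45, "crs_c": 0.40, "crs_d": 0.52}

print(round(kendall_tau(human_satisfaction, metric_scores), 3))  # prints 0.667
```

In this made-up example the metric swaps only the pair (crs_b, crs_d). That gives 5 concordant and 1 discordant pair out of 6, so tau = 4/6. A metric that is better aligned with user satisfaction, in the sense used in the abstract, would push this value toward 1.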

UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems

Information Retrieval

Builds better chatbots that suggest things you like.

4 Dec 2025

Evaluating User Experience in Conversational Recommender Systems: A Systematic Review Across Classical and LLM-Powered Approaches

Information Retrieval

Improves user experience in AI chat recommenders.

4 Aug 2025


Country of Origin

🇳🇴 🇩🇪 Germany, Norway

Page Count

11 pages
