Evaluating Recabilities of Foundation Models: A Multi-Domain, Multi-Dataset Benchmark

Published: August 29, 2025 | arXiv ID: 2508.21354v1

By: Qijiong Liu, Jieming Zhu, Yingxin Lai and more

Potential Business Impact:

Tests how well AI foundation models make recommendations across many domains.

Business Areas:
A/B Testing, Data and Analytics

Comprehensive evaluation of the recommendation capabilities of existing foundation models across diverse datasets and domains is essential for advancing the development of recommendation foundation models. In this study, we introduce RecBench-MD, a novel and comprehensive benchmark designed to assess the recommendation abilities of foundation models from a zero-resource, multi-dataset, and multi-domain perspective. Through extensive evaluations of 19 foundation models across 15 datasets spanning 10 diverse domains -- including e-commerce, entertainment, and social media -- we identify key characteristics of these models in recommendation tasks. Our findings suggest that in-domain fine-tuning achieves optimal performance, while cross-dataset transfer learning provides effective practical support for new recommendation scenarios. Additionally, we observe that multi-domain training significantly enhances the adaptability of foundation models. All code and data have been publicly released to facilitate future research.


Page Count
19 pages

Category
Computer Science: Information Retrieval