Score: 2

MLLMRec: Exploring the Potential of Multimodal Large Language Models in Recommender Systems

Published: August 21, 2025 | arXiv ID: 2508.15304v1

By: Yuzhuo Dang, Xin Zhang, Zhiqiang Pan, and more

Potential Business Impact:

Suggests better movies and products you'll like.

Business Areas:
Personalization, Commerce and Shopping

Multimodal recommendation typically combines user behavioral data with the modal features of items to reveal user preferences, achieving superior performance compared to conventional recommendation methods. However, existing methods still suffer from two key problems: (1) the initialization of user multimodal representations is either behavior-unaware or noise-contaminated, and (2) the KNN-based item-item graph contains noisy low-similarity edges and lacks audience co-occurrence relationships. To address these issues, we propose MLLMRec, a novel MLLM-driven multimodal recommendation framework with two item-item graph refinement strategies. On the one hand, item images are first converted into high-quality semantic descriptions using an MLLM, which are then fused with the items' textual metadata. A behavioral description list is then constructed for each user and fed into the MLLM to reason about a purified user preference that captures interaction motivations. On the other hand, we design threshold-controlled denoising and topology-aware enhancement strategies to refine the suboptimal item-item graph, thereby improving item representation learning. Extensive experiments on three publicly available datasets demonstrate that MLLMRec achieves state-of-the-art performance, with an average improvement of 38.53% over the best baselines.
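To make the graph-refinement idea concrete, below is a minimal sketch of how a KNN item-item graph might be built from fused item features and pruned with a similarity threshold. The abstract does not specify the paper's actual fusion scheme, threshold values, or the topology-aware enhancement step, so the function name, cosine-similarity choice, and parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_denoised_item_graph(item_feats: np.ndarray,
                              k: int = 10,
                              sim_threshold: float = 0.5) -> np.ndarray:
    """Illustrative sketch: build a KNN item-item graph from fused item
    features, then drop low-similarity (noisy) edges via a threshold.
    Parameters and similarity measure are assumptions for demonstration."""
    # Cosine similarity between all item pairs.
    norm = item_feats / (np.linalg.norm(item_feats, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)  # no self-loops

    n = sim.shape[0]
    adj = np.zeros_like(sim)

    # Keep each item's top-k most similar neighbours (standard KNN graph).
    topk = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    adj[rows, topk.ravel()] = sim[rows, topk.ravel()]

    # Threshold-controlled denoising: remove edges whose similarity is too low.
    adj[adj < sim_threshold] = 0.0

    # Symmetrise and row-normalise so the graph can drive message passing.
    adj = np.maximum(adj, adj.T)
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.clip(deg, 1e-12, None)
```

In this sketch, `item_feats` would hold the fused representations (e.g., MLLM-generated image descriptions combined with textual metadata embeddings); the returned adjacency matrix could then feed a graph-based item encoder.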

Country of Origin
🇨🇳 China

Repos / Data Links

Page Count
9 pages

Category
Computer Science:
Information Retrieval