Score: 2

MLLMRec: Exploring the Potential of Multimodal Large Language Models in Recommender Systems

Published: August 21, 2025 | arXiv ID: 2508.15304v1

By: Yuzhuo Dang, Xin Zhang, Zhiqiang Pan, and more

Potential Business Impact:

Suggests better movies and products you'll like.

Business Areas:
Personalization, Commerce and Shopping

Multimodal recommendation typically combines user behavioral data with the modal features of items to reveal user preferences, achieving superior performance compared to conventional recommendation methods. However, existing methods still suffer from two key problems: (1) the initialization of user multimodal representations is either behavior-unaware or noise-contaminated, and (2) the KNN-based item-item graph contains noisy low-similarity edges and lacks audience co-occurrence relationships. To address these issues, we propose MLLMRec, a novel MLLM-driven multimodal recommendation framework with two item-item graph refinement strategies. On the one hand, item images are first converted into high-quality semantic descriptions using an MLLM, which are then fused with the items' textual metadata. A behavioral description list is then constructed for each user and fed into the MLLM to reason about a purified user preference that captures interaction motivations. On the other hand, we design threshold-controlled denoising and topology-aware enhancement strategies to refine the suboptimal item-item graph, thereby improving item representation learning. Extensive experiments on three publicly available datasets demonstrate that MLLMRec achieves state-of-the-art performance, with an average improvement of 38.53% over the best baselines.
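To make the graph-refinement idea concrete, below is a minimal sketch of how a KNN item-item graph might be built from fused item features and pruned with a similarity threshold. The abstract does not specify the paper's actual fusion scheme, threshold values, or the topology-aware enhancement step, so the function name, cosine-similarity choice, and parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_denoised_item_graph(item_feats: np.ndarray,
                              k: int = 10,
                              sim_threshold: float = 0.5) -> np.ndarray:
    """Illustrative sketch: build a KNN item-item graph from fused item
    features, then drop low-similarity (noisy) edges via a threshold.
    Parameters and similarity measure are assumptions for demonstration."""
    # Cosine similarity between all item pairs.
    norm = item_feats / (np.linalg.norm(item_feats, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)  # no self-loops

    n = sim.shape[0]
    adj = np.zeros_like(sim)

    # Keep each item's top-k most similar neighbours (standard KNN graph).
    topk = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    adj[rows, topk.ravel()] = sim[rows, topk.ravel()]

    # Threshold-controlled denoising: remove edges whose similarity is too low.
    adj[adj < sim_threshold] = 0.0

    # Symmetrise and row-normalise so the graph can drive message passing.
    adj = np.maximum(adj, adj.T)
    deg = adj.sum(axis=1, keepdims=True)
    return adj / np.clip(deg, 1e-12, None)
```

In this sketch, `item_feats` would hold the fused representations (e.g., MLLM-generated image descriptions combined with textual metadata embeddings); the returned adjacency matrix could then feed a graph-based item encoder.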

Country of Origin
🇨🇳 China

Repos / Data Links

Page Count
9 pages

Category
Computer Science:
Information Retrieval