Image is All You Need: Towards Efficient and Effective Large Language Model-Based Recommender Systems
By: Kibum Kim, Sein Kim, Hongseok Kang, and more
Potential Business Impact:
Recommends items using their pictures instead of long descriptions.
Large Language Models (LLMs) have recently emerged as a powerful backbone for recommender systems. Existing LLM-based recommender systems take one of two approaches to representing items in natural language: Attribute-based Representation and Description-based Representation. In this work, we aim to address the trade-off between efficiency and effectiveness that these two approaches encounter when representing the items consumed by users. Based on our observation that there is significant information overlap between the images and descriptions associated with items, we propose a novel method, Image is all you need for LLM-based Recommender system (I-LLMRec). Our main idea is to leverage images as an alternative to lengthy textual descriptions for representing items, reducing token usage while preserving the rich semantic information of item descriptions. Through extensive experiments, we demonstrate that by leveraging images, I-LLMRec outperforms existing methods in both efficiency and effectiveness. A further appeal of I-LLMRec is its reduced sensitivity to noise in descriptions, leading to more robust recommendations.
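The core mechanism, swapping a long textual item description for a handful of image-derived tokens in the LLM's input, can be sketched as below. This is a minimal illustration of the general idea, not the authors' implementation: the module name ImageItemProjector, the dimensions, the number of visual tokens, and the use of a frozen CLIP-style image encoder feeding a small projection MLP are all assumptions about how such an image-to-LLM bridge is typically built.

```python
import torch
import torch.nn as nn

class ImageItemProjector(nn.Module):
    """Hypothetical sketch: map a frozen vision encoder's image embedding
    into the LLM's token-embedding space, so a few 'visual tokens' can
    stand in for a lengthy textual item description in the prompt."""

    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096, num_tokens: int = 4):
        super().__init__()
        self.num_tokens = num_tokens
        # Small MLP projecting one image vector to `num_tokens` LLM-space vectors.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim * num_tokens),
        )

    def forward(self, image_emb: torch.Tensor) -> torch.Tensor:
        # image_emb: (batch, vision_dim), e.g., from a frozen CLIP-style encoder.
        batch = image_emb.shape[0]
        out = self.proj(image_emb)                   # (batch, llm_dim * num_tokens)
        return out.view(batch, self.num_tokens, -1)  # (batch, num_tokens, llm_dim)

if __name__ == "__main__":
    projector = ImageItemProjector()
    fake_image_emb = torch.randn(2, 768)   # image embeddings for two items
    visual_tokens = projector(fake_image_emb)
    print(visual_tokens.shape)             # torch.Size([2, 4, 4096])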
Similar Papers
MLLMRec: Exploring the Potential of Multimodal Large Language Models in Recommender Systems
Information Retrieval
Suggests better movies and products you'll like.
Bridging Collaborative Filtering and Large Language Models with Dynamic Alignment, Multimodal Fusion and Evidence-grounded Explanations
Information Retrieval
Shows you things you'll like, even if they change.
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Information Retrieval
Helps video apps understand what you *really* like.