Score: 1

FITRep: Attention-Guided Item Representation via MLLMs

Published: November 26, 2025 | arXiv ID: 2511.21389v1

By: Guoxiao Zhang, Ao Li, Tan Qu, and more

BigTech Affiliations: Meituan

Potential Business Impact:

Finds and removes nearly identical online items.

Business Areas:
Semantic Search, Internet Services

Online platforms often suffer user-experience degradation from near-duplicate items with similar visuals and text. While Multimodal Large Language Models (MLLMs) enable multimodal embedding, existing methods treat representations as black boxes and ignore structural relationships (e.g., primary vs. auxiliary elements), leading to a local structural collapse problem. To address this, inspired by Feature Integration Theory (FIT), we propose FITRep, the first attention-guided, white-box item representation framework for fine-grained item deduplication. FITRep consists of: (1) Concept Hierarchical Information Extraction (CHIE), which uses MLLMs to extract hierarchical semantic concepts; (2) Structure-Preserving Dimensionality Reduction (SPDR), an adaptive UMAP-based method for efficient information compression; and (3) FAISS-Based Clustering (FBC), which assigns each item a unique cluster ID. Deployed on Meituan's advertising system, FITRep achieves +3.60% CTR and +4.25% CPM gains in online A/B tests, demonstrating both effectiveness and real-world impact.
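The pipeline above (embed, compress, cluster, deduplicate by cluster ID) can be sketched end to end. This is a minimal illustration, not the paper's implementation: PCA stands in for the adaptive UMAP step (SPDR), plain k-means with farthest-point initialization stands in for FAISS-based clustering (FBC), and all names and toy data are hypothetical.

```python
import numpy as np

def reduce_dim(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Project embeddings onto their top-k principal components
    (PCA used here as a simple stand-in for the UMAP-based SPDR step)."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered matrix gives principal directions; keep the first k.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def assign_cluster_ids(points: np.ndarray, n_clusters: int, iters: int = 20) -> np.ndarray:
    """Assign each item a cluster ID via k-means
    (brute-force stand-in for FAISS-based clustering)."""
    # Farthest-point initialization: start from item 0, then repeatedly
    # pick the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    for _ in range(n_clusters - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dists.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Nearest-centroid assignment, then centroid update.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_clusters):
            members = points[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels

# Toy data: two groups of near-duplicate "item embeddings" (hypothetical).
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 0.05, size=(5, 16))   # near-duplicates of item A
group_b = rng.normal(5.0, 0.05, size=(5, 16))   # near-duplicates of item B
emb = np.vstack([group_a, group_b])

reduced = reduce_dim(emb, k=4)
ids = assign_cluster_ids(reduced, n_clusters=2)
# Near-duplicates share a cluster ID, so keeping one representative
# per ID removes the duplicates.
```

In the deployed system each cluster ID plays this role: items mapping to the same ID are treated as near-duplicates, so only one representative needs to be surfaced.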

Country of Origin
🇨🇳 China

Page Count
4 pages

Category
Computer Science:
Information Retrieval