Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
By: Aysenur Kulunk, Berk Taskin, M. Furkan Eseoglu and more
Potential Business Impact:
Finds duplicate product listings using words and pictures.
In large-scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, degrading trust in the platform and increasing costs. Traditional keyword-based search methodologies falter in accurately identifying duplicates because they rely on exact textual matches and neglect the semantic similarities inherent in product titles. To address these challenges, we introduce a scalable, multimodal product deduplication system designed specifically for the e-commerce domain. Our approach employs a domain-specific text model based on the BERT architecture in conjunction with Masked Autoencoders for image representations. Both of these architectures are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. Complementing this, we also developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Milvus, an optimized vector database, our system can facilitate efficient and high-precision similarity searches across extensive product catalogs exceeding 200 million items with just 100 GB of system RAM consumption. Empirical evaluations demonstrate that our matching system achieves a macro-average F1 score of 0.90, outperforming third-party solutions, which attain an F1 score of 0.83. Our findings show the potential of combining domain-specific adaptations with state-of-the-art machine learning techniques to mitigate duplicate listings in large-scale e-commerce environments.
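To make the scale figures above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes float32 storage and cosine similarity, neither of which the abstract specifies, and it uses a brute-force NumPy comparison as a stand-in for the Milvus search step. The names `raw_vector_bytes`, `l2_normalize`, `find_duplicate_candidates`, and the `0.95` threshold are hypothetical and chosen only for illustration.

```python
import numpy as np

EMBED_DIM = 128  # embedding size stated in the abstract


def raw_vector_bytes(num_items: int, dim: int = EMBED_DIM, bytes_per_value: int = 4) -> int:
    """Raw storage for one embedding per item, ignoring any index overhead.

    bytes_per_value=4 assumes float32, which the abstract does not confirm.
    """
    return num_items * dim * bytes_per_value


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def find_duplicate_candidates(embeddings: np.ndarray, threshold: float = 0.95):
    """Return index pairs (i, j) with i < j whose cosine similarity >= threshold.

    This is a brute-force O(n^2) comparison for toy data only; at catalog
    scale a vector database would replace this step.
    """
    unit = l2_normalize(embeddings.astype(np.float32))
    sims = unit @ unit.T
    i_idx, j_idx = np.triu_indices(len(unit), k=1)  # each unordered pair once
    mask = sims[i_idx, j_idx] >= threshold
    return list(zip(i_idx[mask].tolist(), j_idx[mask].tolist()))


# Toy catalog: five random items, with item 3 made a near-copy of item 1.
rng = np.random.default_rng(0)
catalog = rng.normal(size=(5, EMBED_DIM))
catalog[3] = catalog[1] + 0.01 * rng.normal(size=EMBED_DIM)

pairs = find_duplicate_candidates(catalog)
print(pairs)  # [(1, 3)]

# 200 million items x 128 dimensions x 4 bytes = 102.4 GB (decimal) of raw vectors.
print(raw_vector_bytes(200_000_000) / 1e9)  # 102.4
```

Under these assumptions, one float32 vector per item for 200 million items comes to roughly 102 GB of raw data, the same order of magnitude as the reported 100 GB. The abstract does not say whether text and image vectors are stored separately, which index type is used, or whether any compression is applied, so this is only a rough plausibility check.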
Similar Papers
Semantic De-boosting in e-commerce Query Autocomplete
Information Theory
Shows shoppers better, different product ideas.
DashCLIP: Leveraging multimodal models for generating semantic embeddings for DoorDash
Information Retrieval
Helps online stores show you better stuff.
Unsupervised Document and Template Clustering using Multimodal Embeddings
Computation and Language
Groups similar papers by words, look, and layout.