Embedding based retrieval for long tail search queries in ecommerce
By: Akshay Kekuda, Yuyang Zhang, Arun Udayashankar
Potential Business Impact:
Helps shoppers find rare items online.
In this abstract we present a series of optimizations we performed on the two-tower model architecture [14], and on the training and evaluation datasets, to implement semantic product search at Best Buy. Search queries on bestbuy.com follow a Pareto distribution, whereby a minority of queries account for most searches. This leaves a long tail of queries that are issued infrequently and therefore suffer from very sparse interaction signals. Our current work focuses on building a model to serve these long tail queries. We present a series of optimizations to this model that maximize conversion for the purpose of retrieval from the catalog. The first optimization uses a large language model to mitigate the sparsity of conversion signals. The second optimization is pretraining an off-the-shelf transformer-based model on the Best Buy catalog data. The third optimization is on the finetuning front: we use query-to-query pairs in addition to query-to-product pairs, and combine the above strategies to finetune the model. We also demonstrate how merging the weights of these finetuned models improves the evaluation metrics. Finally, we provide a recipe for curating an evaluation dataset for continuous monitoring of model performance with human-in-the-loop evaluation. We found that adding this recall mechanism to our current term match-based recall improved conversion by 3% in an online A/B test.
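The abstract does not say how the weights of the finetuned models are merged. As a minimal sketch, assuming the simplest possible scheme (uniform averaging of parameters that share a name), the idea could look like the following. The function `merge_weights`, the parameter names, and the toy values are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# A checkpoint is modeled here as a mapping from parameter name to a flat
# list of floats. This representation is an assumption for illustration.
Weights = Dict[str, List[float]]


def merge_weights(checkpoints: Sequence[Weights]) -> Weights:
    """Uniformly average same-named parameters across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints must share parameter names")

    merged: Weights = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise mean over the checkpoints.
        merged[name] = [sum(vals) / len(tensors) for vals in zip(*tensors)]
    return merged


# Hypothetical toy checkpoints: one finetuned on query-to-product pairs,
# one finetuned on query-to-query pairs, both from the same base model.
query_product_model = {"encoder.weight": [1.0, 2.0], "encoder.bias": [0.0]}
query_query_model = {"encoder.weight": [3.0, 4.0], "encoder.bias": [1.0]}

merged = merge_weights([query_product_model, query_query_model])
print(merged)
```

Averaging of this kind is only meaningful when the checkpoints share an architecture and were finetuned from the same starting weights, which matches the setting described above, where the same pretrained model is finetuned with different pair types.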
Similar Papers
Research on E-Commerce Long-Tail Product Recommendation Mechanism Based on Large-Scale Language Models
Information Retrieval
Helps online stores show you more unique items.
Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings
Information Retrieval
Finds fake product listings using words and pictures.
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
Computation and Language
Helps AI understand and do rare tasks better.