Score: 2

MUSS: Multilevel Subset Selection for Relevance and Diversity

Published: March 14, 2025 | arXiv ID: 2503.11126v2

By: Vu Nguyen, Andrey Kan

BigTech Affiliations: Amazon

Potential Business Impact:

Finds the best mix of items, much faster.

Business Areas:
Multi-level Marketing Sales and Marketing

The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation. Constrained subset selection problem is NP-hard, and popular approaches such as Maximum Marginal Relevance (MMR) are based on greedy selection. Many real-world applications involve large data, but the original MMR work did not consider distributed selection. This limitation was later addressed by a method called DGDS which allows for a distributed setting using random data partitioning. Here, we exploit structure in the data to further improve both scalability and performance on the target application. We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection. In a recommender system application, our method can not only improve the performance up to $4$ percent points in precision, but is also $20$ to $80$ times faster. Our method is also capable of outperforming baselines on RAG-based question answering accuracy. We present a novel theoretical approach for analyzing this type of problems, and show that our method achieves a constant factor approximation of the optimal objective. Moreover, our analysis also resulted in a $\times 2$ tighter bound for DGDS compared to previously known bound.

Country of Origin
🇺🇸 United States

Repos / Data Links

Page Count
24 pages

Category
Computer Science:
Machine Learning (CS)