MUSS: Multilevel Subset Selection for Relevance and Diversity
By: Vu Nguyen, Andrey Kan
Potential Business Impact:
Finds the best mix of items, much faster.
The problem of relevant and diverse subset selection has a wide range of applications, including recommender systems and retrieval-augmented generation (RAG). For example, in recommender systems, one is interested in selecting relevant items, while providing a diversified recommendation. Constrained subset selection problem is NP-hard, and popular approaches such as Maximum Marginal Relevance (MMR) are based on greedy selection. Many real-world applications involve large data, but the original MMR work did not consider distributed selection. This limitation was later addressed by a method called DGDS which allows for a distributed setting using random data partitioning. Here, we exploit structure in the data to further improve both scalability and performance on the target application. We propose MUSS, a novel method that uses a multilevel approach to relevant and diverse selection. In a recommender system application, our method can not only improve the performance up to $4$ percent points in precision, but is also $20$ to $80$ times faster. Our method is also capable of outperforming baselines on RAG-based question answering accuracy. We present a novel theoretical approach for analyzing this type of problems, and show that our method achieves a constant factor approximation of the optimal objective. Moreover, our analysis also resulted in a $\times 2$ tighter bound for DGDS compared to previously known bound.
Similar Papers
Multi-Selection for Recommendation Systems
Machine Learning (CS)
Keeps your movie picks private, still good.
Additive Distributionally Robust Ranking and Selection
Machine Learning (Stat)
Finds the best option even with uncertain information.
Considering Length Diversity in Retrieval-Augmented Summarization
Computation and Language
Makes AI summaries shorter and faster.