Bandits with Single-Peaked Preferences and Limited Resources
By: Gur Keinan, Rotem Torkan, Omer Ben-Porat
Potential Business Impact:
Helps computers match people to a limited number of options while learning what each person likes.
We study an online stochastic matching problem in which an algorithm sequentially matches $U$ users to $K$ arms, aiming to maximize cumulative reward over $T$ rounds under budget constraints. Without structural assumptions, computing the optimal matching is NP-hard, making online learning computationally infeasible. To overcome this barrier, we focus on \emph{single-peaked preferences} -- a well-established structure in social choice theory, where users' preferences are unimodal with respect to a common order over arms. We devise an efficient algorithm for the offline budgeted matching problem and leverage it to obtain an efficient online algorithm with a regret of $\tilde O(UKT^{2/3})$. Our approach relies on a novel PQ tree-based order approximation method. If the single-peaked structure is known, we develop an efficient UCB-like algorithm that achieves a regret bound of $\tilde O(U\sqrt{TK})$.
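To make the single-peaked assumption concrete, here is a minimal sketch (not from the paper) that checks whether a user's mean rewards are unimodal with respect to a given common order over arms. The function name, the strict-monotonicity convention, and the example numbers are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def is_single_peaked(mean_rewards: Sequence[float], arm_order: Sequence[int]) -> bool:
    """Check whether a user's mean rewards are unimodal (single-peaked)
    with respect to a common order over arms.

    Illustrative helper, not from the paper: mean_rewards[a] is the user's
    expected reward for arm a, and arm_order lists arm indices in the
    common order shared by all users. Strict monotonicity is assumed here.
    """
    # Read the rewards off along the common order.
    values = [mean_rewards[a] for a in arm_order]
    peak = max(range(len(values)), key=values.__getitem__)
    # Strictly increasing up to the peak, strictly decreasing after it.
    rising = all(values[i] < values[i + 1] for i in range(peak))
    falling = all(values[i] > values[i + 1] for i in range(peak, len(values) - 1))
    return rising and falling


if __name__ == "__main__":
    order = [2, 0, 1, 3]                 # hypothetical common order over K = 4 arms
    user = [0.6, 0.9, 0.3, 0.4]          # this user's peak is at arm 1 along that order
    print(is_single_peaked(user, order)) # True
```

Under this structure, each user's preference is determined by the location of a single peak along the common order, which is what lets the offline budgeted matching problem be solved efficiently rather than facing the NP-hard general case.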
Similar Papers
Batched Stochastic Matching Bandits
Machine Learning (Stat)
Helps match people to jobs faster.
Optimal Algorithms for Bandit Learning in Matching Markets
CS and Game Theory
Finds the best job matches faster with less guessing.
Bandit Learning in Housing Markets
CS and Game Theory
Helps people find the best homes by learning their preferences.