Batched Stochastic Matching Bandits
By: Jung-hun Kim, Min-hwan Oh
Potential Business Impact:
Helps match people to jobs faster.
In this study, we introduce a novel bandit framework for stochastic matching based on the Multinomial Logit (MNL) choice model. In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each arm stochastically selects an agent from its assigned pool according to an unknown preference and yields a corresponding reward. The objective is to minimize regret by maximizing the cumulative revenue from successful matches across all agents. This task requires solving a combinatorial optimization problem based on estimated preferences, which is NP-hard and causes a naive approach to incur a computational cost of $O(K^N)$ per round. To address this challenge, we propose batched algorithms that limit the frequency of matching updates, thereby reducing the amortized computational cost (i.e., the average cost per round) to $O(1)$ while still achieving a regret bound of $\tilde{O}(\sqrt{T})$.
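To make the two key ingredients of the abstract concrete, here is a minimal sketch in Python: MNL choice probabilities for one arm's assigned pool (including the implicit "no match" outside option with utility normalized to zero), and a geometrically spaced update schedule of the kind a batched algorithm might use, under which the expensive matching step is recomputed only $O(\log T)$ times over $T$ rounds. The utility values and the doubling schedule are illustrative assumptions, not the paper's exact algorithm.

```python
import math

def mnl_choice_probs(utilities):
    """MNL choice probabilities for one arm over its assigned agents.

    `utilities` holds hypothetical preference scores for the assigned
    agents; index 0 of the result is the outside "no match" option,
    whose utility is normalized to 0 (weight exp(0) = 1).
    """
    weights = [1.0] + [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def batch_boundaries(T):
    """Illustrative rounds at which a batched algorithm recomputes the
    matching. Doubling the gap between updates yields O(log T)
    recomputations, so the per-round (amortized) cost of the expensive
    combinatorial step is O(1) for fixed K and N."""
    b, rounds = 1, []
    while b <= T:
        rounds.append(b)
        b *= 2
    return rounds
```

For example, `mnl_choice_probs([0.5, 1.0, -0.3])` returns four probabilities summing to one, with the agent of utility 1.0 chosen most often, and `batch_boundaries(1024)` contains only 11 update rounds out of 1024.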
Similar Papers
Achieving Limited Adaptivity for Multinomial Logistic Bandits
Machine Learning (CS)
Helps computers make better choices with fewer updates.
Bandits with Single-Peaked Preferences and Limited Resources
Machine Learning (CS)
Helps computers pick the best things for people.
Stochastic Bandits for Crowdsourcing and Multi-Platform Autobidding
CS and Game Theory
Helps spend money fairly on many tasks.