Near Optimal Best Arm Identification for Clustered Bandits
By: Yash, Nikhil Karamchandani, Avishek Ghosh
Potential Business Impact:
Helps many robots learn the best action faster.
This work investigates the problem of best arm identification for multi-agent multi-armed bandits. We consider $N$ agents grouped into $M$ clusters, where each cluster solves a stochastic bandit problem. The mapping between agents and bandits is a priori unknown. Each bandit is associated with $K$ arms, and the goal is to identify the best arm for each agent under a $\delta$-probably correct ($\delta$-PC) framework, while minimizing sample complexity and communication overhead. We propose two novel algorithms: Clustering then Best Arm Identification (Cl-BAI) and Best Arm Identification then Clustering (BAI-Cl). Cl-BAI uses a two-phase approach that first clusters agents based on the bandit problems they are learning, followed by identifying the best arm for each cluster. BAI-Cl reverses the sequence by identifying the best arms first and then clustering agents accordingly. Both algorithms leverage the successive elimination framework to ensure computational efficiency and high accuracy. We establish $\delta$-PC guarantees for both methods, derive bounds on their sample complexity, and provide a lower bound for this problem class. Moreover, when $M$ is small (a constant), we show that the sample complexity of a variant of BAI-Cl is minimax optimal in an order-wise sense. Experiments on synthetic and real-world datasets (MovieLens, Yelp) demonstrate the superior performance of the proposed algorithms in terms of sample and communication efficiency, particularly in settings where $M \ll N$.
Similar Papers
Optimal Best Arm Identification under Differential Privacy
Machine Learning (Stat)
Protects private data while finding the best option.
Constrained Best Arm Identification with Tests for Feasibility
Machine Learning (CS)
Finds the best option that also meets safety rules.
On the Problem of Best Arm Retention
Machine Learning (CS)
Keeps the best choices from a changing list.