Clus-UCB: A Near-Optimal Algorithm for Clustered Bandits
By: Aakash Gore, Prasanna Chaporkar
Potential Business Impact:
Helps computers learn faster by grouping similar choices.
We study a stochastic multi-armed bandit setting where arms are partitioned into known clusters, such that the mean rewards of arms within a cluster differ by at most a known threshold. While the clustering structure is known a priori, the arm means are unknown. We derive an asymptotic lower bound on the regret that improves upon the classical bound of Lai & Robbins (1985). We then propose Clus-UCB, an efficient algorithm that closely matches this lower bound asymptotically. Clus-UCB is designed to exploit the clustering structure and introduces a new index for evaluating an arm, one that depends on the other arms within its cluster. In this way, arms share information with one another. We present simulation results for our algorithm and compare its performance against KL-UCB and other well-known algorithms for bandits with dependent arms. Finally, we discuss some limitations of this work and conclude by outlining possible directions for future research.
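For context, the classical Lai & Robbins (1985) result lower-bounds the regret of any uniformly good policy by a sum, over suboptimal arms, of terms of the form (mu* - mu_a) / KL(mu_a, mu*) times ln T; exploiting the known within-cluster proximity of means is what lets the paper's bound improve on these per-arm terms. The abstract does not spell out the actual Clus-UCB index, so the sketch below is only one illustrative way a cluster-aware KL-UCB variant might share information: each arm's KL-UCB index is capped by its cluster-mates' indices plus the known threshold eps, since |mu_a - mu_b| <= eps within a cluster implies any valid upper bound on mu_b, plus eps, also upper-bounds mu_a. All function names and the capping rule here are assumptions for illustration, not the paper's algorithm.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    tiny = 1e-12
    p = min(max(p, tiny), 1 - tiny)
    q = min(max(q, tiny), 1 - tiny)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), via bisection."""
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(32):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def run_clustered_kl_ucb(true_means, cluster_of, eps, horizon, seed=0):
    """Bernoulli bandit with known clusters; cluster-mates cap each arm's index.

    This is a hypothetical cluster-aware variant of KL-UCB, not Clus-UCB itself.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls, sums = [0] * k, [0.0] * k
    regret, best = 0.0, max(true_means)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            base = [kl_ucb_index(sums[a] / pulls[a], pulls[a], t) for a in range(k)]
            # Information sharing: |mu_a - mu_b| <= eps within a cluster, so a
            # cluster-mate's upper confidence bound plus eps also bounds mu_a.
            idx = []
            for a in range(k):
                cap = min(
                    (base[b] + eps for b in range(k)
                     if b != a and cluster_of[b] == cluster_of[a]),
                    default=float("inf"),
                )
                idx.append(min(base[a], cap))
            arm = max(range(k), key=idx.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return regret


# Hypothetical instance: two clusters of three arms, within-cluster gaps <= 0.05.
means = [0.80, 0.78, 0.76, 0.50, 0.48, 0.46]
clusters = [0, 0, 0, 1, 1, 1]
print(run_clustered_kl_ucb(means, clusters, eps=0.05, horizon=20000))
```

The capping rule preserves the validity of the upper confidence bound while letting a well-explored cluster-mate quickly rule out a poor arm in the same cluster, which is the qualitative behavior the abstract attributes to Clus-UCB's shared index.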
Similar Papers
Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts
Machine Learning (CS)
Helps computers learn faster by grouping similar users.
Online Clustering with Bandit Information
Machine Learning (CS)
Groups similar things together with fewer guesses.
Using causal abstractions to accelerate decision-making in complex bandit problems
Machine Learning (CS)
Helps computers learn faster by using different views.