Robust Batched Bandits
By: Yunwen Guo, Yunlun Shu, Gongyi Zhuo, and more
Potential Business Impact:
Finds best treatments faster, even with messy results.
The batched multi-armed bandit (MAB) problem, in which rewards are collected in batches, is crucial for applications such as clinical trials. Existing research predominantly assumes light-tailed reward distributions, yet many real-world outcomes, including clinical ones, are heavy-tailed. This paper bridges that gap by proposing robust batched bandit algorithms designed for heavy-tailed rewards, in both the finite-arm and Lipschitz-continuous settings. We reveal a surprising phenomenon: in the instance-independent regime, as well as in the Lipschitz setting, heavier-tailed rewards require fewer batches to achieve near-optimal regret. In stark contrast, in the instance-dependent setting, the number of batches required for near-optimal regret is invariant to tail heaviness.
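To make the setting concrete, below is a minimal Python sketch of a batched elimination strategy paired with a median-of-means estimator, a standard heavy-tail-robust substitute for the empirical mean. This is not the paper's algorithm: the batch sizes, confidence-width schedule, block count, and elimination rule are illustrative assumptions chosen for the demo.

```python
# A minimal sketch of a batched bandit with a heavy-tail-robust estimator.
# NOT the paper's algorithm: the batch schedule, confidence widths, and
# median-of-means estimator are illustrative assumptions.
import numpy as np

def median_of_means(samples, n_blocks=5):
    """Robust mean estimate: split samples into blocks and take the
    median of the block means (a standard heavy-tail-robust estimator)."""
    samples = np.asarray(samples)
    n_blocks = min(n_blocks, len(samples))
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

def batched_elimination(pull_arm, n_arms, batch_sizes, conf_width):
    """Successive elimination run in batches.

    pull_arm(a, n) -> array of n i.i.d. rewards from arm a (environment)
    batch_sizes    -> pulls per surviving arm in each batch (assumed)
    conf_width(b)  -> confidence radius after batch b (assumed schedule)
    """
    active = list(range(n_arms))
    rewards = {a: [] for a in active}
    for b, m in enumerate(batch_sizes):
        # Pull every surviving arm m times; feedback is observed only at
        # the end of the batch, which is the batched-bandit constraint.
        for a in active:
            rewards[a].extend(pull_arm(a, m))
        est = {a: median_of_means(rewards[a]) for a in active}
        best = max(est.values())
        w = conf_width(b)
        # Keep only arms whose robust estimate is within the radius.
        active = [a for a in active if est[a] >= best - 2 * w]
        if len(active) == 1:
            break
    return active

# Example: 3 arms with Pareto (heavy-tailed) noise around distinct means.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = [0.3, 0.5, 0.8]
    def pull_arm(a, n):
        # Pareto(2.1) noise has heavy tails; subtract its mean 1/1.1
        # so each arm's reward is centered at means[a].
        return means[a] + (rng.pareto(2.1, size=n) - 1 / 1.1)
    arms = batched_elimination(pull_arm, n_arms=3,
                               batch_sizes=[50, 200, 800],
                               conf_width=lambda b: 0.4 / (2 ** b))
    print("surviving arm(s):", arms)
```

In this toy run, the halving confidence widths let clearly suboptimal arms drop out batch by batch; the median-of-means step is what keeps the estimates stable under heavy-tailed noise, where a plain empirical mean can be badly skewed by a single extreme reward.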
Similar Papers
Semi-Parametric Batched Global Multi-Armed Bandits with Covariates
Machine Learning (Stat)
Helps computers learn better from grouped information.
Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond
Machine Learning (CS)
Helps computers learn better with tricky, unpredictable data.
Multi-agent Multi-armed Bandit with Fully Heavy-tailed Dynamics
Machine Learning (CS)
Helps many computers work together better.