Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond
By: Canzhe Zhao, Shinji Ito, Shuai Li
Potential Business Impact:
Helps computers learn better with tricky, unpredictable data.
Heavy-tailed bandits have been extensively studied since the seminal work of \citet{Bubeck2012BanditsWH}. In particular, heavy-tailed linear bandits, enabling efficient learning with both a large number of arms and heavy-tailed noises, have recently attracted significant attention \citep{ShaoYKL18,XueWWZ20,ZhongHYW21,Wang2025heavy,tajdini2025improved}. However, prior studies focus almost exclusively on stochastic regimes, with few exceptions limited to the special case of heavy-tailed multi-armed bandits (MABs) \citep{Huang0H22,ChengZ024,Chen2024uniINF}. In this work, we propose a general framework for adversarial heavy-tailed bandit problems, which performs follow-the-regularized-leader (FTRL) over the loss estimates shifted by a bonus function. Via a delicate setup of the bonus function, we devise the first FTRL-type best-of-both-worlds (BOBW) algorithm for heavy-tailed MABs, which does not require the truncated non-negativity assumption and achieves an $\widetilde{O}(T^{\frac{1}{\varepsilon}})$ worst-case regret in the adversarial regime as well as an $\widetilde{O}(\log T)$ gap-dependent regret in the stochastic regime. We then extend our framework to the linear case, proposing the first algorithm for adversarial heavy-tailed linear bandits with finite arm sets. This algorithm achieves an $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{\varepsilon}})$ regret, matching the best-known worst-case regret bound in stochastic regimes. Moreover, we propose a general data-dependent learning rate, termed \textit{heavy-tailed noise aware stability-penalty matching} (HT-SPM). We prove that HT-SPM guarantees BOBW regret bounds for general heavy-tailed bandit problems once certain conditions are satisfied. By using HT-SPM and, in particular, a variance-reduced linear loss estimator, we obtain the first BOBW result for heavy-tailed linear bandits.
Similar Papers
Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems
Machine Learning (Stat)
Makes smart computer choices better for tricky problems.
Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality
Machine Learning (Stat)
Learns faster by trying and using options.
Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback
Machine Learning (CS)
Teaches computers to learn better from mistakes.