Why Most Optimism Bandit Algorithms Have the Same Regret Analysis: A Simple Unifying Theorem
By: Vikram Krishnamurthy
Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
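To see the shape of the argument the abstract describes, here is a minimal sketch for the classical finite-arm UCB case, assuming 1-sub-Gaussian rewards, suboptimality gaps Delta_a, and the standard confidence radius sqrt(2 log t / n_a(t)); the notation is illustrative and not taken from the paper.

% Sketch of the two-lemma optimism argument for finite-arm UCB.
% Illustrative notation; assumes 1-sub-Gaussian rewards and gaps Delta_a.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

Let $\hat\mu_a(t)$ be the empirical mean of arm $a$ after $n_a(t)$ pulls and
$r_a(t) = \sqrt{2\log t / n_a(t)}$ its confidence radius. Condition on the
high-probability concentration event $|\hat\mu_a(t) - \mu_a| \le r_a(t)$ for
all arms $a$ and times $t$.

\emph{Optimism-forced deviation:} if a suboptimal arm $a$ is pulled at time
$t$, its upper index beats the optimal arm's, so
\[
  \mu^* \;\le\; \hat\mu_{a^*}(t) + r_{a^*}(t)
        \;\le\; \hat\mu_a(t) + r_a(t)
        \;\le\; \mu_a + 2 r_a(t)
  \quad\Longrightarrow\quad
  \Delta_a \le 2\, r_a(t).
\]

\emph{Radius collapse:} since $r_a(t)$ shrinks like $n_a(t)^{-1/2}$, the
inequality $\Delta_a \le 2\sqrt{2\log t / n_a(t)}$ can hold only while
\[
  n_a(t) \;\le\; \frac{8 \log t}{\Delta_a^2},
\]
so each suboptimal arm is pulled $O(\log T / \Delta_a^2)$ times, and on the
concentration event the regret satisfies
$\sum_{a:\,\Delta_a > 0} \Delta_a\, n_a(T) = O\bigl(\sum_a \log T / \Delta_a\bigr)$.

\end{document}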
Similar Papers
Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization
Machine Learning (CS)
Sharpens the known regret guarantees for the GP-UCB algorithm in Bayesian optimization.
On Instability of Minimax Optimal Optimism-Based Bandit Algorithms
Machine Learning (Stat)
Shows that minimax-optimal optimism-based bandit algorithms can behave unstably.
Conformal Bandits: Bringing statistical validity and reward efficiency to the small-gap regime
Machine Learning (CS)
Brings conformal-style statistical validity to bandits while keeping rewards efficient when arm gaps are small.