Why Most Optimism Bandit Algorithms Have the Same Regret Analysis: A Simple Unifying Theorem
By: Vikram Krishnamurthy
Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
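To see the shape of the argument the abstract describes, here is a minimal sketch for the classical finite-arm UCB case, assuming 1-sub-Gaussian rewards, suboptimality gaps Delta_a, and the standard confidence radius sqrt(2 log t / n_a(t)); the notation is illustrative and not taken from the paper.

% Sketch of the two-lemma optimism argument for finite-arm UCB.
% Illustrative notation; assumes 1-sub-Gaussian rewards and gaps Delta_a.
\documentclass{article}
\usepackage{amsmath}
\begin{document}

Let $\hat\mu_a(t)$ be the empirical mean of arm $a$ after $n_a(t)$ pulls and
$r_a(t) = \sqrt{2\log t / n_a(t)}$ its confidence radius. Condition on the
high-probability concentration event $|\hat\mu_a(t) - \mu_a| \le r_a(t)$ for
all arms $a$ and times $t$.

\emph{Optimism-forced deviation:} if a suboptimal arm $a$ is pulled at time
$t$, its upper index beats the optimal arm's, so
\[
  \mu^* \;\le\; \hat\mu_{a^*}(t) + r_{a^*}(t)
        \;\le\; \hat\mu_a(t) + r_a(t)
        \;\le\; \mu_a + 2 r_a(t)
  \quad\Longrightarrow\quad
  \Delta_a \le 2\, r_a(t).
\]

\emph{Radius collapse:} since $r_a(t)$ shrinks like $n_a(t)^{-1/2}$, the
inequality $\Delta_a \le 2\sqrt{2\log t / n_a(t)}$ can hold only while
\[
  n_a(t) \;\le\; \frac{8 \log t}{\Delta_a^2},
\]
so each suboptimal arm is pulled $O(\log T / \Delta_a^2)$ times, and on the
concentration event the regret satisfies
$\sum_{a:\,\Delta_a > 0} \Delta_a\, n_a(T) = O\bigl(\sum_a \log T / \Delta_a\bigr)$.

\end{document}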
Similar Papers
Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization
Machine Learning (CS)
Sharpens the known regret guarantees for the GP-UCB algorithm in Bayesian optimization.
On Instability of Minimax Optimal Optimism-Based Bandit Algorithms
Machine Learning (Stat)
Shows that minimax-optimal optimism-based bandit algorithms can behave unstably.
Conformal Bandits: Bringing statistical validity and reward efficiency to the small-gap regime
Machine Learning (CS)
Brings conformal-style statistical validity to bandits while keeping rewards efficient when arm gaps are small.