Generalized Kernelized Bandits: Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds
By: Alberto Maria Metelli, Simone Drago, Marco Mussi
Potential Business Impact:
Helps computers learn faster with tricky data.
We study the regret minimization problem in the novel setting of generalized kernelized bandits (GKBs), where we optimize an unknown function $f^*$ belonging to a reproducing kernel Hilbert space (RKHS) having access to samples generated by an exponential family (EF) noise model whose mean is a non-linear function $\mu(f^*)$. This model extends both kernelized bandits (KBs) and generalized linear bandits (GLBs). We propose an optimistic algorithm, GKB-UCB, and we explain why existing self-normalized concentration inequalities do not allow to provide tight regret guarantees. For this reason, we devise a novel self-normalized Bernstein-like dimension-free inequality resorting to Freedman's inequality and a stitching argument, which represents a contribution of independent interest. Based on it, we conduct a regret analysis of GKB-UCB, deriving a regret bound of order $\widetilde{O}( \gamma_T \sqrt{T/\kappa_*})$, being $T$ the learning horizon, ${\gamma}_T$ the maximal information gain, and $\kappa_*$ a term characterizing the magnitude the reward nonlinearity. Our result matches, up to multiplicative constants and logarithmic terms, the state-of-the-art bounds for both KBs and GLBs and provides a unified view of both settings.
Similar Papers
Improved Regret Bounds for Gaussian Process Upper Confidence Bound in Bayesian Optimization
Machine Learning (CS)
Makes smart guessing programs learn faster.
Neural Logistic Bandits
Machine Learning (CS)
Helps computers learn better with fewer guesses.
No-Regret Gaussian Process Optimization of Time-Varying Functions
Machine Learning (Stat)
Finds best answers even when things change.