SCaLE: Switching Cost aware Learning and Exploration
By: Neelkamal Bhuyan, Debankur Mukherjee, Adam Wierman
This work addresses the fundamental problem of unbounded metric movement costs in bandit online convex optimization by considering high-dimensional dynamic quadratic hitting costs and $\ell_2$-norm switching costs under a noisy bandit feedback model. For a general class of stochastic environments, we present SCaLE, the first algorithm that provably achieves distribution-agnostic sub-linear dynamic regret without knowledge of the hitting cost structure. En route, we develop a novel spectral regret analysis that separately quantifies eigenvalue-error-driven regret and eigenbasis-perturbation-driven regret. Extensive numerical experiments against online-learning baselines corroborate our claims and highlight the statistical consistency of our algorithm.
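To make the problem setting concrete, the following is a minimal sketch of one common formalization of quadratic hitting costs with $\ell_2$-norm switching costs under noisy bandit feedback. It is illustrative only: the specific symbols ($Q_t$, $v_t$, the noise level, and the comparator used for dynamic regret) are assumptions for exposition, not the exact model or algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma = 5, 1000, 0.1  # dimension, horizon, feedback-noise level (illustrative choices)

def hitting_cost(x, Q, v):
    """Quadratic hitting cost (x - v)^T Q (x - v); Q and v are hidden from the learner."""
    diff = x - v
    return diff @ Q @ diff

def round_cost(x, x_prev, Q, v):
    """Per-round cost: hitting cost plus l2-norm switching (movement) cost."""
    return hitting_cost(x, Q, v) + np.linalg.norm(x - x_prev)

def bandit_feedback(x, x_prev, Q, v):
    """Noisy bandit feedback: the learner observes only a noisy scalar cost evaluation."""
    return round_cost(x, x_prev, Q, v) + sigma * rng.standard_normal()

def dynamic_regret(xs, Qs, vs):
    """Dynamic regret against the per-round hitting-cost minimizers (one illustrative comparator)."""
    alg = sum(round_cost(xs[t], xs[t - 1], Qs[t], vs[t]) for t in range(1, len(xs)))
    cmp_ = sum(round_cost(vs[t], vs[t - 1], Qs[t], vs[t]) for t in range(1, len(xs)))
    return alg - cmp_

# Tiny demo: a slowly drifting environment and a trivial "stay at the origin" learner.
Qs = [np.eye(d)]           # placeholder at index 0; costs are incurred from round 1
vs = [np.zeros(d)]
for t in range(1, T + 1):
    A = rng.standard_normal((d, d))
    Qs.append(A @ A.T / d + np.eye(d))                 # random positive-definite cost matrix
    vs.append(vs[-1] + 0.01 * rng.standard_normal(d))  # drifting hitting-cost minimizer
xs = [np.zeros(d)] * (T + 1)
print("dynamic regret of the trivial learner:", dynamic_regret(xs, Qs, vs))
```

Under this kind of model, a learner that ignores the drift accumulates regret linear in the horizon, which is the behavior a sub-linear dynamic-regret guarantee rules out.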