Batched Nonparametric Bandits via k-Nearest Neighbor UCB
By: Sakshi Arya
Potential Business Impact:
Helps computers learn best choices with limited feedback.
We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing -- where online feedback is limited -- we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement. Unlike prior work relying on parametric or binning-based estimators, BaNk-UCB uses local geometry to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves minimax-optimal rates. Empirical evaluations on synthetic and real-world datasets demonstrate that BaNk-UCB consistently outperforms binning-based baselines.
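The core selection rule described in the abstract (estimate each arm's reward from the k nearest past contexts, then add an exploration bonus) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the fixed `k`, the bonus form `c * sqrt(log t / k)`, and the forced-exploration fallback are all illustrative assumptions, whereas BaNk-UCB chooses k adaptively and operates on a batch schedule.

```python
import numpy as np

def knn_ucb_action(context, histories, k=10, c=1.0):
    """Pick an arm via a k-NN UCB rule (illustrative sketch).

    histories: dict arm -> (X, y), where X is an (n, d) array of past
    contexts for that arm and y is the (n,) array of observed rewards.
    The bonus c * sqrt(log t / k) and the fixed k are assumptions for
    illustration; the paper's method adapts k and batches the feedback.
    """
    t = 1 + sum(len(y) for _, y in histories.values())  # rounds seen so far
    scores = {}
    for arm, (X, y) in histories.items():
        if len(y) < k:
            return arm  # force exploration of under-sampled arms
        dists = np.linalg.norm(X - context, axis=1)
        nearest = np.argsort(dists)[:k]          # k nearest past contexts
        mean_est = y[nearest].mean()             # local reward estimate
        bonus = c * np.sqrt(np.log(t) / k)       # optimism term
        scores[arm] = mean_est + bonus
    return max(scores, key=scores.get)
```

In a batched setting, this rule would be evaluated with histories frozen at the last batch boundary, so the estimates only update when a batch of feedback arrives.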
Similar Papers
Quantum-Enhanced Neural Contextual Bandit Algorithms
Machine Learning (CS)
Makes quantum computers learn faster with less data.
Decentralized Contextual Bandits with Network Adaptivity
Machine Learning (CS)
Helps many computers learn together faster.