Semi-Parametric Batched Global Multi-Armed Bandits with Covariates
By: Sakshi Arya, Hyebin Song
Potential Business Impact:
Helps computers learn better from grouped information.
The multi-armed bandit (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, feedback is provided in batches, contextual information is available at the time of decision-making, and rewards from different arms are related rather than independent. We propose a novel semi-parametric framework for batched bandits with covariates and a shared parameter across arms, leveraging the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings, one where a pilot direction is available and another where the direction is estimated from data, and derive theoretical regret bounds for both. When a pilot direction is available with sufficient accuracy, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2024batched}.
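The two ingredients named in the abstract, binning along a single-index direction and per-bin successive arm elimination, can be illustrated with a minimal sketch. This is not the authors' BIDS implementation. The function names `bin_by_index` and `eliminate_arms`, the equal-width bins on a fixed interval, and the constant `margin` are all assumptions made for illustration; the abstract does not specify the binning scheme or the elimination thresholds, and the sketch does not refine bins across batches.

```python
import numpy as np

def bin_by_index(X, beta, n_bins, lo=-1.0, hi=1.0):
    """Project covariates onto a (pilot or estimated) direction and bin the 1-D index.

    Assumption: equal-width bins on [lo, hi]; values outside fall in the end bins.
    """
    beta = np.asarray(beta, dtype=float)
    z = np.asarray(X, dtype=float) @ (beta / np.linalg.norm(beta))
    edges = np.linspace(lo, hi, n_bins + 1)
    # Using only interior edges yields labels in 0..n_bins-1.
    return np.digitize(z, edges[1:-1])

def eliminate_arms(bins, arms, rewards, active, n_bins, margin):
    """After one batch, drop arms whose mean reward in a bin trails the best by > margin.

    `active` maps each bin to its set of surviving arms. `margin` is a
    hypothetical fixed threshold standing in for a data-dependent one.
    """
    updated = {}
    for b in range(n_bins):
        means = {}
        for k in active[b]:
            mask = (bins == b) & (arms == k)
            if mask.any():
                means[k] = rewards[mask].mean()
        if not means:
            updated[b] = set(active[b])  # no data in this bin: keep everything
            continue
        best = max(means.values())
        # Arms with no observations yet are kept rather than eliminated.
        updated[b] = {k for k in active[b]
                      if k not in means or means[k] >= best - margin}
    return updated

# Toy batch: two arms, two bins along the first coordinate.
X = np.array([[0.5, 0.0], [-0.5, 0.0], [0.5, 0.0], [-0.5, 0.0]])
arms = np.array([0, 0, 1, 1])
rewards = np.array([1.0, 0.0, 0.0, 1.0])
bins = bin_by_index(X, beta=[2.0, 0.0], n_bins=2)
active = eliminate_arms(bins, arms, rewards,
                        {0: {0, 1}, 1: {0, 1}}, n_bins=2, margin=0.5)
```

Because the bins partition a one-dimensional index rather than the full covariate space, the number of bins does not grow with the covariate dimension, which is the intuition behind the $d = 1$ rate mentioned in the abstract.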
Similar Papers
Robust Batched Bandits
Machine Learning (CS)
Finds best treatments faster, even with messy results.
Locally Private Nonparametric Contextual Multi-armed Bandits
Machine Learning (Stat)
Keeps private data safe while making smart choices.
Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits
Machine Learning (CS)
Fairly shares rewards, making systems work better.