Threshold-Based Optimal Arm Selection in Monotonic Bandits: Regret Lower Bounds and Algorithms
By: Chanakya Varude, Jay Chaudhary, Siddharth Kaushik, and more
Potential Business Impact:
Finds the best option near a target.
In multi-armed bandit problems, the typical goal is to identify the arm with the highest reward. This paper explores a threshold-based bandit problem, aiming to select an arm based on its relation to a prescribed threshold \(\tau\). We study variants where the optimal arm is the first above \(\tau\), the \(k^{\text{th}}\) arm above or below it, or the closest to it, under a monotonic structure on the arm means. We derive asymptotic regret lower bounds, showing that they depend only on the arms adjacent to \(\tau\). Motivated by applications in communication networks (CQI allocation), clinical dosing, energy management, recommendation systems, and more, we propose algorithms whose optimality is validated through Monte Carlo simulations. Our work extends classical bandit theory with threshold constraints for efficient decision-making.
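To make the first-above-\(\tau\) variant concrete, here is a minimal illustrative sketch, not the paper's algorithm: under monotonically non-decreasing arm means, a binary search over arm indices can locate the first arm whose mean exceeds \(\tau\), sampling each probed arm until a confidence interval separates it from the threshold. The function name `first_arm_above_threshold`, the `pull` callback, the sub-Gaussian noise assumption, and the confidence radius are all our own assumptions for this sketch.

```python
import numpy as np

def first_arm_above_threshold(pull, n_arms, tau, delta=0.05, max_pulls=10_000):
    """Sketch: binary search for the first arm with mean above tau.

    Assumes arm means are monotonically non-decreasing in the arm index,
    rewards are 1-sub-Gaussian, and at least one arm lies above tau.
    `pull(i)` returns one noisy reward sample from arm i. This is an
    illustrative sketch of the problem setting, not the paper's method.
    """
    lo, hi = 0, n_arms - 1
    while lo < hi:
        mid = (lo + hi) // 2
        total, n = 0.0, 0
        # Sample the midpoint arm until its confidence interval clears
        # the threshold, or the pull budget is exhausted.
        while n < max_pulls:
            total += pull(mid)
            n += 1
            mean = total / n
            # Anytime sub-Gaussian confidence radius (standard form).
            radius = np.sqrt(2 * np.log(2 * n**2 / delta) / n)
            if mean - radius > tau:   # confidently above tau: search left half
                hi = mid
                break
            if mean + radius < tau:   # confidently below tau: search right half
                lo = mid + 1
                break
        else:
            # Budget exhausted: fall back to the empirical comparison.
            lo, hi = (lo, mid) if mean > tau else (mid + 1, hi)
    return lo

# Toy usage: 10 arms with increasing Gaussian means, threshold 0.55.
rng = np.random.default_rng(0)
means = np.linspace(0.0, 1.0, 10)
arm = first_arm_above_threshold(lambda i: rng.normal(means[i], 1.0), 10, tau=0.55)
print("first arm above tau:", arm)  # expected index 5 (mean ≈ 0.556)
```

The binary search reflects why the regret lower bounds depend only on the arms adjacent to \(\tau\): under monotonicity, arms far from the threshold are resolved with few samples, and the sampling effort concentrates on the boundary arms.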
Similar Papers
Risk-sensitive Bandits: Arm Mixture Optimality and Regret-efficient Algorithms
Machine Learning (Stat)
Helps computers choose the best option, even when risky.
Multi-thresholding Good Arm Identification with Bandit Feedback
Machine Learning (CS)
Finds the best option when there are many goals.
Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm
Machine Learning (CS)
Helps computers choose the best options across many goals.