Constrained Linear Thompson Sampling
By: Aditya Gangrade, Venkatesh Saligrama
Potential Business Impact:
Helps computers learn safely and more quickly.
We study safe linear bandits (SLBs), where an agent selects actions from a convex set to maximize an unknown linear objective subject to unknown linear constraints in each round. Existing methods for SLBs provide strong regret guarantees, but require solving expensive optimization problems (e.g., second-order cone or NP-hard programs). To address this, we propose Constrained Linear Thompson Sampling (COLTS), a sampling-based framework that selects actions by solving perturbed linear programs, which significantly reduces computational costs while matching the regret and risk of prior methods. We develop two main variants: S-COLTS, which ensures zero risk and $\widetilde{O}(\sqrt{d^3 T})$ regret given a known safe action, and R-COLTS, which achieves $\widetilde{O}(\sqrt{d^3 T})$ regret and risk with no instance information. In simulations, these methods match or outperform state-of-the-art SLB approaches while substantially improving scalability. On the technical front, we introduce a novel coupled noise design that ensures frequent 'local optimism' about the true optimum, and a scaling-based analysis to handle the per-round variability of constraints.
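To make the "perturbed linear program" idea concrete, here is a minimal Python sketch of one COLTS-style round. It assumes a known polytope action set {x : Gx <= h}, unknown constraints of the form Ax <= 0, and ridge-regression estimates of the objective and constraints; the independent Gaussian perturbations scaled by the ridge covariance are an illustrative stand-in for the paper's coupled noise design, not its exact construction.

```python
# Minimal sketch of one COLTS-style round (illustrative assumptions, not the
# paper's exact algorithm): ridge estimates are perturbed with Gaussian noise
# shaped by the inverse design matrix, then a single LP selects the action.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, m = 5, 2                      # action dimension, number of unknown constraints

# Ridge-regression state: V = lam*I + sum_t x_t x_t^T, plus moment vectors.
lam = 1.0
V = lam * np.eye(d)
b_obj = np.zeros(d)              # sum of x_t * reward_t
B_con = np.zeros((m, d))         # row i: sum of x_t * (i-th constraint feedback)

# Known action polytope {x : G x <= h} (hypothetical instance: the unit box).
G = np.vstack([np.eye(d), -np.eye(d)])
h = np.ones(2 * d)

def colts_round(beta=1.0):
    """Sample perturbed parameter estimates and solve the resulting LP."""
    V_inv = np.linalg.inv(V)
    root = np.linalg.cholesky(V_inv)     # square root of the ridge covariance
    theta_hat = V_inv @ b_obj            # objective estimate
    A_hat = B_con @ V_inv                # constraint estimates (one per row)

    # Perturb the estimates. The paper couples these draws so that the true
    # optimum is frequently 'locally optimistic'; here we sample independently.
    theta_tilde = theta_hat + beta * root @ rng.standard_normal(d)
    A_tilde = A_hat + beta * rng.standard_normal((m, d)) @ root.T

    # LP: maximize theta_tilde^T x  s.t.  G x <= h  and  A_tilde x <= 0.
    res = linprog(c=-theta_tilde,
                  A_ub=np.vstack([G, A_tilde]),
                  b_ub=np.concatenate([h, np.zeros(m)]),
                  bounds=[(None, None)] * d)
    return res.x                 # None if the perturbed LP is infeasible

x_t = colts_round()
```

The computational point of the abstract is visible here: the only per-round optimization is a single linear program over the (perturbed) estimated feasible set, rather than a second-order cone or harder program.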
Similar Papers
Multi-Agent Stage-wise Conservative Linear Bandits
Machine Learning (CS)
Helps many AI agents learn safely together.
Thompson Sampling for Multi-Objective Linear Contextual Bandit
Machine Learning (Stat)
Helps computers make better choices with many goals.
Adaptive Data Augmentation for Thompson Sampling
Machine Learning (Stat)
Learns which choices earn rewards faster.