Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms
By: Kongchang Zhou, Tingyu Zhang, Wei Chen, et al.
The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where an agent learns about the unknown environment through iterative interactions, or the offline setting where a policy is learned solely from logged data. However, each of these paradigms has inherent limitations: online algorithms suffer from high interaction costs and slow adaptation, while offline methods are constrained by dataset quality and lack of exploration capabilities. To address these complementary weaknesses, we propose hybrid CMAB-T, a new framework that integrates offline data with online interaction in a principled manner. Our proposed hybrid CUCB algorithm leverages offline data to guide exploration and accelerate convergence, while strategically incorporating online interactions to mitigate the insufficient coverage or distributional bias of the offline dataset. We provide theoretical guarantees on the algorithm's regret, demonstrating that hybrid CUCB significantly outperforms purely online approaches when high-quality offline data is available, and effectively corrects the bias inherent in offline-only methods when the data is limited or misaligned. Empirical results further demonstrate the consistent advantage of our algorithm.
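To make the warm-start idea concrete, the sketch below shows one plausible way a hybrid CUCB-style routine could fold logged data into online UCB play. This is an illustrative toy, not the paper's algorithm: the offline-data format, the `1.5 * log(t)` confidence bonus, and the top-k "oracle" standing in for a combinatorial action are all assumptions, and probabilistic arm triggering is omitted for brevity.

```python
import math
import random

def hybrid_cucb(offline_data, true_means, k, horizon, seed=0):
    """Toy hybrid CUCB sketch (illustrative, not the paper's algorithm).

    Offline samples seed per-arm counts and empirical means; online rounds
    then play the k arms with the highest UCB indices and update estimates.
    `offline_data` is a hypothetical format: arm index -> list of rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms

    # Warm start: fold logged observations into the empirical estimates,
    # so well-covered arms begin with tight confidence intervals.
    for arm, rewards in offline_data.items():
        for r in rewards:
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
    offline_total = sum(counts)

    for t in range(1, horizon + 1):
        total_t = offline_total + t
        # Optimistic index: empirical mean plus a confidence bonus;
        # unplayed arms get an infinite index and are tried first.
        ucb = [
            means[a] + math.sqrt(1.5 * math.log(total_t) / counts[a])
            if counts[a] > 0 else float("inf")
            for a in range(n_arms)
        ]
        # Stand-in "oracle" for the combinatorial action: pick the top-k arms.
        super_arm = sorted(range(n_arms), key=lambda a: -ucb[a])[:k]
        for a in super_arm:
            r = 1.0 if rng.random() < true_means[a] else 0.0  # Bernoulli reward
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
    return means, counts
```

When the offline data covers an arm well, its confidence interval starts narrow and the algorithm wastes few rounds exploring it; arms missing from the log are still explored online, which is the bias-correction behavior the abstract describes.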