Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms
By: Kongchang Zhou, Tingyu Zhang, Wei Chen, et al.
The problem of combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) has been extensively studied. Prior work primarily focuses on either the online setting where an agent learns about the unknown environment through iterative interactions, or the offline setting where a policy is learned solely from logged data. However, each of these paradigms has inherent limitations: online algorithms suffer from high interaction costs and slow adaptation, while offline methods are constrained by dataset quality and lack of exploration capabilities. To address these complementary weaknesses, we propose hybrid CMAB-T, a new framework that integrates offline data with online interaction in a principled manner. Our proposed hybrid CUCB algorithm leverages offline data to guide exploration and accelerate convergence, while strategically incorporating online interactions to mitigate the insufficient coverage or distributional bias of the offline dataset. We provide theoretical guarantees on the algorithm's regret, demonstrating that hybrid CUCB significantly outperforms purely online approaches when high-quality offline data is available, and effectively corrects the bias inherent in offline-only methods when the data is limited or misaligned. Empirical results further demonstrate the consistent advantage of our algorithm.
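To make the warm-start idea concrete, the sketch below shows one plausible way a hybrid CUCB-style routine could fold logged data into online UCB play. This is an illustrative toy, not the paper's algorithm: the offline-data format, the `1.5 * log(t)` confidence bonus, and the top-k "oracle" standing in for a combinatorial action are all assumptions, and probabilistic arm triggering is omitted for brevity.

```python
import math
import random

def hybrid_cucb(offline_data, true_means, k, horizon, seed=0):
    """Toy hybrid CUCB sketch (illustrative, not the paper's algorithm).

    Offline samples seed per-arm counts and empirical means; online rounds
    then play the k arms with the highest UCB indices and update estimates.
    `offline_data` is a hypothetical format: arm index -> list of rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms

    # Warm start: fold logged observations into the empirical estimates,
    # so well-covered arms begin with tight confidence intervals.
    for arm, rewards in offline_data.items():
        for r in rewards:
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
    offline_total = sum(counts)

    for t in range(1, horizon + 1):
        total_t = offline_total + t
        # Optimistic index: empirical mean plus a confidence bonus;
        # unplayed arms get an infinite index and are tried first.
        ucb = [
            means[a] + math.sqrt(1.5 * math.log(total_t) / counts[a])
            if counts[a] > 0 else float("inf")
            for a in range(n_arms)
        ]
        # Stand-in "oracle" for the combinatorial action: pick the top-k arms.
        super_arm = sorted(range(n_arms), key=lambda a: -ucb[a])[:k]
        for a in super_arm:
            r = 1.0 if rng.random() < true_means[a] else 0.0  # Bernoulli reward
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
    return means, counts
```

When the offline data covers an arm well, its confidence interval starts narrow and the algorithm wastes few rounds exploring it; arms missing from the log are still explored online, which is the bias-correction behavior the abstract describes.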