Multi-User Contextual Cascading Bandits for Personalized Recommendation
By: Jiho Park, Huiwen Jia
Potential Business Impact:
Shows ads better to many people at once.
We introduce a Multi-User Contextual Cascading Bandit model, a new combinatorial bandit framework that captures realistic online advertising scenarios where multiple users interact with sequentially displayed items simultaneously. Unlike classical contextual bandits, MCCB integrates three key structural elements: (i) cascading feedback based on sequential arm exposure, (ii) parallel context sessions enabling selective exploration, and (iii) heterogeneous arm-level rewards. We first propose Upper Confidence Bound with Backward Planning (UCBBP), a UCB-style algorithm tailored to this setting, and prove that it achieves a regret bound of $\widetilde{O}(\sqrt{THN})$ over $T$ episodes, $H$ session steps, and $N$ contexts per episode. Motivated by the fact that many users interact with the system simultaneously, we introduce a second algorithm, termed Active Upper Confidence Bound with Backward Planning (AUCBBP), which shows a strict efficiency improvement in context scaling, i.e., user scaling, with a regret bound of $\widetilde{O}(\sqrt{T+HN})$. We validate our theoretical findings via numerical experiments, demonstrating the empirical effectiveness of both algorithms under various settings.
Similar Papers
Multi-User Contextual Cascading Bandits for Personalized Recommendation
Machine Learning (CS)
Shows ads to people better and faster.
Decentralized Contextual Bandits with Network Adaptivity
Machine Learning (CS)
Helps many computers learn together faster.
Decentralized Contextual Bandits with Network Adaptivity
Machine Learning (CS)
Helps many computers learn together faster.