Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability
By: Samya Praharaj, Koulik Khamaru
Statistical inference in contextual bandits is complicated by the adaptive, non-i.i.d. nature of the data. A growing body of work has shown that classical least-squares inference may fail under adaptive sampling, and that constructing valid confidence intervals for linear functionals of the model parameter typically requires an unavoidable inflation of the interval width, of order $\sqrt{d \log T}$. This phenomenon -- often referred to as the price of adaptivity -- highlights the inherent difficulty of reliable inference under general contextual bandit policies. A key structural property that circumvents this limitation is the \emph{stability} condition of Lai and Wei, which requires the empirical feature covariance to concentrate around a deterministic limit. When stability holds, the ordinary least-squares estimator satisfies a central limit theorem, and classical Wald-type confidence intervals -- designed for i.i.d. data -- become asymptotically valid even under adaptation, \emph{without} incurring the $\sqrt{d \log T}$ price of adaptivity. In this paper, we propose and analyze a penalized EXP4 algorithm for linear contextual bandits. Our first main result shows that this procedure satisfies the Lai--Wei stability condition and therefore admits valid Wald-type confidence intervals for linear functionals. Our second result establishes that the same algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors, demonstrating that stability and statistical efficiency can coexist within a single contextual bandit method. Finally, we complement our theory with simulations illustrating the empirical normality of the resulting estimators and the sharpness of the corresponding confidence intervals.
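For concreteness, here is a minimal sketch of the objects the abstract refers to, in generic notation that may differ from the paper's. Write $\Sigma_T = \sum_{t=1}^{T} x_t x_t^\top$ for the Gram matrix of the adaptively chosen features, $\hat{\theta}_T$ for the ordinary least-squares estimate of the parameter $\theta^\star$, and $\sigma^2$ for the (conditional) noise variance. In one common formulation, the Lai--Wei stability condition asks for deterministic positive definite matrices $B_T$ such that
\[
B_T^{-1} \Sigma_T \xrightarrow{\,p\,} I_d ,
\]
in which case, for conditionally homoscedastic martingale-difference noise, the least-squares estimator obeys the central limit theorem
\[
\Sigma_T^{1/2}\bigl(\hat{\theta}_T - \theta^\star\bigr) \xrightarrow{\,d\,} \mathcal{N}\bigl(0,\, \sigma^2 I_d\bigr),
\]
so that for any direction $a \in \mathbb{R}^d$ the classical Wald-type interval
\[
a^\top \hat{\theta}_T \;\pm\; z_{1-\alpha/2}\, \hat{\sigma}\, \sqrt{a^\top \Sigma_T^{-1} a}
\]
has asymptotic coverage $1-\alpha$, with no $\sqrt{d \log T}$ correction.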
Similar Papers
Statistical Inference for Misspecified Contextual Bandits (Statistics Theory)
Statistical Inference under Adaptive Sampling with LinUCB (Statistics Theory)