Online Policy Learning via a Self-Normalized Maximal Inequality
By: Samuel Girard, Aurélien Bibaut, Houssam Zenati
Potential Business Impact:
Helps computers learn reliably from data collected by adaptive experiments.
Adaptive experiments produce dependent data that violate the i.i.d. assumptions underlying classical concentration bounds, invalidating standard learning guarantees. In this paper, we develop a self-normalized maximal inequality for martingale empirical processes. Building on this, we propose an adaptive sample-variance penalization procedure that balances empirical loss and sample variance and remains valid for general dependent data. This allows us to derive a new variance-regularized pessimistic off-policy learning objective, for which we establish excess-risk guarantees. We then show that, when combined with sequential updates and under standard complexity and margin conditions, the resulting estimator achieves fast convergence rates in both parametric and nonparametric regimes, improving over the usual $1/\sqrt{n}$ baseline. We complement our theoretical findings with numerical simulations that illustrate the practical gains of our approach.
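To make the core idea concrete, here is a minimal sketch of a sample-variance-penalized off-policy objective: an importance-weighted empirical loss plus a penalty proportional to the square root of its sample variance over n. This is a generic illustration of variance penalization, not the paper's exact self-normalized construction or sequential update scheme; the function name penalized_risk, the constant lam, and the toy logged-bandit data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def penalized_risk(pi_probs, losses, logging_probs, lam=1.0):
    """Sample-variance-penalized importance-weighted risk.

    pi_probs      : (n,) probability the candidate policy puts on each logged action
    losses        : (n,) observed losses for the logged actions
    logging_probs : (n,) propensities of the (possibly adaptive) logging policy
    lam           : penalty strength (hypothetical tuning constant)
    """
    n = losses.shape[0]
    w = pi_probs / logging_probs             # importance weights
    z = w * losses                           # importance-weighted losses
    return z.mean() + lam * np.sqrt(z.var(ddof=1) / n)

# Toy logged-bandit data: 2 actions, uniform logging policy.
n = 2000
actions = rng.integers(0, 2, size=n)
losses = rng.normal(loc=np.where(actions == 0, 0.3, 0.5), scale=0.2)
logging_probs = np.full(n, 0.5)

# Two candidate deterministic policies: always play action 0 vs. always play action 1.
candidates = {a: (actions == a).astype(float) for a in (0, 1)}
scores = {a: penalized_risk(p, losses, logging_probs) for a, p in candidates.items()}
print(scores, "-> pick action", min(scores, key=scores.get))
```

In this toy comparison, the policy with a smaller penalized risk is preferred, so a policy whose importance-weighted losses have high sample variance is penalized even if its plain empirical loss looks favorable; this is the pessimistic flavor of the objective described in the abstract.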
Similar Papers
Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity
Machine Learning (Stat)
Helps computers learn faster from data.
Maximal Inequalities for Independent Random Vectors
Probability
Finds better math rules for guessing unknown things.
Differentially Private Learning of Exponential Distributions: Adaptive Algorithms and Tight Bounds
Data Structures and Algorithms
Learns private data patterns without revealing secrets.