Analytic theory of dropout regularization
By: Francesco Mori, Francesca Mignacco
Potential Business Impact:
Makes computer learning more reliable by randomly switching off parts of the network during training, which reduces overfitting and the effect of noisy labels.
Dropout is a regularization technique widely used in training artificial neural networks to mitigate overfitting. It consists of dynamically deactivating subsets of the network during training to promote more robust representations. Despite its widespread adoption, dropout probabilities are often selected heuristically, and theoretical explanations of its success remain sparse. Here, we analytically study dropout in two-layer neural networks trained with online stochastic gradient descent. In the high-dimensional limit, we derive a set of ordinary differential equations that fully characterize the evolution of the network during training and capture the effects of dropout. We obtain a number of exact results describing the generalization error and the optimal dropout probability at short, intermediate, and long training times. Our analysis shows that dropout reduces detrimental correlations between hidden nodes and mitigates the impact of label noise, and that the optimal dropout probability increases with the level of noise in the data. Our results are validated by extensive numerical simulations.
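To make the setting concrete, below is a minimal sketch of the kind of setup the abstract describes: a two-layer network trained with online stochastic gradient descent, where hidden units are dropped independently at each step. The architecture, teacher model, hyperparameters, and variable names are illustrative assumptions for exposition, not the paper's actual equations or derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network y = sum_k v_k * g(w_k . x / sqrt(d)),
# trained online; each hidden unit is kept with probability p at every
# step (inverted dropout, rescaling by 1/p). All values below are
# hypothetical choices for demonstration.
d, K = 500, 4            # input dimension, number of hidden units
p = 0.8                  # keep probability (dropout probability is 1 - p)
lr = 0.1                 # learning rate
g = np.tanh              # hidden activation
dg = lambda h: 1.0 - np.tanh(h) ** 2

W = rng.normal(size=(K, d)) / np.sqrt(d)   # first-layer weights
v = np.ones(K)                             # fixed second layer (illustrative)

# Illustrative teacher producing noisy labels (label noise of std 0.1)
w_star = rng.normal(size=d)
noise_std = 0.1

for step in range(10_000):
    x = rng.normal(size=d)                           # fresh online sample
    y = g(w_star @ x / np.sqrt(d)) + noise_std * rng.normal()

    mask = (rng.random(K) < p) / p                   # inverted-dropout mask
    h = W @ x / np.sqrt(d)                           # pre-activations
    y_hat = (v * mask) @ g(h)                        # prediction with dropped units

    err = y_hat - y
    grad_W = np.outer(err * v * mask * dg(h), x / np.sqrt(d))
    W -= lr * grad_W                                 # online SGD update
```

In this online (one-pass) regime each sample is drawn fresh, which is the setting in which the high-dimensional training dynamics can be tracked by ordinary differential equations; the mask resampled at every step is what the dropout analysis captures.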
Similar Papers
A Combinatorial Theory of Dropout: Subnetworks, Graph Geometry, and Generalization
Machine Learning (CS)
Studies dropout through the subnetworks it samples, linking their graph geometry to generalization.
Convergence, design and training of continuous-time dropout as a random batch method
Machine Learning (CS)
Analyzes continuous-time dropout as a random batch method, covering its convergence, design, and training.
AttentionDrop: A Novel Regularization Method for Transformer Models
CV and Pattern Recognition
Proposes AttentionDrop, a regularization method for Transformer models.