Bayesian Double Descent
By: Nick Polson, Vadim Sokolov
Potential Business Impact:
Helps highly complex (over-parameterized) machine learning models make better predictions.
Double descent is a phenomenon of over-parameterized statistical models. Our goal is to view double descent from a Bayesian perspective. Over-parameterized models such as deep neural networks have an interesting re-descending property in their risk characteristics. This is a recent phenomenon in machine learning and has been the subject of many studies. As the complexity of the model increases, there is a U-shaped region corresponding to the traditional bias-variance trade-off. Then, when the number of parameters equals the number of observations and the model interpolates the data, the risk can become infinite; in the over-parameterized region it re-descends -- the double descent effect. We show that this has a natural Bayesian interpretation. Moreover, we show that it is not in conflict with the traditional Occam's razor that Bayesian models possess, in that they tend to prefer simpler models when possible. We illustrate the approach with an example of Bayesian model selection in neural networks. Finally, we conclude with directions for future research.
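To make the shape of the risk curve described above concrete, here is a minimal illustrative sketch (not the paper's Bayesian model-selection example) that fits minimum-norm least squares on random ReLU features of increasing width p. The test risk typically rises toward a peak near the interpolation threshold p = n and then re-descends in the over-parameterized regime. The data-generating process, feature map, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, d_true=20, noise=0.5):
    """Synthetic regression data from a fixed linear teacher (illustrative assumption)."""
    X = rng.normal(size=(n, d_true))
    w = rng.normal(size=d_true)
    y = X @ w + noise * rng.normal(size=n)
    return X, y

def random_features(X, p, W):
    """Nonlinear random-feature map of width p: ReLU of random projections."""
    return np.maximum(X @ W[:, :p], 0.0)

n_train, n_test, d_true = 50, 1000, 20
X_tr, y_tr = make_data(n_train, d_true)
X_te, y_te = make_data(n_test, d_true)
# One fixed bank of random projection directions, truncated to width p below.
W = rng.normal(size=(d_true, 400)) / np.sqrt(d_true)

for p in [5, 10, 25, 45, 50, 55, 100, 200, 400]:
    Phi_tr = random_features(X_tr, p, W)
    Phi_te = random_features(X_te, p, W)
    # Minimum-norm least-squares fit; the pseudo-inverse covers both the
    # under-parameterized (p < n) and over-parameterized (p > n) regimes.
    beta = np.linalg.pinv(Phi_tr) @ y_tr
    test_risk = np.mean((Phi_te @ beta - y_te) ** 2)
    print(f"p = {p:4d}  test risk = {test_risk:8.3f}")
```

Printed risks generally trace the pattern in the abstract: a U-shape for small p, a spike near p = n_train, and a second descent as p grows further.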
Similar Papers
Understanding Overparametrization in Survival Models through Double-Descent
Machine Learning (Stat)
Helps computers predict life expectancy more accurately.
The Double Descent Behavior in Two Layer Neural Network for Binary Classification
Machine Learning (Stat)
Finds a sweet spot for computer learning accuracy.