Universal rates of ERM for agnostic learning
By: Steve Hanneke, Mingyue Xu
Potential Business Impact:
Shows how quickly computers can learn from any kind of data.
The universal learning framework has been developed to obtain guarantees on learning rates that hold for any fixed distribution, which can be much faster than rates that hold uniformly over all distributions. Given that the Empirical Risk Minimization (ERM) principle is fundamental in PAC theory and ubiquitous in practical machine learning, the recent work of arXiv:2412.02810 studied the universal rates of ERM for binary classification under the realizable setting. However, the assumption of realizability is too restrictive to hold in practice. Indeed, the majority of the literature on universal learning has focused on the realizable case, leaving the non-realizable case barely explored. In this paper, we consider the problem of universal learning by ERM for binary classification under the agnostic setting, where the "learning curve" reflects the decay of the excess risk as the sample size increases. We explore the possibilities of agnostic universal rates and reveal a compact trichotomy: there are three possible agnostic universal rates of ERM, either $e^{-n}$, $o(n^{-1/2})$, or arbitrarily slow. We provide a complete characterization of which concept classes fall into each of these categories. Moreover, we establish complete characterizations for the target-dependent universal rates as well as the Bayes-dependent universal rates.
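To make the ERM principle concrete, the following is a minimal sketch, not taken from the paper: an `erm` routine that returns an empirical-risk minimizer over a hypothetical finite class of threshold classifiers on $[0,1]$, with 10% label noise so the setting is agnostic (non-realizable). The class, names, and noise level are illustrative assumptions only.

```python
import numpy as np

def erm(concept_class, X, y):
    """Return a hypothesis from `concept_class` minimizing empirical risk.

    `concept_class` is a finite list of callables h: x -> {0, 1}.
    Ties are broken by list order; the ERM principle allows any minimizer.
    """
    risks = [np.mean([h(x) != yi for x, yi in zip(X, y)]) for h in concept_class]
    return concept_class[int(np.argmin(risks))]

# Illustrative setup (not from the paper): threshold classifiers on [0, 1].
rng = np.random.default_rng(0)
thresholds = np.linspace(0, 1, 21)
concept_class = [lambda x, t=t: int(x >= t) for t in thresholds]

n = 200
X = rng.uniform(0, 1, size=n)
y = (X >= 0.5).astype(int)
flip = rng.uniform(size=n) < 0.1   # 10% label noise -> non-realizable setting
y = np.where(flip, 1 - y, y)

h_hat = erm(concept_class, X, y)
print("empirical risk of ERM output:",
      np.mean([h_hat(x) != yi for x, yi in zip(X, y)]))
```

The paper's learning curves track the excess risk of such an ERM output, i.e., its true risk minus the best risk in the class, as the sample size $n$ grows for a fixed distribution.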
Similar Papers
Tradeoffs between Mistakes and ERM Oracle Calls in Online and Transductive Online Learning
Machine Learning (CS)
Teaches computers to learn with less information.
Rademacher learning rates for iterated random functions
Machine Learning (Stat)
Helps computers learn from data that changes over time.
Learning Causality for Modern Machine Learning
Machine Learning (CS)
Teaches AI to learn from changes, not just patterns.