A Novel Data-Dependent Learning Paradigm for Large Hypothesis Classes
By: Alireza F. Pour, Shai Ben-David
Potential Business Impact:
Helps computers pick good models from very large candidate sets by relying on data rather than prior guesses.
We address the general task of learning with a set of candidate models that is too large to admit uniform convergence of empirical estimates to true losses. While the common approach to such challenges is SRM (structural risk minimization) or regularization-based learning algorithms, we propose a novel learning paradigm that relies on stronger incorporation of empirical data and requires fewer algorithmic decisions to be based on prior assumptions. We analyze the generalization capabilities of our approach and demonstrate its merits under several common learning assumptions, including similarity of close points, clustering of the domain into highly label-homogeneous regions, Lipschitzness of the labeling rule, and contrastive learning assumptions. Our approach allows exploiting such assumptions without needing to know their true parameters a priori.
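For readers unfamiliar with the SRM baseline the abstract contrasts with, here is a minimal sketch: candidate models are grouped into classes of increasing complexity, and one picks the model minimizing empirical risk plus a per-class penalty fixed a priori. All names below (srm_select, penalties, the toy threshold classifiers) are illustrative assumptions, not the paper's method; the paper's point is precisely to reduce reliance on such prior penalty choices.

import numpy as np

def empirical_risk(h, X, y):
    """0-1 empirical loss of predictor h on the sample (X, y)."""
    return np.mean(h(X) != y)

def srm_select(classes, penalties, X, y):
    """Standard SRM-style selection (baseline, not the paper's paradigm).

    classes   -- list of hypothesis classes, each a list of predictors
    penalties -- penalties[k] is the prior complexity charge for class k;
                 these must be chosen before seeing the data, which is the
                 kind of prior commitment the proposed approach reduces.
    """
    best_h, best_score = None, np.inf
    for k, H_k in enumerate(classes):
        for h in H_k:
            score = empirical_risk(h, X, y) + penalties[k]
            if score < best_score:
                best_h, best_score = h, score
    return best_h

# Toy usage: nested classes of threshold classifiers on [0, 1],
# with grids that double in resolution (and hence in complexity).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
y = (X > 0.37).astype(int)
classes = [
    [(lambda t: (lambda x: (x > t).astype(int)))(t)
     for t in np.linspace(0, 1, 2 ** (k + 1))]
    for k in range(6)
]
penalties = [0.01 * (k + 1) for k in range(6)]  # grows with class index
h_star = srm_select(classes, penalties, X, y)
print("selected empirical risk:", empirical_risk(h_star, X, y))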
Similar Papers
Challenges of Heterogeneity in Big Data: A Comparative Study of Classification in Large-Scale Structured and Unstructured Domains
Machine Learning (CS)
Compares classification methods across large structured and unstructured data.
Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures
Machine Learning (CS)
Shows how AI models scale when trained on mixes of real and synthetic data.
Cross-Learning from Scarce Data via Multi-Task Constrained Optimization
Machine Learning (CS)
Learns better from less information by sharing knowledge.