Learning Where to Learn: Training Distribution Selection for Provable OOD Performance
By: Nicolas Guerra, Nicholas H. Nelsen, Yunan Yang
Potential Business Impact:
Teaches computers to work with new, different data.
Out-of-distribution (OOD) generalization remains a fundamental challenge in machine learning: models trained on one data distribution often degrade substantially when evaluated on shifted or unseen domains. To address this challenge, the paper studies the design of training data distributions that maximize average-case OOD performance. First, a theoretical analysis establishes a family of generalization bounds that quantify how the choice of training distribution influences OOD error across a predefined family of target distributions. These insights motivate two complementary algorithmic strategies: (i) directly formulating OOD risk minimization as a bilevel optimization problem over the space of probability measures and (ii) minimizing a theoretical upper bound on the OOD error. Finally, the paper evaluates both approaches on a range of function approximation and operator learning examples, where the proposed methods significantly improve OOD accuracy over standard empirical risk minimization with a fixed training distribution. These results highlight distribution-aware training as a principled and practical framework for robust OOD generalization.
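To make strategy (i) concrete, here is a minimal, hypothetical sketch of the bilevel idea: an outer loop searches over training distributions nu (parameterized here as mixture weights over two candidate source distributions) to minimize the OOD risk averaged over a family of target distributions, while an inner step performs ordinary empirical risk minimization on samples drawn from nu. The toy 1-D regression task, the mixture parameterization, and all function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_fn(x):
    # Ground-truth function to learn (an illustrative assumption).
    return np.sin(2 * np.pi * x)

def fit_ridge(x, y, lam=1e-3, deg=8):
    # Inner (lower-level) problem: empirical risk minimization on data from nu,
    # here ridge regression on polynomial features.
    Phi = np.vander(x, deg + 1, increasing=True)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(deg + 1), Phi.T @ y)

def risk(w, mu_sampler, n=2000, deg=8):
    # OOD risk of the trained model under one target distribution mu.
    x = mu_sampler(n)
    Phi = np.vander(x, deg + 1, increasing=True)
    return np.mean((Phi @ w - target_fn(x)) ** 2)

# Candidate "source" regions whose mixture defines the training distribution nu.
sources = [lambda n: rng.uniform(0.0, 0.5, n),
           lambda n: rng.uniform(0.5, 1.0, n)]
# Predefined family of target (test-time) distributions; OOD risk is averaged over it.
targets = [lambda n: rng.uniform(0.1, 0.9, n),
           lambda n: rng.beta(2.0, 5.0, n)]

best = None
for alpha in np.linspace(0.0, 1.0, 21):  # outer (upper-level) search over nu
    def nu(n, a=alpha):
        pick = rng.random(n) < a
        return np.where(pick, sources[0](n), sources[1](n))
    x_tr = nu(500)
    y_tr = target_fn(x_tr) + 0.05 * rng.standard_normal(500)
    w = fit_ridge(x_tr, y_tr)
    avg_risk = np.mean([risk(w, mu) for mu in targets])  # average-case OOD risk
    if best is None or avg_risk < best[0]:
        best = (avg_risk, alpha)

print(f"best mixture weight alpha = {best[1]:.2f}, average OOD risk = {best[0]:.4f}")
```

In a less toy setting, the grid search over the mixture weight would presumably be replaced by gradient-based updates of the distribution parameters, and strategy (ii) would swap the inner evaluation for the paper's theoretical upper bound on OOD error rather than sampled target risks.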
Similar Papers
A Closer Look at the Learnability of Out-of-Distribution (OOD) Detection
Machine Learning (CS)
Helps computers spot weird, new information.
Out-of-Distribution Generalization in Time Series: A Survey
Machine Learning (CS)
Helps computers learn from changing data better.
Towards More Trustworthy Deep Code Models by Enabling Out-of-Distribution Detection
Software Engineering
Helps computers spot code they weren't trained on.