Robust Learning of a Group DRO Neuron
By: Guyang Cao, Shuyao Li, Sushrut Karmalkar and more
Potential Business Impact:
Teaches computers to learn from messy, changing information.
We study the problem of learning a single neuron under the standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $\mathcal{p}_{[1]},\dots,\mathcal{p}_{[K]}$, we seek to approximate the vector $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{\lambda} \in \Delta_K$, where the objective is $\sum_{i \in [K]}\lambda_{[i]}\,\mathbb{E}_{(\mathbf{x},y)\sim\mathcal{p}_{[i]}}(\sigma(\mathbf{w}\cdot\mathbf{x})-y)^2 - \nu d_f(\boldsymbol{\lambda},\frac{1}{K}\mathbf{1})$ and $d_f$ is an $f$-divergence that imposes an (optional) penalty on deviations from uniform group weights, scaled by a parameter $\nu\geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf{w}}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. The implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
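To make the objective in the abstract concrete, the following is a minimal NumPy sketch of the inner maximization over group weights for a fixed $\mathbf{w}$. It is not the paper's primal-dual algorithm. It rests on two assumptions that the abstract does not make: the activation $\sigma$ is taken to be ReLU, and $d_f$ is taken to be the KL divergence, for which the worst-case weights have the closed form $\lambda_{[i]} \propto \exp(L_i/\nu)$, where $L_i$ is the squared loss on group $i$. The function names `group_losses` and `worst_case_weights` are hypothetical.

```python
import numpy as np

def relu(z):
    # Assumed activation; the abstract leaves sigma unspecified.
    return np.maximum(z, 0.0)

def group_losses(w, groups, activation=relu):
    """Empirical squared loss of the neuron on each group.

    groups is a list of (X, y) pairs, one per group distribution.
    """
    return np.array([np.mean((activation(X @ w) - y) ** 2) for X, y in groups])

def worst_case_weights(losses, nu):
    """Maximize sum_i lam_i * losses_i - nu * KL(lam || uniform) over the simplex.

    Assumes d_f is the KL divergence. Returns (lam, objective value).
    """
    losses = np.asarray(losses, dtype=float)
    K = losses.size
    if nu == 0.0:
        # No penalty: all weight goes to the group with the largest loss.
        lam = np.zeros(K)
        lam[np.argmax(losses)] = 1.0
        return lam, float(losses.max())
    z = losses / nu
    z_max = z.max()
    expz = np.exp(z - z_max)          # shift by the max for numerical stability
    lam = expz / expz.sum()           # lam_i proportional to exp(losses_i / nu)
    value = nu * (z_max + np.log(expz.sum() / K))
    return lam, float(value)
```

With $\nu = 0$ the objective reduces to the largest group loss; as $\nu$ grows, the weights move toward uniform and the objective approaches the average group loss.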
Similar Papers
Group Distributionally Robust Machine Learning under Group Level Distributional Uncertainty
Machine Learning (CS)
Makes AI fair for everyone, even small groups.
Distributionally Robust Optimization with Adversarial Data Contamination
Machine Learning (CS)
Protects computer learning from bad data and changes.
Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets
Machine Learning (CS)
Makes AI work better when data changes.