
Distributionally Robust Optimization with Adversarial Data Contamination

Published: July 14, 2025 | arXiv ID: 2507.10718v1

By: Shuyao Li, Ilias Diakonikolas, Jelena Diakonikolas

Potential Business Impact:

Protects machine learning models from corrupted training data and from shifts in the data distribution.

Plain English Summary

Imagine you're trying to make the best decisions for your business, but you know some of your past sales numbers might be wrong or misleading. This new method helps you make smart choices even when your information isn't perfect, protecting you from bad data and unexpected changes in customer behavior. This means you can trust your plans more, leading to better results and fewer surprises down the road.

Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $\epsilon$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrt{\epsilon})$ for the true DRO objective value using only the contaminated data under the bounded covariance assumption. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
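
To make the two kinds of robustness in the abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm. It rests on several assumptions that do not come from the abstract: the loss is the hinge loss (convex and 1-Lipschitz), labels are in {-1, +1}, only the features are perturbed, and the transport cost is the Euclidean norm. Under those assumptions, a standard reformulation from the Wasserstein DRO literature writes the Wasserstein-1 DRO objective of a linear model as the empirical risk plus `rho * L * ||w||_2`; in more general settings this expression is only an upper bound. To handle contamination, the sketch uses a simple trimmed mean of the per-sample losses as a stand-in for the robust-statistics procedure the paper describes. The function names and parameters (`rho`, `eps`, `lipschitz`) are hypothetical.

```python
import numpy as np


def hinge_losses(w, X, y):
    """Per-sample hinge loss max(0, 1 - y * <w, x>); convex and 1-Lipschitz in the margin."""
    return np.maximum(0.0, 1.0 - y * (X @ w))


def w1_dro_objective(w, X, y, rho, lipschitz=1.0):
    """Empirical risk plus rho * L * ||w||_2.

    Assumed reformulation of the Wasserstein-1 DRO objective (features-only
    perturbation, Euclidean transport cost). Not taken from the paper.
    """
    return hinge_losses(w, X, y).mean() + rho * lipschitz * np.linalg.norm(w)


def trimmed_w1_dro_objective(w, X, y, rho, eps, lipschitz=1.0):
    """Same objective, but the ceil(eps * n) largest losses are discarded first.

    The trimmed mean is an illustrative stand-in for a robust mean estimator,
    not the estimator analyzed in the paper.
    """
    losses = np.sort(hinge_losses(w, X, y))
    n_keep = max(1, len(losses) - int(np.ceil(eps * len(losses))))
    return losses[:n_keep].mean() + rho * lipschitz * np.linalg.norm(w)


def demo():
    """Compare the plain and trimmed objectives on synthetic, partly corrupted data."""
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w = np.ones(5)
    y = np.sign(X @ w)
    # Corrupt 10% of the samples: blow up the features and flip the labels.
    X[:20] *= 50.0
    y[:20] *= -1.0
    print("plain  :", w1_dro_objective(w, X, y, rho=0.1))
    print("trimmed:", trimmed_w1_dro_objective(w, X, y, rho=0.1, eps=0.1))


if __name__ == "__main__":
    demo()
```

The plain objective is dominated by the few corrupted samples, because their losses are unbounded. The trimmed version ignores the largest losses and therefore stays close to the value computed on clean data. Trimming losses is only a heuristic; the guarantee stated in the abstract, an error of $O(\sqrt{\epsilon})$ under bounded covariance, belongs to the paper's own method.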