Classification Imbalance as Transfer Learning
By: Eric Xia, Jason M. Klusowski
Potential Business Impact:
Improves machine learning performance on imbalanced (unevenly distributed) data.
Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target distribution under which performance is evaluated. Within this framework, we study a family of oversampling procedures that augment the training data by generating synthetic samples from an estimated minority-class distribution to roughly balance the classes, among which the celebrated SMOTE algorithm is a canonical example. We show that the excess risk decomposes into the rate achievable under balanced training (as if the data had been drawn from the balanced target distribution) and an additional term, the cost of transfer, which quantifies the discrepancy between the estimated and true minority-class distributions. In particular, we show that the cost of transfer for SMOTE dominates that of bootstrapping (random oversampling) in moderately high dimensions, suggesting that bootstrapping should generally outperform SMOTE. We corroborate these findings with experimental evidence. More broadly, our results provide guidance for choosing among augmentation strategies for imbalanced classification.
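Below is a minimal sketch (not the authors' code) of the two augmentation strategies the abstract compares: bootstrapping, which resamples observed minority-class points with replacement, and SMOTE-style synthetic oversampling, which generates new points by interpolating between a minority point and one of its k nearest minority-class neighbors, i.e. by sampling from an estimated minority-class distribution. The function names and the brute-force neighbor search are illustrative assumptions.

import numpy as np

def bootstrap_oversample(X_min, n_new, rng=None):
    # Random oversampling: draw minority-class points with replacement.
    rng = np.random.default_rng(rng)
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx]

def smote_oversample(X_min, n_new, k=5, rng=None):
    # SMOTE-style oversampling: for each synthetic point, pick a minority
    # anchor, pick one of its k nearest minority-class neighbors, and
    # interpolate uniformly between the two.
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Brute-force pairwise distances within the minority class (assumes n > k).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                            # exclude self-neighbors
    neighbors = np.argsort(d, axis=1)[:, :k]               # k nearest per point
    base = rng.integers(0, n, size=n_new)                  # anchor indices
    nb = neighbors[base, rng.integers(0, k, size=n_new)]   # a random neighbor
    lam = rng.random((n_new, 1))                           # interpolation weights
    return X_min[base] + lam * (X_min[nb] - X_min[base])

Either routine can be used to top up the minority class until the two classes are roughly balanced before fitting a classifier; the cost-of-transfer term in the abstract concerns how far the resulting synthetic distribution sits from the true minority-class distribution.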
Similar Papers
Concentration and excess risk bounds for imbalanced classification with synthetic oversampling
Machine Learning (Stat)
Helps computers learn better from imbalanced data.
Large Language Models for Imbalanced Classification: Diversity makes the difference
Machine Learning (CS)
Improves imbalanced classification with more diverse examples.
Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data
Machine Learning (CS)
Teaches computers to learn from imbalanced data.