Learning Majority-to-Minority Transformations with MMD and Triplet Loss for Imbalanced Classification
By: Suman Cha, Hyunjoong Kim
Potential Business Impact:
Helps computers learn from rare, important data.
Class imbalance in supervised classification often degrades model performance by biasing predictions toward the majority class, particularly in critical applications such as medical diagnosis and fraud detection. Traditional oversampling techniques, including SMOTE and its variants, generate synthetic minority samples via local interpolation but fail to capture global data distributions in high-dimensional spaces. Deep generative models based on GANs offer richer distribution modeling yet suffer from training instability and mode collapse under severe imbalance. To overcome these limitations, we introduce an oversampling framework that learns a parametric transformation to map majority samples into the minority distribution. Our approach minimizes the maximum mean discrepancy (MMD) between transformed and true minority samples for global alignment, and incorporates a triplet loss regularizer to enforce boundary awareness by guiding synthesized samples toward challenging borderline regions. We evaluate our method on 29 synthetic and real-world datasets, demonstrating consistent improvements over classical and generative baselines in AUROC, G-mean, F1-score, and MCC. These results confirm the robustness, computational efficiency, and practical utility of the proposed framework for imbalanced classification tasks.
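The two loss terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian (RBF) kernel, the fixed `bandwidth`, the biased MMD estimator, the squared-Euclidean triplet distance, the `margin` value, and the function names `rbf_kernel`, `mmd2`, and `triplet_loss` are all assumptions made for illustration. The abstract also does not say how anchors, positives, and negatives are chosen, so the roles used in the toy example are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b (assumed kernel choice)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative values from round-off
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy between samples x and y."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances, averaged over rows."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy data: a majority cloud near the origin and a minority cloud near (3, 3).
rng = np.random.default_rng(0)
majority = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
minority = rng.normal(loc=3.0, scale=1.0, size=(50, 2))

# A hand-picked shift stands in for the learned parametric transformation.
transformed = majority + 3.0

mmd_before = mmd2(majority, minority)
mmd_after = mmd2(transformed, minority)

# Hypothetical triplet roles: anchor = transformed sample, positive = a minority
# sample, negative = the majority sample it was generated from.
positives = minority[rng.integers(0, len(minority), size=len(transformed))]
reg = triplet_loss(transformed, positives, majority)
```

In this toy setup `mmd_after` comes out smaller than `mmd_before`, because the shifted majority cloud overlaps the minority cloud. A training loop would minimize a weighted sum of the two terms with respect to the parameters of the transformation; the weighting and the optimizer are not given in the abstract.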
Similar Papers
Large Language Models for Imbalanced Classification: Diversity makes the difference
Machine Learning (CS)
Makes computer learning better with more varied examples.
Regression Augmentation With Data-Driven Segmentation
Machine Learning (CS)
Makes AI predict rare cases accurately.
Quantifying Multimodal Imbalance: A GMM-Guided Adaptive Loss for Audio-Visual Learning
Machine Learning (CS)
Helps computers understand mixed-up sounds and pictures better.