Score: 0

Learning Majority-to-Minority Transformations with MMD and Triplet Loss for Imbalanced Classification

Published: September 15, 2025 | arXiv ID: 2509.11511v1

By: Suman Cha, Hyunjoong Kim

Potential Business Impact:

Helps computers learn from rare, important data.

Business Areas:
A/B Testing Data and Analytics

Class imbalance in supervised classification often degrades model performance by biasing predictions toward the majority class, particularly in critical applications such as medical diagnosis and fraud detection. Traditional oversampling techniques, including SMOTE and its variants, generate synthetic minority samples via local interpolation but fail to capture global data distributions in high-dimensional spaces. Deep generative models based on GANs offer richer distribution modeling yet suffer from training instability and mode collapse under severe imbalance. To overcome these limitations, we introduce an oversampling framework that learns a parametric transformation to map majority samples into the minority distribution. Our approach minimizes the maximum mean discrepancy (MMD) between transformed and true minority samples for global alignment, and incorporates a triplet loss regularizer to enforce boundary awareness by guiding synthesized samples toward challenging borderline regions. We evaluate our method on 29 synthetic and real-world datasets, demonstrating consistent improvements over classical and generative baselines in AUROC, G-mean, F1-score, and MCC. These results confirm the robustness, computational efficiency, and practical utility of the proposed framework for imbalanced classification tasks.

Page Count
19 pages

Category
Statistics:
Machine Learning (Stat)