Training Flow Matching Models with Reliable Labels via Self-Purification
By: Hyeongju Kim, Yechan Yu, June Young Yi, and more
Potential Business Impact:
Cleans messy data so computers learn better.
Training datasets are inherently imperfect, often containing mislabeled samples due to human annotation errors, limitations of tagging models, and other sources of noise. Such label contamination can significantly degrade the performance of a trained model. In this work, we introduce Self-Purifying Flow Matching (SPFM), a principled approach to filtering unreliable data within the flow-matching framework. SPFM identifies suspicious data using the model itself during the training process, bypassing the need for pretrained models or additional modules. Our experiments demonstrate that models trained with SPFM generate samples that accurately adhere to the specified conditioning, even when trained on noisy labels. Furthermore, we validate the robustness of SPFM on the TITW dataset, which consists of in-the-wild speech data, achieving performance that surpasses existing baselines.
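The abstract describes SPFM only at a high level, so here is a minimal sketch of the general idea of a model filtering its own training data: a conditional flow matching training step that down-weights the highest-loss samples in each batch. The loss-based top-k masking, the `purified_fm_step` helper, the `keep_ratio` parameter, and the model call signature are all assumptions for illustration, not the paper's actual SPFM criterion.

```python
# Illustrative sketch of self-purification inside a conditional flow
# matching training step. The trust-masking rule below (drop the
# highest-loss samples per batch) is an assumption for illustration;
# the paper's actual SPFM criterion may differ.
import torch
import torch.nn as nn

def purified_fm_step(model: nn.Module, x1: torch.Tensor, cond: torch.Tensor,
                     keep_ratio: float = 0.8) -> torch.Tensor:
    """One step: per-sample flow matching losses, with the highest-loss
    (suspected mislabeled) samples excluded from the gradient."""
    b = x1.shape[0]
    x0 = torch.randn_like(x1)                       # noise endpoint
    t = torch.rand(b, *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - t) * x0 + t * x1                      # linear interpolation path
    v_target = x1 - x0                              # target velocity field
    v_pred = model(xt, t.flatten(), cond)           # hypothetical signature
    per_sample = ((v_pred - v_target) ** 2).flatten(1).mean(dim=1)

    # Self-purification (illustrative): keep only the k lowest-loss
    # samples; the rest are treated as suspicious and masked out.
    k = max(1, int(keep_ratio * b))
    keep = torch.topk(-per_sample, k).indices
    mask = torch.zeros(b, device=x1.device)
    mask[keep] = 1.0
    return (per_sample * mask).sum() / mask.sum()
```

Because the filtering uses only the model's own per-sample loss, this kind of scheme needs no pretrained reference model or auxiliary module, which matches the property the abstract claims for SPFM.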
Similar Papers
Towards Stable and Structured Time Series Generation with Perturbation-Aware Flow Matching
Machine Learning (CS)
Creates realistic data that changes suddenly.
Federated Flow Matching
Machine Learning (CS)
Trains AI privately on scattered data.
Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training
Sound
Cleans up noisy audio much faster.