PET-TURTLE: Deep Unsupervised Support Vector Machines for Imbalanced Data Clusters
By: Javier Salazar Cavazos
Potential Business Impact:
Finds hidden groups in messy data better.
Foundation vision, audio, and language models enable zero-shot performance on downstream tasks via their latent representations. Recently, unsupervised learning of data group structure with deep learning methods has gained popularity. TURTLE, a state-of-the-art deep clustering algorithm, uncovers data labels without supervision by alternating label and hyperplane updates and maximizing the hyperplane margin, much as support vector machines (SVMs) do. However, TURTLE assumes clusters are balanced; when data is imbalanced, it yields non-ideal hyperplanes that increase clustering error. We propose PET-TURTLE, which generalizes the cost function to handle imbalanced data distributions through a power-law prior. Additionally, by introducing sparse logits in the labeling process, PET-TURTLE optimizes over a simpler search space, which in turn improves accuracy on balanced datasets. Experiments on synthetic and real data show that PET-TURTLE improves accuracy for imbalanced sources, prevents over-prediction of minority clusters, and enhances overall clustering.
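To make the idea of a power-law prior on cluster sizes concrete, here is a minimal Python sketch. The abstract does not give the actual PET-TURTLE cost function, so everything below is an assumption made for illustration: the prior form p_k ∝ k^(-alpha), the KL penalty, and the names `power_law_prior`, `cluster_marginal`, and `kl_to_prior` are hypothetical and not taken from the paper.

```python
import math

def power_law_prior(num_clusters, alpha):
    """Assumed prior: p_k proportional to k**(-alpha) for k = 1..K.
    alpha = 0 gives the uniform prior, i.e. the balanced-cluster assumption."""
    weights = [k ** (-alpha) for k in range(1, num_clusters + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cluster_marginal(batch_logits):
    """Average soft assignment over a batch: the empirical cluster proportions."""
    probs = [softmax(row) for row in batch_logits]
    n, k = len(probs), len(probs[0])
    return [sum(p[j] for p in probs) / n for j in range(k)]

def kl_to_prior(marginal, prior, eps=1e-12):
    """KL(sorted marginal || prior). Sorting the marginal in descending order
    makes the penalty independent of how cluster labels are permuted
    (a design choice of this sketch, not of the paper)."""
    ordered = sorted(marginal, reverse=True)
    return sum(q * math.log((q + eps) / (p + eps)) for q, p in zip(ordered, prior))

# Hypothetical usage: three clusters, comparing a balanced and a skewed prior.
balanced = power_law_prior(3, 0.0)
skewed = power_law_prior(3, 1.0)
logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
marginal = cluster_marginal(logits)
penalty_balanced = kl_to_prior(marginal, balanced)
penalty_skewed = kl_to_prior(marginal, skewed)
```

In this sketch, a regularizer that pulls the batch-level cluster proportions toward a uniform prior encodes the balanced assumption the abstract attributes to TURTLE, while a larger `alpha` tolerates a few large clusters and many small ones. How PET-TURTLE combines its prior with the margin objective and the sparse logits is not stated in the abstract and is not modeled here.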