PET-TURTLE: Deep Unsupervised Support Vector Machines for Imbalanced Data Clusters

Published: January 6, 2026 | arXiv ID: 2601.03237v1

By: Javier Salazar Cavazos

Potential Business Impact:

Finds hidden groups in messy data better.

Business Areas:

Natural Language Processing, Artificial Intelligence, Data and Analytics, Software

Foundation vision, audio, and language models enable zero-shot performance on downstream tasks via their latent representations. Recently, unsupervised learning of data group structure with deep learning methods has gained popularity. TURTLE, a state-of-the-art deep clustering algorithm, uncovers data labels without supervision by alternating label and hyperplane updates, maximizing the hyperplane margin in a fashion similar to support vector machines (SVMs). However, TURTLE assumes clusters are balanced; when data is imbalanced, it yields non-ideal hyperplanes that cause higher clustering error. We propose PET-TURTLE, which generalizes the cost function to handle imbalanced data distributions via a power-law prior. Additionally, by introducing sparse logits in the labeling process, PET-TURTLE optimizes over a simpler search space, which in turn improves accuracy for balanced datasets. Experiments on synthetic and real data show that PET-TURTLE improves accuracy for imbalanced sources, prevents over-prediction of minority clusters, and enhances overall clustering.
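To make the idea of a power-law prior over cluster sizes concrete, here is a minimal Python sketch. It is an illustration only and not the paper's cost function: the helper names (`power_law_prior`, `prior_mismatch`), the exponent `alpha`, and the use of a KL divergence between the sorted average cluster assignment and the prior are all assumptions made for this example. With `alpha = 0` the prior is uniform, which corresponds to the balanced-cluster assumption the abstract attributes to TURTLE.

```python
import numpy as np

def power_law_prior(num_clusters, alpha):
    """Hypothetical prior: the k-th largest cluster has mass proportional to k**(-alpha).

    alpha = 0 gives a uniform (balanced) prior; larger alpha means stronger imbalance.
    """
    ranks = np.arange(1, num_clusters + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def prior_mismatch(logits, prior, eps=1e-12):
    """KL divergence between the sorted mean soft assignment and the prior.

    This is an assumed stand-in for a regularizer that nudges predicted
    cluster proportions toward the prior; the paper may define it differently.
    """
    marginal = softmax(logits).mean(axis=0)   # average soft assignment per cluster
    marginal = np.sort(marginal)[::-1]        # compare by rank, largest cluster first
    return float(np.sum(marginal * (np.log(marginal + eps) - np.log(prior + eps))))

# Example: 5 clusters with a fairly heavy-tailed prior.
prior = power_law_prior(5, alpha=1.5)
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5))            # stand-in for classifier outputs
penalty = prior_mismatch(logits, prior)
```

In a training loop, a term like `penalty` would be added to the clustering objective so that solutions whose cluster proportions are far from the assumed prior are discouraged.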

Efficient Long-Tail Learning in Latent Space by sampling Synthetic Data

Machine Learning (CS)

Makes computer learning fair for rare things.

19 Sep 2025 1

84%

Prediction of high-frequency futures return directions based on the mean uncertainty classification methods: An application in China's future market

Trading & Market Microstructure

Predicts stock price moves to make more money.

9 Aug 2025 0

84%

Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care

Machine Learning (CS)

Helps doctors predict patient danger faster.

25 Dec 2025 1

Page Count

5 pages
