SDEC: Semantic Deep Embedded Clustering
By: Mohammad Wali Ur Rahman, Ric Nevarez, Lamia Tasnim Mim and more
Potential Business Impact:
Groups similar texts better, finding hidden meanings.
The high-dimensional and semantically complex nature of textual big data poses significant challenges for text clustering, and conventional techniques such as k-means or hierarchical clustering frequently produce suboptimal groupings as a result. This work presents Semantic Deep Embedded Clustering (SDEC), an unsupervised text clustering framework that combines an improved autoencoder with transformer-based embeddings to overcome these challenges. The autoencoder is trained with a combination of Mean Squared Error (MSE) and Cosine Similarity Loss (CSL), which preserves semantic relationships during data reconstruction. SDEC then applies a semantic refinement stage that exploits the contextual richness of transformer embeddings to further improve a clustering layer with soft cluster assignments and a distributional loss. SDEC is evaluated on five benchmark datasets: AG News, Yahoo! Answers, DBPedia, Reuters 2, and Reuters 5. The framework outperformed existing methods with a clustering accuracy of 85.7% on AG News, set a new benchmark of 53.63% on Yahoo! Answers, and showed robust performance across the other text corpora. These findings highlight the improvements in accuracy and semantic comprehension of text data that SDEC brings to unsupervised text clustering.
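The two loss components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The mixing weight `alpha` is a hypothetical parameter, since the abstract does not say how MSE and CSL are combined. The soft assignments and distributional loss are assumed to follow the original Deep Embedded Clustering formulation (a Student's t kernel with one degree of freedom, and a KL divergence to a sharpened target distribution), which the abstract does not confirm for SDEC. All function names are illustrative.

```python
import numpy as np

def reconstruction_loss(x, x_hat, alpha=0.5):
    """Weighted sum of MSE and cosine-similarity loss.

    alpha is a hypothetical mixing weight, not taken from the paper.
    """
    mse = np.mean((x - x_hat) ** 2)
    eps = 1e-12  # guards against division by zero for all-zero rows
    cos = np.sum(x * x_hat, axis=1) / (
        np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1) + eps
    )
    csl = np.mean(1.0 - cos)  # 0 when each reconstruction points the same way as its input
    return alpha * mse + (1.0 - alpha) * csl

def soft_assignments(z, centroids):
    """Soft cluster assignments q_ij via a Student's t kernel (assumed, as in DEC)."""
    d2 = np.sum((z[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p_ij: square q and normalise by soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """Distributional loss KL(P || Q), summed over samples and clusters."""
    return float(np.sum(p * np.log(p / q)))
```

In this sketch an embedding `z` of shape `(n_samples, dim)` and `centroids` of shape `(n_clusters, dim)` give `q = soft_assignments(z, centroids)`, and `kl_loss(target_distribution(q), q)` is the quantity the clustering layer would minimise under these assumptions.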