A Method for Handling Negative Similarities in Explainable Graph Spectral Clustering of Text Documents -- Extended Version
By: Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta and more
Potential Business Impact:
Helps computers group similar ideas, even with tricky words.
This paper investigates the problem of Graph Spectral Clustering (GSC) with negative similarities, which arise when documents are embedded with methods other than the traditional Term Vector Space (such as doc2vec or GloVe). Solutions for combinatorial Laplacians and normalized Laplacians are discussed. An experimental investigation shows the advantages and disadvantages of six different solutions, some proposed in the literature and some in this research. The research demonstrates that GloVe embeddings frequently cause normalized Laplacian based GSC to fail because of negative similarities. Furthermore, applying methods that cure similarity negativity improves accuracy for both combinatorial and normalized Laplacian based GSC. It also makes explanation methods, originally developed by the authors for Term Vector Space embeddings, applicable to GloVe embeddings.
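To make the failure mode concrete, the following is a minimal sketch, not the authors' implementation. It assumes cosine similarity between dense document vectors and uses a toy four-document matrix `X` invented for illustration. It shows how negative similarities can produce negative node degrees, for which the factor D^{-1/2} in the normalized Laplacian is undefined. The remedy shown, the shift (1 + cos) / 2, is only one simple way of removing negativity; it is an assumption of this sketch and is not claimed to be one of the six solutions evaluated in the paper.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarities with a zero diagonal (no self-loops)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    return S

def normalized_laplacian(S):
    """L = I - D^{-1/2} S D^{-1/2}; requires strictly positive degrees."""
    d = S.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("non-positive degree: D^{-1/2} is undefined")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(d)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def shift_similarities(S):
    """Illustrative remedy (assumption): map cosines from [-1, 1] to [0, 1]."""
    S_shift = np.clip((S + 1.0) / 2.0, 0.0, 1.0)
    np.fill_diagonal(S_shift, 0.0)
    return S_shift

# Hypothetical dense embeddings: two documents point one way, two the other.
X = np.array([[ 1.0,  0.1],
              [ 0.9,  0.2],
              [-1.0,  0.1],
              [-0.9, -0.3]])

S = cosine_similarity_matrix(X)
degrees_raw = S.sum(axis=1)          # negative here, so normalization fails

try:
    normalized_laplacian(S)
    raw_failed = False
except ValueError:
    raw_failed = True

S_fixed = shift_similarities(S)
degrees_fixed = S_fixed.sum(axis=1)  # strictly positive after the shift
L_fixed = normalized_laplacian(S_fixed)
eigenvalues = np.linalg.eigvalsh(L_fixed)  # ascending order
```

With non-negative similarities the normalized Laplacian is positive semidefinite again, so its smallest eigenvalue is zero and the usual spectral embedding from the lowest eigenvectors is well defined.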
Similar Papers
Explainable Graph Spectral Clustering For Text Embeddings
Machine Learning (CS)
Helps computers understand text better with new methods.
Rough Sets for Explainability of Spectral Graph Clustering
Machine Learning (CS)
Explains why text groups are similar.
An Improved and Generalised Analysis for Spectral Clustering
Machine Learning (CS)
Finds hidden groups in connected information.