A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems
By: Wei Liu , Xin Liu , Michael K. Ng and more
Potential Business Impact:
Finds groups in data without knowing how many.
Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem into a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to efficiently solve this model, and establish its convergence result. Based on the obtained solution, we can construct the clusters that theoretically meet the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of our proposed method through comprehensive numerical experiments.
Similar Papers
Graph-based Semi-supervised and Unsupervised Methods for Local Clustering
Machine Learning (CS)
Finds hidden groups in big data networks.
An Efficient Quadratic Penalty Method for a Class of Graph Clustering Problems
Optimization and Control
Finds hidden groups in online friend networks.
A Scalable Global Optimization Algorithm For Constrained Clustering
Machine Learning (CS)
Groups data better, even with many rules.