Score: 1

A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems

Published: March 6, 2025 | arXiv ID: 2503.04447v1

By: Wei Liu , Xin Liu , Michael K. Ng and more

Potential Business Impact:

Finds groups in data without knowing how many.

Business Areas:
Semantic Search Internet Services

Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem into a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to efficiently solve this model, and establish its convergence result. Based on the obtained solution, we can construct the clusters that theoretically meet the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of our proposed method through comprehensive numerical experiments.

Country of Origin
🇭🇰 🇨🇳 China, Hong Kong

Page Count
22 pages

Category
Mathematics:
Optimization and Control