Institute Disambiguation using Author-Institution Co-Occurrence

Published: October 18, 2025 | arXiv ID: 2510.16407v1

By: Achal Agrawal, Jeet Mukherjee

Potential Business Impact:

Groups similar university names automatically.

Business Areas:
Text Analytics, Data and Analytics, Software

In this article we propose a novel method for unsupervised clustering of the different forms of institute names. The clustering uses only author and affiliation metadata, without any string or pattern matching. After analysing only 50,000 articles from the Crossref database, we see encouraging results, and the approach can be scaled up to provide even better ones. We compared our clustering with the output of a well-known string-matching method and found the results to be complementary. Integrated with existing systems, this can improve institute disambiguation, especially by providing aliases in cases where traditional string matching fails. The code for this open-source methodology is available at: https://github.com/Jeet009/Institute-Disambiguation-using-Author-Institution-Co-Occurrence
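
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way author-institution co-occurrence could drive clustering, not the authors' implementation. It assumes that two affiliation strings are linked when enough distinct authors have used both, and that clusters are the connected components of those links. The function name `cluster_affiliations`, the `min_shared_authors` threshold, and the sample records are all hypothetical, and the sketch treats identical author names as the same person, which real data would not guarantee.

```python
from collections import defaultdict
from itertools import combinations


def cluster_affiliations(records, min_shared_authors=2):
    """Group affiliation strings linked by shared authors.

    records: iterable of (author, affiliation_string) pairs.
    No string or pattern matching is applied to the affiliation text.
    """
    # Collect the distinct affiliation strings seen for each author.
    by_author = defaultdict(set)
    for author, affiliation in records:
        by_author[author].add(affiliation)

    # Count how many authors have used both strings of each pair.
    pair_counts = defaultdict(int)
    for affiliations in by_author.values():
        for a, b in combinations(sorted(affiliations), 2):
            pair_counts[(a, b)] += 1

    # Union-find over affiliation strings (assumed clustering step).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Register every string so unlinked ones form singleton clusters.
    for affiliations in by_author.values():
        for a in affiliations:
            find(a)

    # Merge pairs supported by enough distinct authors.
    for (a, b), count in pair_counts.items():
        if count >= min_shared_authors:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for a in list(parent):
        clusters[find(a)].add(a)
    return [sorted(members) for members in clusters.values()]


# Hypothetical sample data, for illustration only.
sample = [
    ("A. Author", "MIT"),
    ("A. Author", "Massachusetts Institute of Technology"),
    ("B. Author", "MIT"),
    ("B. Author", "Massachusetts Institute of Technology"),
    ("C. Author", "IIT Delhi"),
]
result = cluster_affiliations(sample)
```

Here the two name forms of the same institute are merged because two authors have used both, even though the strings share almost no characters. A threshold above 1 is one assumed way to limit false merges caused by authors who genuinely move between institutes.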

Page Count
6 pages

Category
Computer Science:
Digital Libraries