Institute Disambiguation using Author-Institution Co-Occurrence
By: Achal Agrawal, Jeet Mukherjee
Potential Business Impact:
Automatically groups the different versions of a university's name.
In this article we propose a novel method for unsupervised clustering of the different forms of institute names. We use only author and affiliation metadata to perform the clustering, without any string or pattern matching. After analysing only 50,000 articles from the Crossref database, we see encouraging results, which could improve further when the method is scaled up to more data. We compared our clustering with that of a well-known string-matching method and found the two sets of results to be complementary. When integrated with existing systems, the method can therefore help improve institute disambiguation, especially by providing aliases in cases where traditional string matching fails. The code for this open-source methodology is available at: https://github.com/Jeet009/Institute-Disambiguation-using-Author-Institution-Co-Occurrence
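The abstract does not spell out the clustering procedure, so what follows is only a minimal Python sketch of the general idea of grouping names by author co-occurrence, under stated assumptions. Each affiliation string is treated as a graph node. Two strings are linked when at least a threshold number of the same authors appear with both. Each connected component is then read as one institute. The author keys, the `cluster_affiliations` helper and the `min_shared_authors` threshold are hypothetical illustrations. The authors' repository may use a different clustering algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_affiliations(records, min_shared_authors=2):
    """Group affiliation strings by author co-occurrence (illustrative sketch).

    records: iterable of (author_key, affiliation_string) pairs, where
    author_key is a hypothetical stable author identifier (e.g. an ORCID).
    Two affiliation strings are linked when at least `min_shared_authors`
    distinct authors list both; clusters are the connected components of
    the resulting graph. No string similarity is used anywhere.
    """
    # Collect the set of affiliation strings attached to each author.
    affs_by_author = defaultdict(set)
    all_affs = set()
    for author, aff in records:
        affs_by_author[author].add(aff)
        all_affs.add(aff)

    # Count, for every pair of affiliations, how many distinct authors share them.
    pair_counts = Counter()
    for affs in affs_by_author.values():
        for a, b in combinations(sorted(affs), 2):
            pair_counts[(a, b)] += 1

    # Union-find over affiliation strings.
    parent = {aff: aff for aff in all_affs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge every pair whose shared-author count reaches the threshold.
    for (a, b), shared in pair_counts.items():
        if shared >= min_shared_authors:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    # Gather connected components into sorted lists for stable output.
    clusters = defaultdict(list)
    for aff in all_affs:
        clusters[find(aff)].append(aff)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    # Toy input: two authors list both name variants of the same institute.
    records = [
        ("author-A", "Massachusetts Institute of Technology"),
        ("author-A", "MIT"),
        ("author-B", "Massachusetts Institute of Technology"),
        ("author-B", "MIT"),
        ("author-C", "Stanford University"),
    ]
    print(cluster_affiliations(records))
    # → [['MIT', 'Massachusetts Institute of Technology'], ['Stanford University']]
```

Because links come from shared authors alone, a researcher who moves between institutions would also connect their names. In this sketch the shared-author threshold is the only guard against such spurious merges.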
Similar Papers
Practical Author Name Disambiguation under Metadata Constraints: A Contrastive Learning Approach for Astronomy Literature
Instrumentation and Methods for Astrophysics
Links scientists' papers to their names accurately.
Institutional cooperations in Austrian research: An analysis of shared researchers
Digital Libraries
Helps scientists share ideas for better research.
Investigating Industry–Academia Collaboration in Artificial Intelligence: PDF-Based Bibliometric Analysis from Leading Conferences
Digital Libraries
Shows how companies and schools work together on AI.