Integrating Large Citation Datasets
By: Inci Yueksel-Erguen, Ida Litzel, Hanqiu Peng
Potential Business Impact:
Finds which science ideas are most important.
This paper explores methods for building a comprehensive citation graph using big data techniques to evaluate scientific impact more accurately. Traditional citation metrics have limitations, and this work investigates merging large citation datasets to give a more complete picture. Big data challenges, such as inconsistent data formats and a lack of unique identifiers, are addressed through deduplication, resulting in a streamlined and reliable merged dataset with over 119 million records and 1.4 billion citations. We demonstrate that merging large citation datasets builds a more accurate citation graph, which supports a more robust evaluation of scientific impact.
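As a rough illustration of the deduplication problem described above, the following minimal Python sketch merges two small record lists. It is not the authors' method: the field names (`doi`, `title`, `year`), the helper names, the matching rule (DOI when present, otherwise normalized title plus year), and the sample records with their DOIs are all assumptions made up for this example.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def record_key(record):
    """Prefer a DOI when present; otherwise fall back to title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title_year", normalize_title(record["title"]), record.get("year"))


def merge_records(*datasets):
    """Merge datasets, keeping the first record seen for each key."""
    merged = {}
    for dataset in datasets:
        for record in dataset:
            merged.setdefault(record_key(record), record)
    return list(merged.values())


# Hypothetical sample data: the DOIs and titles are invented for illustration.
source_a = [
    {"doi": "10.1000/XYZ123", "title": "Graph Merging at Scale", "year": 2020},
    {"doi": None, "title": "Citation Networks: A Survey", "year": 2018},
]
source_b = [
    {"doi": "10.1000/xyz123 ", "title": "Graph merging at scale", "year": 2020},
    {"doi": "", "title": "Citation networks - a survey", "year": 2018},
    {"doi": None, "title": "Another Paper", "year": 2021},
]

merged = merge_records(source_a, source_b)
```

With this sample data, the five input records collapse to three. A known gap in this sketch is that a record carrying a DOI in one source will not match the same paper listed without a DOI in another source; handling that case would need a second index or fuzzy matching, which is beyond this example.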
Similar Papers
Academic Literature Recommendation in Large-scale Citation Networks Enhanced by Large Language Models
Applications
Finds the best science papers for researchers.
When a Paper Has 1000 Authors: Rethinking Citation Metrics in the Era of LLMs
Digital Libraries
Finds top scientists in huge research papers.
Enhancing the prediction of publications' long-term impact using early citations, readerships, and non-scientific factors
Digital Libraries
Predicts which science papers will be most important.