Integrating Large Citation Datasets

Published: May 7, 2025 | arXiv ID: 2505.04309v1

By: Inci Yueksel-Erguen, Ida Litzel, Hanqiu Peng

Potential Business Impact:

Finds which science ideas are most important.

Business Areas:
Big Data, Data and Analytics

This paper explores methods for building a comprehensive citation graph with big data techniques in order to evaluate scientific impact more accurately. Traditional citation metrics have limitations, so this work investigates merging large citation datasets to obtain a more complete picture. Typical big data challenges, such as inconsistent data formats and missing unique identifiers, are addressed through deduplication, yielding a streamlined and reliable merged dataset of over 119 million records and 1.4 billion citations. The authors demonstrate that merging large citation datasets produces a more accurate citation graph, which in turn supports a more robust evaluation of scientific impact.
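
The summary does not spell out how the deduplication works, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes each source provides records with a source-local "id" and optional "doi", "title", and "year" fields, plus citations as pairs of those local ids. The function names record_key and merge_datasets and all field names are hypothetical. Records are matched on DOI when one is present and otherwise on a normalized title plus year, and citation pairs are remapped onto the merged keys so duplicate edges collapse.

```python
import re
import unicodedata


def record_key(record):
    """Return a merge key: the DOI when present, else normalized title plus year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    # Assumption: titles that match after stripping accents, case, and
    # punctuation, and that share a year, refer to the same publication.
    title = unicodedata.normalize("NFKD", record.get("title") or "")
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    title = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return f"title:{title}|{record.get('year')}"


def merge_datasets(datasets):
    """Merge (records, citations) pairs from several sources into one graph.

    records: dicts with a source-local "id" plus optional "doi", "title", "year".
    citations: (citing_id, cited_id) pairs using those source-local ids.
    """
    merged = {}    # merge key -> first record seen with that key
    edges = set()  # deduplicated (citing_key, cited_key) pairs
    for records, citations in datasets:
        local_to_key = {}
        for rec in records:
            key = record_key(rec)
            local_to_key[rec["id"]] = key
            merged.setdefault(key, rec)
        for citing, cited in citations:
            # Skip citations whose endpoints are missing from this source.
            if citing in local_to_key and cited in local_to_key:
                edges.add((local_to_key[citing], local_to_key[cited]))
    return merged, edges
```

Calling merge_datasets with a list of (records, citations) pairs returns the merged record dictionary and the set of deduplicated citation edges. Exact-key matching like this is deliberately conservative: it misses duplicates whose titles differ by more than case, accents, or punctuation, and a dataset at the scale described above would need a distributed or out-of-core implementation rather than in-memory dictionaries.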

Country of Origin
🇸🇬 Singapore

Page Count
8 pages

Category
Computer Science:
Digital Libraries