A pipeline for matching bibliographic references with incomplete metadata: experiments with Crossref and OpenCitations
By: Matteo Guenci, Ivan Heibi, Chiara Parravicini and more
Potential Business Impact:
Links old research papers automatically.
While Crossref makes available more than 1.8 billion bibliographic references from publications for which it provides a DOI, more than 698 million of these references do not specify a DOI, which makes it difficult to create a formal citation link between the citing entity and the cited entity. In this article, we present an analysis of Crossref bibliographic references to show how the unstructured text defining such references, together with the partial metadata specified in them, can be used to (a) map them to existing entities included in OpenCitations Meta and then (b) enable the potential inclusion of additional, valid citation links among these entities. We define a precise methodology for this analysis and run it against a manually defined Gold Standard and a subset of Crossref. While the heuristic-based tool we developed demonstrates strong matching precision and effective metadata integration, its recall limitations highlight the need for further enhancements to address metadata inconsistencies and to leverage additional sources of citation data.
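To make the idea of heuristic matching on partial metadata concrete, the following is a minimal sketch and not the authors' tool. It assumes each reference and each candidate record is a plain dictionary with optional `title`, `year`, and `author` keys. The field names, the weights, the 0.9 threshold, and the `example:` identifiers are all hypothetical choices made for illustration. The sketch scores only the fields present in both records, so a reference with a missing field can still be compared.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", text.lower())).strip()


def score(reference, candidate):
    """Return a similarity in [0, 1] using only fields present in both records.

    The weights (0.6 title, 0.2 year, 0.2 author) are illustrative assumptions.
    """
    checks = []
    if reference.get("title") and candidate.get("title"):
        ratio = SequenceMatcher(
            None, normalize(reference["title"]), normalize(candidate["title"])
        ).ratio()
        checks.append((0.6, ratio))
    if reference.get("year") and candidate.get("year"):
        same_year = str(reference["year"]) == str(candidate["year"])
        checks.append((0.2, 1.0 if same_year else 0.0))
    if reference.get("author") and candidate.get("author"):
        same_author = normalize(reference["author"]) == normalize(candidate["author"])
        checks.append((0.2, 1.0 if same_author else 0.0))
    total_weight = sum(weight for weight, _ in checks)
    if total_weight == 0:
        return 0.0  # no comparable field, so no evidence of a match
    return sum(weight * value for weight, value in checks) / total_weight


def best_match(reference, candidates, threshold=0.9):
    """Return the highest-scoring candidate, or None if it is below the threshold."""
    scored = [(score(reference, candidate), candidate) for candidate in candidates]
    if not scored:
        return None
    top_score, top_candidate = max(scored, key=lambda pair: pair[0])
    return top_candidate if top_score >= threshold else None


# Hypothetical records: the reference has a title and year but no DOI or author.
reference = {"title": "A Pipeline for Matching Bibliographic References!", "year": 2024}
candidates = [
    {"id": "example:1", "title": "A pipeline for matching bibliographic references",
     "year": "2024", "author": "Guenci"},
    {"id": "example:2", "title": "Deep learning for image segmentation",
     "year": "2019", "author": "Someone"},
]
match = best_match(reference, candidates)
```

A high threshold favours precision over recall, which mirrors the trade-off the abstract reports: uncertain references are left unmatched rather than linked incorrectly.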
Similar Papers
MetaInfoSci: An Integrated Web Tool for Scholarly Data Analysis
Digital Libraries
Helps scientists find and understand research faster.
Investigating Document Type, Language, Publication Year, and Author Count Discrepancies Between OpenAlex and Web of Science
Digital Libraries
Improves science data for better research tracking.
Guarding against artificial intelligence--hallucinated citations: the case for full-text reference deposit
Digital Libraries
Stops AI from making up fake sources.