A pipeline for matching bibliographic references with incomplete metadata: experiments with Crossref and OpenCitations

Published: November 23, 2025 | arXiv ID: 2511.18408v1

By: Matteo Guenci, Ivan Heibi, Chiara Parravicini and more

Potential Business Impact:

Links old research papers automatically.

Business Areas:
Semantic Search, Internet Services

While Crossref makes available more than 1.8 billion bibliographic references from publications for which it provides a DOI, more than 698 million of these references do not specify a DOI, making the creation of a formal citation link between the citing entity and the cited entity problematic. In this article, we propose an analysis of Crossref bibliographic references to show how we can use the unstructured text defining such references and the available (and partial) metadata specified in them to (a) map them to existing entities included in OpenCitations Meta and then (b) enable the potential inclusion of additional, valid citation links among these entities. We have defined a precise methodology to address the analysis and run it against a manually defined Gold Standard and a subset of Crossref. While the heuristic-based tool we developed has demonstrated strong matching precision and effective metadata integration, its recall limitations highlight the need for further enhancements to address metadata inconsistencies and leverage additional sources of citation data.
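
To make the idea of heuristic matching on partial metadata concrete, here is a minimal Python sketch. It is not the authors' tool: the field names (`title`, `year`, `first_author`), the equal weighting of fields, the 0.9 threshold, and the sample records are all assumptions made for illustration. It scores a reference against candidate records using only the fields the reference actually provides, and accepts the best candidate only above a high threshold, which trades recall for precision in the way the abstract describes.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", " ", text.lower())).strip()


def score(reference, candidate):
    """Average the per-field agreement over fields present in BOTH records.

    Field names and equal weighting are illustrative assumptions.
    """
    checks = []
    if reference.get("title") and candidate.get("title"):
        # Fuzzy title similarity in [0, 1].
        checks.append(
            SequenceMatcher(
                None, normalize(reference["title"]), normalize(candidate["title"])
            ).ratio()
        )
    if reference.get("year") and candidate.get("year"):
        checks.append(1.0 if str(reference["year"]) == str(candidate["year"]) else 0.0)
    if reference.get("first_author") and candidate.get("first_author"):
        same = normalize(reference["first_author"]) == normalize(candidate["first_author"])
        checks.append(1.0 if same else 0.0)
    # No comparable field means no evidence, so the score is zero.
    return sum(checks) / len(checks) if checks else 0.0


def best_match(reference, candidates, threshold=0.9):
    """Return the highest-scoring candidate at or above the threshold, else None."""
    best = max(candidates, key=lambda c: score(reference, c), default=None)
    if best is not None and score(reference, best) >= threshold:
        return best
    return None


# Hypothetical sample data: a reference with no DOI and no author field.
reference = {"title": "A Pipeline for Matching References", "year": 2025}
candidates = [
    {"id": "c1", "title": "A pipeline for matching references.", "year": "2025", "first_author": "Doe"},
    {"id": "c2", "title": "Unrelated study of birds", "year": "2025", "first_author": "Roe"},
]
print(best_match(reference, candidates))
```

Lowering `threshold` would accept more candidates and raise recall at the cost of more false matches. A real pipeline would also need a way to retrieve candidates in the first place, which this sketch leaves out.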

Country of Origin
🇮🇹 Italy

Page Count
25 pages

Category
Computer Science:
Digital Libraries