Citation importance-aware document representation learning for large-scale science mapping

Published: December 15, 2025 | arXiv ID: 2512.13054v1

By: Zhentao Liang, Nees Jan van Eck, Xuehua Wu, and more

Effective science mapping relies on high-quality representations of scientific documents. As an important task in scientometrics and information studies, science mapping is often challenged by the complex and heterogeneous nature of citations. While previous studies have attempted to improve document representations by integrating citation and semantic information, the heterogeneity of citations is often overlooked. To address this problem, this study proposes a citation importance-aware contrastive learning framework that refines the supervisory signal. We first develop a scalable measurement of citation importance based on location, frequency, and self-citation characteristics. Citation importance is then integrated into the contrastive learning process through an importance-aware sampling strategy, which selects low-importance citations as hard negatives. This forces the model to learn finer-grained representations that distinguish between important and perfunctory citations. To validate the effectiveness of the proposed framework, we fine-tune a SciBERT model and perform extensive evaluations on SciDocs and PubMed benchmark datasets. Results show consistent improvements in both document representation quality and science mapping accuracy. Furthermore, we apply the trained model to over 33 million documents from Web of Science. The resulting map of science accurately visualizes the global and local intellectual structure of science and reveals interdisciplinary research fronts. By operationalizing citation heterogeneity into a scalable computational framework, this study demonstrates how differentiating citations by their importance can be effectively leveraged to improve document representation and science mapping.
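The importance-aware sampling idea can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the authors' implementation: the `Citation` fields, the section weights, the multiplicative scoring formula, the `high`/`low` thresholds, and the choice of a triplet margin loss as the contrastive objective are all hypothetical. The abstract only states that importance is derived from location, frequency, and self-citation characteristics, and that low-importance citations serve as hard negatives.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Citation:
    cited_id: str
    mentions: int            # in-text mentions of the cited paper (frequency)
    sections: tuple          # sections where the mentions occur (location)
    is_self_citation: bool


# Hypothetical location weights; the paper's actual weighting is not given here.
SECTION_WEIGHT = {"introduction": 0.5, "methods": 1.0, "results": 1.0}


def importance(c: Citation) -> float:
    """Toy importance score in [0, 1] combining location, frequency, self-citation."""
    location = max((SECTION_WEIGHT.get(s, 0.5) for s in c.sections), default=0.5)
    frequency = min(c.mentions, 5) / 5           # cap so heavy repetition saturates
    penalty = 0.5 if c.is_self_citation else 1.0  # assumed self-citation discount
    return location * frequency * penalty


def build_triplets(query_id, citations, high=0.6, low=0.2, rng=None):
    """Pair each high-importance citation (positive) with a low-importance
    citation of the same paper, used as a hard negative."""
    rng = rng or random.Random(0)
    positives = [c.cited_id for c in citations if importance(c) >= high]
    hard_negatives = [c.cited_id for c in citations if importance(c) <= low]
    if not hard_negatives:
        return []
    return [(query_id, p, rng.choice(hard_negatives)) for p in positives]


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Assumed objective: pull the positive closer to the anchor than the negative."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


citations = [
    Citation("A", mentions=4, sections=("methods",), is_self_citation=False),
    Citation("B", mentions=1, sections=("introduction",), is_self_citation=False),
    Citation("C", mentions=5, sections=("results",), is_self_citation=True),
]
triplets = build_triplets("q", citations)
```

In this toy setup, paper A is cited repeatedly in the methods section and becomes a positive, paper B is mentioned once in the introduction and becomes a hard negative, and the self-citation C falls between the thresholds and is used as neither. In a real pipeline the embeddings passed to the loss would come from the fine-tuned encoder (SciBERT in the paper) rather than hand-written vectors.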

Category: Computer Science (Digital Libraries)