Linking Data Citation to Repository Visibility: An Empirical Study
By: Fakhri Momeni, Janete Saldanha Bach, Brigitte Mathiak and more
Potential Business Impact:
Makes research data easier to find and cite.
In today's data-driven research landscape, dataset visibility and accessibility play a crucial role in advancing scientific knowledge. At the same time, data citation is essential for maintaining academic integrity, acknowledging contributions, validating research outcomes, and fostering scientific reproducibility. It is the critical link that connects scholarly publications with the datasets that drive scientific progress. This study investigates whether repository visibility influences data citation rates. We hypothesize that repositories with higher visibility, as measured by search engine metrics, are associated with more dataset citations. Using OpenAlex data and repository impact indicators (the visibility index from Sistrix, the h-index of repositories, and citation metrics such as mean and median citations), we analyze datasets in Social Sciences and Economics to examine how repository visibility relates to dataset citations. Our findings suggest that datasets hosted on more visible web domains tend to receive more citations: we observe a positive correlation between web domain visibility and dataset citation counts, particularly among datasets with at least one citation. However, at the domain level, the correlations between visibility and citation metrics such as the h-index, mean, and median citations are weaker and inconsistent. While higher-visibility domains tend to host datasets with greater citation impact, the distribution of citations across datasets varies significantly. These results suggest that visibility plays a role in increasing citation counts but is not the sole factor influencing dataset citation impact. Other factors, such as dataset quality, research trends, and disciplinary norms, can also contribute to citation patterns.
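The domain-level comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which correlation coefficient was used, so Spearman rank correlation is assumed here, and the domain names, visibility values, and citation counts are invented toy data with no empirical meaning. The helper names (`h_index`, `average_ranks`, `spearman`) are hypothetical.

```python
from statistics import mean, median


def h_index(citations):
    """Largest h such that at least h datasets have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (assumption): one visibility score and a list of per-dataset
# citation counts for each hypothetical repository domain.
domains = {
    "repo-a.example": {"visibility": 12.0, "citations": [10, 4, 3, 0, 0]},
    "repo-b.example": {"visibility": 3.5, "citations": [5, 2, 0, 0]},
    "repo-c.example": {"visibility": 0.4, "citations": [1, 0, 0]},
}

visibility = [d["visibility"] for d in domains.values()]
h_values = [h_index(d["citations"]) for d in domains.values()]
mean_values = [mean(d["citations"]) for d in domains.values()]
median_values = [median(d["citations"]) for d in domains.values()]

# Correlate visibility with each domain-level citation indicator.
rho_h = spearman(visibility, h_values)
rho_mean = spearman(visibility, mean_values)
rho_median = spearman(visibility, median_values)
```

With real data, each indicator would be computed per web domain from the dataset citation records and then correlated with that domain's visibility score. Comparing the resulting coefficients is one way to see whether the h-index, mean, and median tell a consistent story.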
Similar Papers
Beyond Citations: A Cross-Domain Metric for Dataset Impact and Shareability
Computers and Society
Measures how much research data is actually used.
How are research data referenced? The use case of the research data repository RADAR
Digital Libraries
Helps scientists track how their data is used.
"We provide our resources in a dedicated repository": Surveying the Transparency of HICSS publications
Software Engineering
Makes science work easier for others to check.