Score: 1

BioMedJImpact: A Comprehensive Dataset and LLM Pipeline for AI Engagement and Scientific Impact Analysis of Biomedical Journals

Published: November 16, 2025 | arXiv ID: 2511.12821v1

By: Ruiyu Wang , Yuzhang Xie , Xiao Hu and more

Potential Business Impact:

Shows how teamwork and AI boost science papers.

Business Areas:

Bioinformatics Biotechnology, Data and Analytics, Science and Engineering

Assessing journal impact is central to scholarly communication, yet existing open resources rarely capture how collaboration structures and artificial intelligence (AI) research jointly shape venue prestige in biomedicine. We present BioMedJImpact, a large-scale, biomedical-oriented dataset designed to advance journal-level analysis of scientific impact and AI engagement. Built from 1.74 million PubMed Central articles across 2,744 journals, BioMedJImpact integrates bibliometric indicators, collaboration features, and LLM-derived semantic indicators for AI engagement. Specifically, the AI engagement feature is extracted through a reproducible three-stage LLM pipeline that we propose. Using this dataset, we analyze how collaboration intensity and AI engagement jointly influence scientific impact across pre- and post-pandemic periods (2016-2019, 2020-2023). Two consistent trends emerge: journals with higher collaboration intensity, particularly those with larger and more diverse author teams, tend to achieve greater citation impact, and AI engagement has become an increasingly strong correlate of journal prestige, especially in quartile rankings. To further validate the three-stage LLM pipeline we proposed for deriving the AI engagement feature, we conduct human evaluation, confirming substantial agreement in AI relevance detection and consistent subfield classification. Together, these contributions demonstrate that BioMedJImpact serves as both a comprehensive dataset capturing the intersection of biomedicine and AI, and a validated methodological framework enabling scalable, content-aware scientometric analysis of scientific impact and innovation dynamics. Code is available at https://github.com/JonathanWry/BioMedJImpact.

Investigating Industry--Academia Collaboration in Artificial Intelligence: PDF-Based Bibliometric Analysis from Leading Conferences

Digital Libraries

Shows how companies and schools work together on AI.

2 Oct 2025 1

87%

Research quality evaluation by AI in the era of Large Language Models: Advantages, disadvantages, and systemic effects

Digital Libraries

AI helps judge research quality, but can be biased.

9 Jun 2025 0

87%

Deciphering Scientific Collaboration in Biomedical LLM Research: Dynamics, Institutional Participation, and Resource Disparities

Social and Information Networks

Helps more doctors use AI for new medicines.

2 Nov 2025 1

View PDF Login to Bookmark

Country of Origin

🇺🇸 United States

Repos / Data Links

github.com

Page Count

16 pages

BioMedJImpact: A Comprehensive Dataset and LLM Pipeline for AI Engagement and Scientific Impact Analysis of Biomedical Journals

Shows how teamwork and AI boost science papers.

Technical Abstract

Investigating Industry--Academia Collaboration in Artificial Intelligence: PDF-Based Bibliometric Analysis from Leading Conferences

Research quality evaluation by AI in the era of Large Language Models: Advantages, disadvantages, and systemic effects

Deciphering Scientific Collaboration in Biomedical LLM Research: Dynamics, Institutional Participation, and Resource Disparities