Publication Trend Analysis and Synthesis via Large Language Model: A Case Study of Engineering in PNAS
By: Mason Smetana, Lev Khazanovich
Potential Business Impact:
Maps how science ideas connect and change.
Scientific literature is increasingly siloed by complex language, static disciplinary structures, and potentially sparse keyword systems, making it cumbersome to capture the dynamic nature of modern science. This study addresses these challenges by introducing an adaptable large language model (LLM)-driven framework to quantify thematic trends and map the evolving landscape of scientific knowledge. The approach is demonstrated on a 20-year collection of more than 1,500 engineering articles published in the Proceedings of the National Academy of Sciences (PNAS), a corpus notable for the breadth and depth of its research focus. A two-stage classification pipeline first assigns each article a primary thematic category based on its abstract. A second phase then analyzes the full text to assign secondary classifications, revealing latent cross-topic connections across the corpus. Traditional natural language processing (NLP) methods, such as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), confirm the resulting topical structure but also suggest that standalone word-frequency analyses may be insufficient for mapping highly diverse fields. Finally, a disjoint graph representation linking the primary and secondary classifications reveals implicit connections between themes that may be less apparent from abstracts or keywords alone. The findings show that the approach independently recovers much of the journal's editorially embedded structure without prior knowledge of its existing dual-classification schema (e.g., biological studies also classified as engineering). The framework offers a powerful tool for detecting potential thematic trends and for providing a high-level overview of scientific progress.
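The two-stage classification pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the LLM call is abstracted as a generic `classify(text, labels)` callable, the `Article` record and the `keyword_classifier` stand-in are hypothetical, and excluding the primary label from the secondary candidates is an assumption made for clarity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical interface: given a text and the allowed labels, return the
# chosen labels. In the study an LLM fills this role; here any callable works.
Classifier = Callable[[str, list[str]], list[str]]


@dataclass
class Article:
    """Hypothetical per-article record."""
    title: str
    abstract: str
    full_text: str
    primary: Optional[str] = None
    secondary: list[str] = field(default_factory=list)


def run_pipeline(articles: list[Article], categories: list[str],
                 classify: Classifier) -> list[Article]:
    for art in articles:
        # Stage 1: a single primary category from the abstract alone.
        hits = classify(art.abstract, categories)
        art.primary = hits[0] if hits else "Unclassified"
        # Stage 2: secondary categories from the full text. Excluding the
        # primary label from the candidates is an assumption of this sketch.
        candidates = [c for c in categories if c != art.primary]
        art.secondary = classify(art.full_text, candidates)
    return articles


def keyword_classifier(text: str, labels: list[str]) -> list[str]:
    """Toy stand-in for an LLM: return labels whose name occurs in the text."""
    lowered = text.lower()
    return [label for label in labels if label.lower() in lowered]


# Usage with made-up categories and text.
categories = ["Materials", "Energy", "Medicine"]
article = Article(
    title="Example",
    abstract="New materials for solid-state batteries.",
    full_text="We test the materials for energy storage and medicine delivery.",
)
run_pipeline([article], categories, keyword_classifier)
print(article.primary, article.secondary)
```

Keeping the classifier injectable means the keyword stand-in can be swapped for a function that prompts an LLM with the allowed label list, while the two stages stay testable in isolation.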
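For reference, BoW and TF-IDF baselines of the kind the abstract uses as a cross-check can be computed with the Python standard library alone. The sketch below uses a deliberately simple regex tokenizer and the textbook weighting (term count / document length) · log(N / document frequency); the tokenizer, the toy abstracts, and the helper names are hypothetical, and the study's own preprocessing is not described here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs: list[str]) -> list[Counter]:
    """BoW: raw term counts per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Weight = (term count / doc length) * log(N / document frequency)."""
    bows = bag_of_words(docs)
    n_docs = len(bows)
    # Document frequency: number of documents containing each term.
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        length = sum(bow.values()) or 1  # guard against empty documents
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weights


def top_terms(weights: dict[str, float], k: int = 3) -> list[str]:
    """Highest-weighted terms; ties broken alphabetically for determinism."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Usage with made-up abstracts.
docs = [
    "Graphene membranes for water desalination",
    "Neural interfaces for brain signal decoding",
    "Graphene sensors for water quality",
]
for doc, w in zip(docs, tf_idf(docs)):
    print(top_terms(w), "<-", doc)
```

Because "for" occurs in every toy abstract, its weight is zero; only terms that separate documents score highly, which is what makes such frequency-based weights a useful, if limited, check on topic assignments.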
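One way to represent the link between primary and secondary classifications is a weighted bipartite graph: primary and secondary themes form two disjoint node sets, and each edge weight counts the articles sharing that pairing. Reading "disjoint graph" this way is an assumption, as are the `(primary, [secondaries])` input format, the `P:`/`S:` node prefixes, and the theme names in the example, which are all hypothetical.

```python
from collections import Counter, defaultdict


def build_theme_graph(articles):
    """Weighted bipartite edges {(primary node, secondary node): n_articles}.

    `articles` is an iterable of (primary, [secondaries]) pairs. Prefixing node
    names keeps the two node sets disjoint even when the same theme appears
    as both a primary and a secondary label.
    """
    edges = Counter()
    for primary, secondaries in articles:
        for secondary in set(secondaries):  # count each pair once per article
            edges[("P:" + primary, "S:" + secondary)] += 1
    return edges


def strongest_links(edges, min_weight=2):
    """Primary-secondary pairs shared by at least `min_weight` articles."""
    kept = [pair for pair, weight in edges.items() if weight >= min_weight]
    return sorted(kept, key=lambda pair: (-edges[pair], pair))


def secondary_reach(edges):
    """Number of distinct primary themes each secondary theme connects to."""
    reach = defaultdict(set)
    for primary, secondary in edges:
        reach[secondary].add(primary)
    return {secondary: len(prims) for secondary, prims in reach.items()}


# Usage with made-up theme labels.
articles = [
    ("Bioengineering", ["Materials", "Medicine"]),
    ("Bioengineering", ["Materials"]),
    ("Environmental", ["Materials", "Energy"]),
    ("Energy", ["Materials"]),
]
edges = build_theme_graph(articles)
print(strongest_links(edges))
print(secondary_reach(edges))
```

A secondary theme with high reach, such as "Materials" in this toy data, is the kind of cross-cutting connection that a single primary label per article would hide.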
Similar Papers
Preface to the Special Issue of the TAL Journal on Scholarly Document Processing
Digital Libraries
Helps scientists find important research faster.
Optimizing Data Extraction from Materials Science Literature: A Study of Tools Using Large Language Models
Digital Libraries
AI finds science facts in papers faster.
LLM-Based Information Extraction to Support Scientific Literature Research and Publication Workflows
Digital Libraries
Helps find important ideas in science papers.