Score: 1

Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches

Published: November 14, 2025 | arXiv ID: 2511.11867v1

By: Namu Park , Giridhar Kaushik Ramachandran , Kevin Lybarger and more

Potential Business Impact:

Helps doctors know if patients need more scans.

Business Areas:

Natural Language Processing Artificial Intelligence, Data and Analytics, Software

Large language models (LLMs) have shown considerable promise in clinical natural language processing, yet few domain-specific datasets exist to rigorously evaluate their performance on radiology tasks. In this work, we introduce an annotated corpus of 6,393 radiology reports from 586 patients, each labeled for follow-up imaging status, to support the development and benchmarking of follow-up adherence detection systems. Using this corpus, we systematically compared traditional machine-learning classifiers, including logistic regression (LR), support vector machines (SVM), Longformer, and a fully fine-tuned Llama3-8B-Instruct, with recent generative LLMs. To evaluate generative LLMs, we tested GPT-4o and the open-source GPT-OSS-20B under two configurations: a baseline (Base) and a task-optimized (Advanced) setting that focused inputs on metadata, recommendation sentences, and their surrounding context. A refined prompt for GPT-OSS-20B further improved reasoning accuracy. Performance was assessed using precision, recall, and F1 scores with 95% confidence intervals estimated via non-parametric bootstrapping. Inter-annotator agreement was high (F1 = 0.846). GPT-4o (Advanced) achieved the best performance (F1 = 0.832), followed closely by GPT-OSS-20B (Advanced; F1 = 0.828). LR and SVM also performed strongly (F1 = 0.776 and 0.775), underscoring that while LLMs approach human-level agreement through prompt optimization, interpretable and resource-efficient models remain valuable baselines.

Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches

Computation and Language

Helps doctors find hidden sicknesses in scans.

5 Dec 2025 0

90%

Structuring Radiology Reports: Challenging LLMs with Lightweight Models

Computation and Language

Makes doctor reports easier for computers to read.

30 May 2025 2

90%

Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction

CV and Pattern Recognition

Helps doctors write X-ray reports faster.

10 Apr 2025 1

View PDF Login to Bookmark

Repos / Data Links

github.com

Page Count

10 pages

Identifying Imaging Follow-Up in Radiology Reports: A Comparative Analysis of Traditional ML and LLM Approaches

Helps doctors know if patients need more scans.

Technical Abstract

Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches

Structuring Radiology Reports: Challenging LLMs with Lightweight Models

Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction