Evaluating local large language models for structured extraction from endometriosis-specific transvaginal ultrasound reports
By: Haiyi Li, Yutong Li, Yiheng Chi and more
In this study, we evaluated locally deployed large language models (LLMs) for converting unstructured endometriosis transvaginal ultrasound (eTVUS) scan reports into structured data for imaging informatics workflows. Across 49 eTVUS reports, we compared three LLMs (7B/8B-parameter models and a 20B-parameter model) against expert human extraction. The 20B model achieved a mean accuracy of 86.02%, substantially outperforming the smaller models and confirming the importance of scale in handling complex clinical text. Crucially, we identified a highly complementary error profile: the LLM excelled at syntactic consistency (e.g., date and numeric formatting), where humans faltered, while human experts provided superior semantic and contextual interpretation. We also found that the LLM's semantic errors reflected fundamental limitations that could not be mitigated by simple prompt engineering. These findings strongly support a human-in-the-loop (HITL) workflow in which the on-premises LLM serves as a collaborative tool rather than a full replacement: it automates routine structuring and flags potential human errors, enabling imaging specialists to focus on high-level semantic validation. We discuss implications for structured reporting and interactive AI systems in clinical practice.
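The comparison between LLM and expert extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the field names, the normalization rule, and the field-level agreement measure below are all assumptions made for illustration, and the paper's own accuracy definition may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldReview:
    """One field's values from both extractors, plus whether they agree."""
    field: str
    llm_value: Optional[str]
    human_value: Optional[str]
    agrees: bool


def normalize(value: Optional[str]) -> Optional[str]:
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as disagreement (assumed rule)."""
    if value is None:
        return None
    return " ".join(str(value).lower().split())


def compare_extractions(llm: Dict[str, str], human: Dict[str, str]) -> List[FieldReview]:
    """Compare two structured extractions field by field."""
    reviews = []
    for name in sorted(set(llm) | set(human)):
        a, b = llm.get(name), human.get(name)
        reviews.append(FieldReview(name, a, b, normalize(a) == normalize(b)))
    return reviews


def agreement_rate(reviews: List[FieldReview]) -> float:
    """Fraction of fields on which both extractions agree."""
    if not reviews:
        return 0.0
    return sum(r.agrees for r in reviews) / len(reviews)


# Hypothetical report fields; real eTVUS templates will differ.
llm_out = {
    "scan_date": "2023-04-01",
    "uterus_length_mm": "78",
    "left_ovary_endometrioma": "absent",
}
human_out = {
    "scan_date": "2023-04-01",
    "uterus_length_mm": "78",
    "left_ovary_endometrioma": "Present",
}

reviews = compare_extractions(llm_out, human_out)
# Fields where the two disagree are the ones a specialist would review.
flagged = [r.field for r in reviews if not r.agrees]
```

In a human-in-the-loop setting, only the flagged fields would need specialist attention. A disagreement does not say which side is wrong, which matches the complementary error profile reported above.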
Similar Papers
Leveraging large language models for structured information extraction from pathology reports
Computation and Language
Helps doctors quickly get cancer report facts.
Large Language Models with Human-In-The-Loop Validation for Systematic Review Data Extraction
Human-Computer Interaction
AI helps doctors find important info faster.
ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports
Computation and Language
Lets computers understand old doctor notes.