Supervised Fine-Tuning or In-Context Learning? Evaluating LLMs for Clinical NER
By: Andrei Baroian
Potential Business Impact:
Helps doctors find patient problems in notes.
We study clinical Named Entity Recognition (NER) on the CADEC corpus and compare three families of approaches: (i) BERT-style encoders (BERT Base, BioClinicalBERT, RoBERTa-large), (ii) GPT-4o used with few-shot in-context learning (ICL) under simple vs. complex prompts, and (iii) GPT-4o with supervised fine-tuning (SFT). All models are evaluated on standard NER metrics over CADEC's five entity types (ADR, Drug, Disease, Symptom, Finding). RoBERTa-large and BioClinicalBERT offer only limited improvements over BERT Base, showing the limits of this family of models. Among the LLM settings, simple ICL outperforms a longer, instruction-heavy prompt, and SFT achieves the strongest overall performance (F1 ≈ 87.1%), albeit at higher cost. We also find that LLMs achieve higher accuracy when the task is simplified by restricting classification to two labels.
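The abstract refers to "standard NER metrics." For readers unfamiliar with how NER systems are scored, the common convention is entity-level micro-averaged precision, recall, and F1, where a prediction counts as correct only if both the span and the label match exactly. The sketch below illustrates this convention; the function name and the (start, end, label) tuple format are illustrative assumptions, not taken from the paper.

```python
def ner_micro_f1(gold, pred):
    """Micro-averaged entity-level precision/recall/F1.

    gold, pred: lists (one entry per document) of sets-or-lists of
    (start, end, label) tuples, e.g. (12, 19, "ADR").
    A predicted entity is a true positive only on an exact
    span-and-label match -- the usual strict NER criterion.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_set, p_set = set(g), set(p)
        tp += len(g_set & p_set)   # exact matches
        fp += len(p_set - g_set)   # spurious predictions
        fn += len(g_set - p_set)   # missed gold entities
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy example with one document: one exact match, one wrong span.
gold = [[(0, 3, "ADR"), (5, 7, "Drug")]]
pred = [[(0, 3, "ADR"), (8, 9, "Drug")]]
p, r, f1 = ner_micro_f1(gold, pred)  # P = R = F1 = 0.5
```

Micro-averaging pools counts across all five CADEC entity types, so frequent types (e.g. ADR) weigh more heavily than rare ones.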
Similar Papers
LLM, Reporting In! Medical Information Extraction Across Prompting, Fine-tuning and Post-correction
Computation and Language
Helps computers find health words in French text.
A Study of Large Language Models for Patient Information Extraction: Model Architecture, Fine-Tuning Strategy, and Multi-task Instruction Tuning
Computation and Language
Helps computers understand patient stories for better care.
Do LLMs Surpass Encoders for Biomedical NER?
Computation and Language
Finds important words in medical texts.