Small Language Models Can Use Nuanced Reasoning For Health Science Research Classification: A Microbial-Oncogenesis Case Study
By: Muhammed Muaaz Dawood , Mohammad Zaid Moonsamy , Kaela Kokkas and more
Potential Business Impact:
Helps AI quickly sort medical papers.
Artificially intelligent (AI) co-scientists must be able to sift through research literature cost-efficiently while applying nuanced scientific reasoning. We evaluate Small Language Models (SLMs, <= 8B parameters) for classifying medical research papers. Using literature on the oncogenic potential of HMTV/MMTV-like viruses in breast cancer as a case study, we assess model performance with both zero-shot and in-context learning (ICL; few-shot prompting) strategies against frontier proprietary Large Language Models (LLMs). Llama 3 and Qwen2.5 outperform GPT-5 (API, low/high effort), Gemini 3 Pro Preview, and Meerkat in zero-shot settings, though trailing Gemini 2.5 Pro. ICL leads to improved performance on a case-by-case basis, allowing Llama 3 and Qwen2.5 to match Gemini 2.5 Pro in binary classification. Systematic lexical-ablation experiments show that SLM decisions are often grounded in valid scientific cues but can be influenced by spurious textual artifacts, underscoring need for interpretability in high-stakes pipelines. Our results reveal both promise and limitations of modern SLMs for scientific triage; pairing SLMs with simple but principled prompting strategies can approach performance of the strongest LLMs for targeted literature filtering in co-scientist pipelines.
Similar Papers
The Rise of Small Language Models in Healthcare: A Comprehensive Survey
Computation and Language
Makes smaller computer brains work for doctors.
HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
Artificial Intelligence
Lets health trackers predict problems privately.
Applications of Small Language Models in Medical Imaging Classification with a Focus on Prompt Strategies
CV and Pattern Recognition
Helps doctors read X-rays faster with small AI.