Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs
By: Chenqian Le, Ziheng Gong, Chihang Wang, and more
Potential Business Impact:
Helps computers answer medical questions better.
Large language models (LLMs) have shown great potential in medical question answering (MedQA), yet adapting them to biomedical reasoning remains challenging due to domain-specific complexity and limited supervision. In this work, we study how prompt design and lightweight fine-tuning affect the performance of open-source LLMs on PubMedQA, a benchmark for multiple-choice biomedical questions. We focus on two widely used prompting strategies, standard instruction prompts and Chain-of-Thought (CoT) prompts, and apply QLoRA for parameter-efficient instruction tuning. Across multiple model families and sizes, our experiments show that CoT prompting alone can improve reasoning in zero-shot settings, while instruction tuning significantly boosts accuracy. However, fine-tuning on CoT prompts does not universally enhance performance and may even degrade it for certain larger models. These findings suggest that reasoning-aware prompts are useful, but their benefits are model- and scale-dependent. Our study offers practical insights into combining prompt engineering with efficient fine-tuning for medical QA applications.
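As a rough illustration of the two prompting strategies and the QLoRA setup the abstract describes, the sketch below formats a PubMedQA-style question with a standard instruction prompt and with a CoT prompt, and builds a 4-bit-quantized model with LoRA adapters using Hugging Face `transformers` and `peft`. The model name, prompt wording, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: standard vs. CoT prompts for a PubMedQA-style question,
# plus a QLoRA (4-bit quantization + LoRA adapters) setup with transformers/peft.
# Model name, prompt wording, and hyperparameters are placeholders, not the
# paper's reported settings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder open-source LLM


def standard_prompt(context: str, question: str) -> str:
    """Standard instruction prompt: ask for the yes/no/maybe answer directly."""
    return (
        "Answer the biomedical question using the context. "
        "Reply with yes, no, or maybe.\n\n"
        f"Context: {context}\nQuestion: {question}\nAnswer:"
    )


def cot_prompt(context: str, question: str) -> str:
    """Chain-of-Thought prompt: elicit step-by-step reasoning before the answer."""
    return (
        "Answer the biomedical question using the context. "
        "Think step by step, then give a final answer of yes, no, or maybe.\n\n"
        f"Context: {context}\nQuestion: {question}\nReasoning:"
    )


# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (parameter-efficient tuning).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```

Instruction tuning would then train these adapters on prompt-answer pairs formatted with either template; per the abstract, whether the standard or CoT format works better after fine-tuning depends on the model family and size.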
Similar Papers
Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers
Artificial Intelligence
Makes AI better at solving hard science problems.
Teaching LLMs How to Learn with Contextual Fine-Tuning
Machine Learning (CS)
Teaches computers to learn new things faster.
Short-Path Prompting in LLMs: Analyzing Reasoning Instability and Solutions for Robust Performance
Computation and Language
Makes AI think better even with short questions.