DeCode: Decoupling Content and Delivery for Medical QA
By: Po-Jen Ko, Chen-Han Tsai, Yu-Shao Peng
Potential Business Impact:
Helps doctors give patients better, personalized health advice.
Large language models (LLMs) exhibit strong medical knowledge and can generate factually accurate responses. However, existing models often fail to account for individual patient contexts, producing answers that are clinically correct yet poorly aligned with patients' needs. In this work, we introduce DeCode, a training-free, model-agnostic framework that adapts existing LLMs to produce contextualized answers in clinical settings. We evaluate DeCode on OpenAI HealthBench, a comprehensive and challenging benchmark designed to assess the clinical relevance and validity of LLM responses. DeCode improves the previous state of the art from $28.4\%$ to $49.8\%$, a $75\%$ relative improvement. These results demonstrate the effectiveness of DeCode in improving clinical question answering with LLMs.
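The abstract does not detail DeCode's internals, but its title suggests separating what is said (clinical content) from how it is said (patient-facing delivery). The sketch below is a hypothetical illustration of such a training-free, model-agnostic two-stage pipeline: the function names, prompts, and `call_llm` stub are assumptions for illustration, not the paper's actual method.

```python
# Hypothetical two-stage content/delivery pipeline, illustrating the idea of
# decoupling clinical content from patient-tailored delivery. This is a sketch,
# not the paper's implementation.

def call_llm(prompt: str) -> str:
    # Placeholder backend: swap in any chat-completion API. Because the
    # pipeline only composes prompts, it remains model-agnostic and
    # training-free.
    return f"[model response to: {prompt[:40]}]"

def decode_answer(question: str, patient_context: str) -> str:
    # Stage 1 (content): elicit a factually accurate clinical answer,
    # ignoring presentation entirely.
    content = call_llm(f"Answer this medical question factually:\n{question}")
    # Stage 2 (delivery): rewrite the same content for this patient's
    # specific context, without changing the clinical substance.
    delivery_prompt = (
        "Rewrite the following answer so it fits the patient's context, "
        "keeping the clinical content unchanged.\n"
        f"Patient context: {patient_context}\n"
        f"Answer: {content}"
    )
    return call_llm(delivery_prompt)

answer = decode_answer(
    "Is ibuprofen safe for occasional headaches?",
    "72-year-old with chronic kidney disease",
)
print(answer)
```

With a real LLM backend substituted for the stub, stage 2 would adapt tone, reading level, and safety caveats to the patient context while stage 1 anchors factual correctness.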
Similar Papers
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Information Retrieval
Tests if AI doctors give good advice.
Structured Outputs Enable General-Purpose LLMs to be Medical Experts
Computation and Language
Helps AI give safer, smarter answers about health.
Beyond MedQA: Towards Real-world Clinical Decision Making in the Era of LLMs
Computation and Language
Helps doctors make better choices using smart computer programs.