MedCite: Can Language Models Generate Verifiable Text for Medicine?
By: Xiao Wang, Mengjue Tan, Qiao Jin, and more
Potential Business Impact:
Helps AI give correct answers with proof.
Existing LLM-based medical question-answering systems lack citation generation and evaluation capabilities, raising concerns about their adoption in practice. In this work, we introduce MedCite, the first end-to-end framework that facilitates the design and evaluation of citation generation with LLMs for medical tasks. We also introduce a novel multi-pass retrieval-citation method that generates high-quality citations. Our evaluation highlights the challenges and opportunities of citation generation for medical tasks and identifies important design choices that significantly affect final citation quality. Our proposed method improves citation precision and recall over strong baseline methods, and we show that automatic evaluation results correlate well with annotations from professional experts.
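The citation precision and recall mentioned in the abstract can be illustrated with a minimal sketch. The function name and the ID-set inputs below are illustrative assumptions, not MedCite's actual implementation: precision is the fraction of cited documents that truly support a statement, and recall is the fraction of truly supporting documents that were cited.

```python
def citation_precision_recall(cited_ids, supporting_ids):
    """Per-statement citation quality (illustrative sketch, not MedCite's API).

    cited_ids: IDs of documents the model cited for a statement.
    supporting_ids: IDs of documents that genuinely support the statement.
    """
    cited = set(cited_ids)
    gold = set(supporting_ids)
    correct = cited & gold  # citations that actually support the statement
    precision = len(correct) / len(cited) if cited else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Example: the model cites PMID1 and PMID2, but only PMID1 and PMID3 support
# the statement -> one of two citations is correct, one of two supporters found.
print(citation_precision_recall({"PMID1", "PMID2"}, {"PMID1", "PMID3"}))
```

In practice the paper aggregates such per-statement scores across an answer set; the sketch only shows the single-statement case.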
Similar Papers
Evaluating Large Language Models for Evidence-Based Clinical Question Answering
Computation and Language
Helps doctors answer patient questions better.
Streamlining Biomedical Research with Specialized LLMs
Computation and Language
Helps scientists find answers faster.
Attribution, Citation, and Quotation: A Survey of Evidence-based Text Generation with Large Language Models
Computation and Language
Makes AI stories show where their facts came from.