LDP: Parameter-Efficient Fine-Tuning of Multimodal LLM for Medical Report Generation
By: Tianyu Zhou, Junyi Tang, Zehui Li and more
Potential Business Impact:
Helps doctors write better cancer reports from pictures.
Colonoscopic polyp diagnosis is pivotal for early colorectal cancer detection, yet traditional automated reporting suffers from inconsistencies and hallucinations due to the scarcity of high-quality multimodal medical data. To bridge this gap, we propose LDP, a novel framework that uses multimodal large language models (MLLMs) to generate professional polyp diagnosis reports. Specifically, we curate MMEndo, a multimodal endoscopic dataset of expert-annotated colonoscopy image-text pairs. We fine-tune the Qwen2-VL-7B backbone with LoRA, a parameter-efficient fine-tuning method, and align it with clinical standards via Direct Preference Optimization (DPO). Extensive experiments show that LDP outperforms existing baselines on both automated metrics and rigorous clinical expert evaluations, achieving a Physician Score of 7.2/10, while reducing training computational cost by 833x compared to full fine-tuning. The proposed solution offers a scalable, clinically viable path for primary healthcare, and additional validation on the IU-XRay dataset confirms its robustness.
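The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the layer dimensions, the rank, and the beta value are all hypothetical choices for illustration, and the parameter ratio it computes is unrelated to the 833x figure reported above. The first function counts the weights trained when a single d_out x d_in matrix is adapted with a rank-r LoRA update instead of being fine-tuned in full; the second evaluates the standard DPO loss for one preference pair from summed log-probabilities.

```python
import math


def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-weight counts for one linear layer.

    Full fine-tuning updates the whole d_out x d_in matrix. LoRA freezes it
    and trains two low-rank factors, B (d_out x rank) and A (rank x d_in).
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Each argument is the summed log-probability of a response under the
    policy being trained or under the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the paper.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection adapted at rank 8.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full={full} lora={lora} ratio={full // lora}x")

    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss falls below log(2) once the policy raises the chosen report's log-probability relative to the reference by more than it raises the rejected report's, which is the preference signal DPO optimizes.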
Similar Papers
Data-Efficient Fine-Tuning of Vision-Language Models for Diagnosis of Alzheimer's Disease
CV and Pattern Recognition
Helps doctors find Alzheimer's using brain scans.
Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models
CV and Pattern Recognition
Helps doctors find diseases in CT scans faster.
Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning
Image and Video Processing
Helps doctors find throat cancer better.