Eyes on the Image: Gaze Supervised Multimodal Learning for Chest X-ray Diagnosis and Report Generation
By: Tanjim Islam Riju, Shuchismita Anwar, Saman Sarker Joy, and more
Potential Business Impact:
Helps doctors detect diseases in chest X-rays more accurately.
We propose a two-stage multimodal framework that enhances disease classification and region-aware radiology report generation from chest X-rays, leveraging the MIMIC-Eye dataset. In the first stage, we introduce a gaze-guided contrastive learning architecture for disease classification that integrates visual features, clinical labels, bounding boxes, and radiologist eye-tracking signals, trained with a novel multi-term gaze-attention loss combining MSE, KL divergence, correlation, and center-of-mass alignment terms. Incorporating fixations improves the F1 score from 0.597 to 0.631 (a 5.70% relative gain) and AUC from 0.821 to 0.849 (+3.41%), while also improving precision and recall, highlighting the effectiveness of gaze-informed attention supervision. In the second stage, we present a modular report generation pipeline that extracts confidence-weighted diagnostic keywords, maps them to anatomical regions using a curated dictionary built from domain-specific priors, and generates region-aligned sentences via structured prompts. This pipeline improves report quality as measured by clinical keyword recall and ROUGE overlap. Our results demonstrate that integrating gaze data improves both classification performance and the interpretability of generated medical reports.
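To make the multi-term gaze-attention loss concrete, here is a minimal PyTorch sketch of how the four named terms (MSE, KL divergence, correlation, and center-of-mass alignment) could be combined over a model attention map and a gaze fixation heatmap. The term weights, normalization choices, and function name are illustrative assumptions; the paper's exact formulation (and its contrastive component, which is omitted here) may differ.

```python
import torch
import torch.nn.functional as F

def gaze_attention_loss(attn, gaze, w=(1.0, 1.0, 1.0, 1.0), eps=1e-8):
    """Sketch of a multi-term gaze-attention loss.

    attn, gaze: (B, H, W) model attention maps and gaze fixation heatmaps.
    w: per-term weights (illustrative; not specified by the abstract).
    """
    B, H, W = attn.shape
    a = attn.flatten(1)
    g = gaze.flatten(1)

    # Normalize both maps to probability distributions for KL / center-of-mass.
    a_p = a / (a.sum(dim=1, keepdim=True) + eps)
    g_p = g / (g.sum(dim=1, keepdim=True) + eps)

    # 1) Pixel-wise MSE between the raw maps.
    mse = F.mse_loss(a, g)

    # 2) KL(gaze || attention): penalize attention mass missing from fixated regions.
    kl = F.kl_div((a_p + eps).log(), g_p, reduction="batchmean")

    # 3) Pearson correlation, turned into a loss via (1 - r).
    a_c = a - a.mean(dim=1, keepdim=True)
    g_c = g - g.mean(dim=1, keepdim=True)
    r = (a_c * g_c).sum(dim=1) / (a_c.norm(dim=1) * g_c.norm(dim=1) + eps)
    corr = (1.0 - r).mean()

    # 4) Center-of-mass alignment: L2 distance between the spatial
    #    expectations of the two normalized maps, in pixel units.
    ys = torch.arange(H, device=attn.device, dtype=attn.dtype)
    xs = torch.arange(W, device=attn.device, dtype=attn.dtype)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([grid_y.flatten(), grid_x.flatten()], dim=0)  # (2, H*W)
    com_a = a_p @ coords.t()  # (B, 2) expected (y, x) under attention
    com_g = g_p @ coords.t()  # (B, 2) expected (y, x) under gaze
    com = (com_a - com_g).norm(dim=1).mean()

    return w[0] * mse + w[1] * kl + w[2] * corr + w[3] * com
```

Combining complementary terms this way lets the supervision constrain both the shape of the attention distribution (MSE, KL, correlation) and its location (center of mass), so the model cannot satisfy one criterion while drifting on the other.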
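The second stage's mapping step can likewise be sketched: confidence-weighted diagnostic keywords are looked up in a curated keyword-to-region dictionary and grouped into region-aligned prompt fragments. The dictionary entries, confidence threshold, and prompt template below are illustrative assumptions, not the paper's actual resources.

```python
# Hypothetical keyword -> anatomical-region dictionary; the paper's curated
# dictionary is built from domain-specific priors and is not reproduced here.
REGION_DICT = {
    "cardiomegaly": "cardiac silhouette",
    "pleural effusion": "costophrenic angles",
    "consolidation": "lung fields",
    "atelectasis": "lung bases",
}

def build_region_prompts(keywords, conf_threshold=0.5):
    """keywords: list of (term, confidence) pairs from the classifier."""
    by_region = {}
    for term, conf in keywords:
        if conf < conf_threshold:
            continue  # drop low-confidence findings
        region = REGION_DICT.get(term)
        if region is None:
            continue  # unmapped terms are skipped in this sketch
        by_region.setdefault(region, []).append((term, conf))

    # One structured prompt per region, carrying the weighted findings.
    prompts = []
    for region, findings in by_region.items():
        terms = ", ".join(f"{t} (p={c:.2f})" for t, c in findings)
        prompts.append(f"Describe the {region}, noting evidence of: {terms}.")
    return prompts

# Example: a high-confidence effusion passes; borderline consolidation is dropped.
print(build_region_prompts([("pleural effusion", 0.91), ("consolidation", 0.48)]))
```

Grouping findings by region before prompting is what keeps each generated sentence anatomically anchored, which is the property the clinical keyword recall and ROUGE metrics reward.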
Similar Papers
RadEyeVideo: Enhancing general-domain Large Vision Language Model for chest X-ray analysis with video representations of eye gaze
CV and Pattern Recognition
Helps AI understand X-rays by watching doctors' eyes.
Radiology Report Generation with Layer-Wise Anatomical Attention
CV and Pattern Recognition
Helps doctors write X-ray reports faster.
GazeLT: Visual attention-guided long-tailed disease classification in chest radiographs
CV and Pattern Recognition
Helps doctors find rare diseases in X-rays.