MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing
By: Minghao Liu, Zhitao He, Zhiyuan Fan, and more
Potential Business Impact:
Lets doctors edit medical images safely using words.
Text-guided image editing has seen significant progress in natural image domains, but its application in medical imaging remains limited and lacks standardized evaluation frameworks. Such editing could revolutionize clinical practices by enabling personalized surgical planning, enhancing medical education, and improving patient communication. To bridge this gap, we introduce MedEBench, a robust benchmark designed to diagnose reliability in text-guided medical image editing. MedEBench consists of 1,182 clinically curated image-prompt pairs covering 70 distinct editing tasks and 13 anatomical regions. It contributes in three key areas: (1) a clinically grounded evaluation framework that measures Editing Accuracy, Context Preservation, and Visual Quality, complemented by detailed descriptions of intended edits and corresponding Region-of-Interest (ROI) masks; (2) a comprehensive comparison of seven state-of-the-art models, revealing consistent patterns of failure; and (3) a diagnostic error analysis technique that leverages attention alignment, using Intersection-over-Union (IoU) between model attention maps and ROI masks to identify mislocalization issues, where models erroneously focus on incorrect anatomical regions. MedEBench sets the stage for developing more reliable and clinically effective text-guided medical image editing tools.
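The attention-alignment diagnostic described above boils down to an IoU between a (thresholded) attention map and the ROI mask. A minimal sketch of that computation is shown below; the function name, the binarization threshold, and the toy arrays are illustrative assumptions, not the benchmark's actual implementation:

```python
import numpy as np

def attention_roi_iou(attention_map, roi_mask, threshold=0.5):
    """IoU between a binarized attention map and a binary ROI mask.

    attention_map: 2D float array of per-pixel attention scores.
    roi_mask: 2D binary array marking the intended edit region.
    threshold: attention values at or above this count as "attended"
    (the cutoff is an assumption for this sketch).
    """
    attn_bin = attention_map >= threshold
    roi_bin = roi_mask.astype(bool)
    intersection = np.logical_and(attn_bin, roi_bin).sum()
    union = np.logical_or(attn_bin, roi_bin).sum()
    return float(intersection) / union if union > 0 else 0.0

# Toy example: attention concentrated exactly on a top-left ROI.
attn = np.zeros((4, 4)); attn[:2, :2] = 0.9
roi = np.zeros((4, 4), dtype=np.uint8); roi[:2, :2] = 1
print(attention_roi_iou(attn, roi))  # -> 1.0 (perfect localization)
```

A low IoU under this measure flags the mislocalization failure mode the paper targets: the model's attention sits on the wrong anatomical region rather than the one the prompt asks it to edit.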
Similar Papers
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing
CV and Pattern Recognition
Tests if computer image edits match words.
IE-Critic-R1: Advancing the Explanatory Measurement of Text-Driven Image Editing for Human Perception Alignment
CV and Pattern Recognition
Helps computers judge edited pictures like people do.
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
CV and Pattern Recognition
Checks if AI image edits are good.