Lesion-Aware Visual-Language Fusion for Automated Image Captioning of Ulcerative Colitis Endoscopic Examinations

Published: September 3, 2025 | arXiv ID: 2509.03011v1

By: Alexis Ivan Lopez Escamilla, Gilberto Ochoa, Sharib Al

Potential Business Impact:

Helps doctors describe colon endoscopy images more accurately.

Business Areas:
Image Recognition, Data and Analytics, Software

We present a lesion-aware image captioning framework for ulcerative colitis (UC). The model integrates ResNet embeddings, Grad-CAM heatmaps, and CBAM-enhanced attention with a T5 decoder. Clinical metadata (Mayo Endoscopic Score (MES) 0-3, vascular pattern, bleeding, erythema, friability, ulceration) is injected as natural-language prompts to guide caption generation. The system produces structured, interpretable descriptions aligned with clinical practice and provides MES classification and lesion tags. Compared with baselines, our approach improves caption quality and MES classification accuracy, supporting reliable endoscopic reporting.
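
The abstract says clinical metadata is injected as natural-language prompts but does not give the template. The sketch below shows one way such a prompt could be assembled before it is passed to a text decoder. The class `UCMetadata`, the function `build_prompt`, the prompt wording, and the example field values are all assumptions made for illustration, not the authors' implementation. The severity labels follow the standard meaning of the Mayo endoscopic subscore.

```python
from dataclasses import dataclass

# Standard reading of the Mayo endoscopic subscore grades (0-3).
MES_LABELS = {
    0: "normal or inactive disease",
    1: "mild disease",
    2: "moderate disease",
    3: "severe disease",
}


@dataclass
class UCMetadata:
    """Hypothetical container for the metadata fields named in the abstract."""
    mes: int
    vascular_pattern: str
    bleeding: str
    erythema: str
    friability: str
    ulceration: str


def build_prompt(meta: UCMetadata) -> str:
    """Turn structured metadata into a plain-text prompt (assumed template)."""
    if meta.mes not in MES_LABELS:
        raise ValueError(f"MES must be an integer from 0 to 3, got {meta.mes!r}")
    findings = [
        f"vascular pattern {meta.vascular_pattern}",
        f"bleeding {meta.bleeding}",
        f"erythema {meta.erythema}",
        f"friability {meta.friability}",
        f"ulceration {meta.ulceration}",
    ]
    header = f"describe endoscopic image: MES {meta.mes} ({MES_LABELS[meta.mes]})"
    return header + "; " + "; ".join(findings)


if __name__ == "__main__":
    example = UCMetadata(
        mes=2,
        vascular_pattern="absent",
        bleeding="none",
        erythema="marked",
        friability="present",
        ulceration="erosions",
    )
    print(build_prompt(example))
```

In a full pipeline the resulting string would presumably be tokenized and combined with the visual features before decoding; how the paper performs that fusion is not specified in this summary.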

Country of Origin
🇬🇧 United Kingdom

Page Count
10 pages

Category
Computer Science:
CV and Pattern Recognition