Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism
By: Lakshita Agarwal, Bindu Verma
Potential Business Impact:
Reads X-rays and explains what's wrong.
The examination of chest X-ray images is a crucial component in detecting various thoracic illnesses. This study introduces a new image description generation model that integrates a Vision Transformer (ViT) encoder with cross-modal attention and a GPT-4-based transformer decoder. The ViT extracts high-quality visual features from chest X-rays, which are fused with text features through cross-modal attention to improve the accuracy, context, and richness of the generated image descriptions. The GPT-4 decoder then transforms these fused features into accurate and clinically relevant captions. The model was evaluated on the National Institutes of Health (NIH) and Indiana University (IU) Chest X-ray datasets. On the IU dataset, it achieved scores of 0.854 (BLEU-1), 0.883 (CIDEr), 0.759 (METEOR), and 0.712 (ROUGE-L). On the NIH dataset, it achieved the best performance on all metrics: BLEU-1 to BLEU-4 (0.825, 0.788, 0.765, 0.752), CIDEr (0.857), METEOR (0.726), and ROUGE-L (0.705). This framework has the potential to enhance chest X-ray evaluation, assisting radiologists in making more precise and efficient diagnoses.
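The fusion step described above can be sketched as scaled dot-product cross-attention, where text-token embeddings act as queries attending over ViT patch features (keys/values). This is a minimal illustrative sketch, not the authors' implementation; all shapes, names, and dimensions here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_queries, image_features):
    """Fuse text and image features via cross-attention (sketch).

    text_queries:   (T, d) text-token embeddings used as queries.
    image_features: (P, d) ViT patch features used as keys and values.
    Returns:        (T, d) image-conditioned text representations.
    """
    d = text_queries.shape[-1]
    scores = text_queries @ image_features.T / np.sqrt(d)  # (T, P)
    weights = softmax(scores, axis=-1)                     # attention over patches
    return weights @ image_features                        # (T, d) fused features

# Toy example: 6 text tokens attending over 196 ViT patches (14x14 grid),
# with an assumed embedding size of 64.
rng = np.random.default_rng(0)
text = rng.normal(size=(6, 64))
patches = rng.normal(size=(196, 64))
fused = cross_modal_attention(text, patches)
print(fused.shape)  # (6, 64)
```

In the full model, the fused representations would be passed to the decoder so that each generated word can be grounded in the relevant image regions.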
Similar Papers
Automated diagnosis of lung diseases using vision transformer: a comparative study on chest x-ray classification
Image and Video Processing
Finds pneumonia on X-rays with 99% accuracy.
Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation
CV and Pattern Recognition
Helps doctors find more sicknesses on X-rays.
MedDChest: A Content-Aware Multimodal Foundational Vision Model for Thoracic Imaging
CV and Pattern Recognition
Helps doctors find sickness in chest scans.