Score: 1

Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation

Published: May 2, 2025 | arXiv ID: 2505.01091v1

By: Daniele Molino, Francesco di Feola, Linlin Shen and more

Potential Business Impact:

Creates fake X-rays and reports for doctors.

Business Areas:

Image Recognition, Data and Analytics, Software

Generative models have revolutionized Artificial Intelligence (AI), particularly in multimodal applications. However, adapting these models to the medical domain poses unique challenges due to the complexity of medical data and the stringent need for clinical accuracy. In this work, we introduce a framework specifically designed for multimodal medical data generation. By enabling the generation of multi-view chest X-rays and their associated clinical report, it bridges the gap between general-purpose vision-language models and the specialized requirements of healthcare. Leveraging the MIMIC-CXR dataset, the proposed framework shows superior performance in generating high-fidelity images and semantically coherent reports. Our quantitative evaluation reveals significant results in terms of FID and BLEU scores, showcasing the quality of the generated data. Notably, our framework achieves comparable or even superior performance compared to real data on downstream disease classification tasks, underlining its potential as a tool for medical research and diagnostics. This study highlights the importance of domain-specific adaptations in enhancing the relevance and utility of generative models for clinical applications, paving the way for future advancements in synthetic multimodal medical data generation.

R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation

CV and Pattern Recognition

Helps computers write better X-ray reports.

5 Aug 2025 1

90%

Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages

CV and Pattern Recognition

Helps doctors write patient reports in other languages.

2 May 2025 1

90%

CX-Mind: A Pioneering Multimodal Large Language Model for Interleaved Reasoning in Chest X-ray via Curriculum-Guided Reinforcement Learning

Machine Learning (CS)

Helps doctors diagnose X-rays by thinking step-by-step.

31 Jul 2025 1


Country of Origin

🇮🇹 🇸🇪 🇨🇳 Italy, Sweden, China

Page Count

8 pages
