
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance

Published: March 19, 2025 | arXiv ID: 2503.14944v1

By: Zihan Cao, Yu Zhong, Ziqi Wang, and more

Potential Business Impact:

Combines blurry pictures into one clear image.

Business Areas:

Image Recognition, Data and Analytics, Software

Image fusion, a fundamental low-level vision task, aims to integrate multiple image sequences into a single output while preserving as much information as possible from the inputs. However, existing methods face several significant limitations: 1) requiring task- or dataset-specific models; 2) neglecting real-world image degradations (e.g., noise), which causes failure when processing degraded inputs; 3) operating in pixel space, where attention mechanisms are computationally expensive; and 4) lacking user interaction capabilities. To address these challenges, we propose a unified framework for multi-task, multi-degradation, and language-guided image fusion. Our framework includes two key components: 1) a practical degradation pipeline that simulates real-world image degradations and generates interactive prompts to guide the model; and 2) an all-in-one Diffusion Transformer (DiT) operating in latent space, which produces a clean fused image conditioned on both the degraded inputs and the generated prompts. Furthermore, we introduce principled modifications to the original DiT architecture to better suit the fusion task. Based on this framework, we develop two versions of the model: a regression-based variant and a flow matching-based variant. Extensive qualitative and quantitative experiments demonstrate that our approach effectively addresses the aforementioned limitations and outperforms previous restoration+fusion and all-in-one pipelines. Code is available at https://github.com/294coder/MMAIF.
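The first component, a degradation pipeline that also emits a guiding prompt, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the degradation types, their strengths, or the prompt wording, so the names `PROMPTS`, `degrade`, and `make_training_sample`, the noise level of 0.1, and the darkening factor of 0.3 are all hypothetical choices made for illustration. Images are plain nested lists of floats in [0, 1] to keep the example dependency-free.

```python
import random

# Hypothetical prompt templates; the abstract does not give the real wording.
PROMPTS = {
    "noise": "Remove the noise from the inputs and fuse them.",
    "low_light": "Brighten the dark inputs and fuse them.",
    "clean": "Fuse the inputs.",
}


def degrade(image, kind, rng):
    """Apply one synthetic degradation to an image with values in [0, 1]."""
    if kind == "noise":
        # Additive Gaussian noise; sigma = 0.1 is an assumed value.
        transform = lambda p: p + rng.gauss(0.0, 0.1)
    elif kind == "low_light":
        # Uniform darkening; the factor 0.3 is an assumed value.
        transform = lambda p: p * 0.3
    elif kind == "clean":
        transform = lambda p: p
    else:
        raise ValueError(f"unknown degradation: {kind}")
    # Clamp every pixel back into the valid range.
    return [[min(1.0, max(0.0, transform(p))) for p in row] for row in image]


def make_training_sample(img_a, img_b, rng):
    """Return (degraded_a, degraded_b, prompt) for a pair of clean sources."""
    kind = rng.choice(sorted(PROMPTS))
    return degrade(img_a, kind, rng), degrade(img_b, kind, rng), PROMPTS[kind]


rng = random.Random(0)
img_a = [[rng.random() for _ in range(8)] for _ in range(8)]
img_b = [[rng.random() for _ in range(8)] for _ in range(8)]
deg_a, deg_b, prompt = make_training_sample(img_a, img_b, rng)
```

In a setup like this, the clean source pair would supply the fusion target while the degraded pair and the prompt form the model's conditioning, which is how a single model could be trained across several degradation types.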

MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics

CV and Pattern Recognition

Cleans up blurry pictures from bad weather.

16 Nov 2025 2

88%

Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach

CV and Pattern Recognition

Combines pictures using words to make better images.

8 Dec 2025 1

88%

Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion

CV and Pattern Recognition

Combines different pictures for clearer images.

21 Aug 2025 1


Country of Origin

🇨🇳 China

Page Count

11 pages

