CM-Diff: A Single Generative Network for Bidirectional Cross-Modality Translation Diffusion Model Between Infrared and Visible Images
By: Bin Hu, Chenqiang Gao, Shurui Liu, and more
Potential Business Impact:
Turns infrared pictures into realistic visible-light photos, and vice versa.
Image translation is a key approach for mitigating information deficiencies in the infrared and visible modalities, and it also facilitates the enhancement of modality-specific datasets. However, existing methods for infrared and visible image translation either achieve only unidirectional modality translation or rely on cycle consistency for bidirectional translation, which may result in suboptimal performance. In this work, we present the bidirectional cross-modality translation diffusion model (CM-Diff), which simultaneously models the data distributions of both the infrared and visible modalities. We achieve this by combining translation direction labels, used for guidance during training, with cross-modality feature control. Specifically, we view establishing the mapping relationship between the two modalities as a process of learning data distributions and understanding modality differences, achieved through a novel Bidirectional Diffusion Training (BDT) scheme. Additionally, we propose Statistical Constraint Inference (SCI) to ensure that the generated image closely adheres to the data distribution of the target modality. Experimental results demonstrate the superiority of CM-Diff over state-of-the-art methods, highlighting its potential for generating dual-modality datasets.
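The abstract does not spell out how BDT and SCI are implemented. As a rough illustration only, the sketch below assumes a standard DDPM-style noise-prediction objective in which a sampled direction label decides which modality is the conditioning source and which is the noised target. It also reads the "statistical constraint" as simple mean/std matching to target-modality statistics. The names `IR_TO_VIS`, `VIS_TO_IR`, `toy_denoiser`, `bidirectional_training_step`, and `match_statistics` are hypothetical, and the linear "denoiser" stands in for a real network. None of this is the authors' code.

```python
import numpy as np

# Hypothetical direction labels; the paper's actual encoding is not given here.
IR_TO_VIS = 0
VIS_TO_IR = 1


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule; returns betas and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def toy_denoiser(x_t, source, t, direction, weights):
    """Stand-in for a noise-prediction network (a real model would be a U-Net
    that also embeds t). Here: a linear map whose weights depend on direction."""
    a, b = weights[direction]
    return a * x_t + b * source


def bidirectional_training_step(ir, vis, alpha_bars, weights, rng):
    """One training step: sample a translation direction, noise the target
    modality, condition on the source modality, and score the noise prediction."""
    direction = int(rng.integers(0, 2))
    if direction == IR_TO_VIS:
        source, target = ir, vis
    else:
        source, target = vis, ir
    t = int(rng.integers(0, len(alpha_bars)))
    noise = rng.standard_normal(target.shape)
    x_t = q_sample(target, t, alpha_bars, noise)
    pred = toy_denoiser(x_t, source, t, direction, weights)
    loss = float(np.mean((pred - noise) ** 2))  # epsilon-prediction MSE
    return direction, loss


def match_statistics(x, target_mean, target_std, eps=1e-8):
    """Assumed form of a statistical constraint: shift and rescale x so its
    mean and standard deviation match precomputed target-modality statistics."""
    return (x - x.mean()) / (x.std() + eps) * target_std + target_mean
```

In this sketch a single set of parameters serves both translation directions, switched by the label, rather than two separate generators tied together by cycle consistency. The cross-modality feature control mentioned in the abstract is omitted, and the real SCI may constrain statistics quite differently from this per-image mean/std match.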
Similar Papers
CycleDiff: Cycle Diffusion Models for Unpaired Image-to-image Translation
CV and Pattern Recognition
Changes pictures from one style to another.
Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
CV and Pattern Recognition
Helps doctors understand patient health from pictures and words.
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
CV and Pattern Recognition
Translates between different types of data, like pictures and 3D shapes.