
FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection

Published: November 13, 2025 | arXiv ID: 2511.10046v1

By: Wencong Wu, Xiuwei Zhang, Hanlin Yin and more

Potential Business Impact:

Helps cameras see better in bad weather.

Business Areas:

Image Recognition, Data and Analytics, Software

Visible-infrared object detection has attracted considerable attention because it performs well in low light, fog, and rain. However, the visible and infrared modalities, captured by different sensors, suffer from an information imbalance in complex scenarios, which can lead to inadequate cross-modal fusion and degraded detection performance. Furthermore, most existing methods use transformers in the spatial domain to capture complementary features, ignoring the advantages of frequency domain transformers for mining complementary information. To address these weaknesses, we propose a frequency domain fusion transformer, called FreDFT, for visible-infrared object detection. The proposed approach employs a novel multimodal frequency domain attention (MFDA) to mine complementary information between modalities, and a frequency domain feed-forward layer (FDFFL), built on a mixed-scale frequency feature fusion strategy, to better enhance multimodal features. To eliminate the imbalance of multimodal information, a cross-modal global modeling module (CGMM) is constructed to perform pixel-wise inter-modal feature interaction in both the spatial and channel dimensions. Moreover, a local feature enhancement module (LFEM) is developed to strengthen multimodal local feature representation and promote multimodal feature fusion by using various convolution layers and applying a channel shuffle. Extensive experimental results verify that the proposed FreDFT achieves excellent performance on multiple public datasets compared with other state-of-the-art methods. The code for FreDFT is available at https://github.com/WenCongWu/FreDFT.
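The abstract names two ideas that are easy to picture in code: cross-modal interaction computed in the frequency domain, and a channel shuffle that mixes the two modalities. The block below is a minimal NumPy sketch of both, not the authors' MFDA or LFEM. The function names (`frequency_cross_attention`, `channel_shuffle`, `fuse`), the per-channel standardization, and the residual connections are all assumptions made for illustration; the real modules are in the linked repository.

```python
import numpy as np


def frequency_cross_attention(query_feat, key_feat, value_feat, eps=1e-6):
    """Weight value_feat by the circular cross-correlation of query and key.

    All inputs are (channels, height, width) arrays.
    Hypothetical stand-in for frequency-domain attention, not the paper's MFDA.
    """
    size = query_feat.shape[-2:]
    q_freq = np.fft.rfft2(query_feat, axes=(-2, -1))
    k_freq = np.fft.rfft2(key_feat, axes=(-2, -1))
    # Multiplying by the conjugate spectrum corresponds to circular
    # cross-correlation in the spatial domain (correlation theorem).
    corr = np.fft.irfft2(q_freq * np.conj(k_freq), s=size, axes=(-2, -1))
    # Assumed normalization: standardize each channel's correlation map.
    mean = corr.mean(axis=(-2, -1), keepdims=True)
    std = corr.std(axis=(-2, -1), keepdims=True)
    weights = (corr - mean) / (std + eps)
    return weights * value_feat


def channel_shuffle(x, groups):
    """Interleave channels across groups (ShuffleNet-style reshape/transpose)."""
    channels, height, width = x.shape
    if channels % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    x = x.reshape(groups, channels // groups, height, width)
    x = x.transpose(1, 0, 2, 3)
    return x.reshape(channels, height, width)


def fuse(visible, infrared, groups=2):
    """Enhance each modality with the other, stack, then shuffle channels."""
    # Residual connections are an illustrative assumption.
    vis_enhanced = visible + frequency_cross_attention(visible, infrared, infrared)
    ir_enhanced = infrared + frequency_cross_attention(infrared, visible, visible)
    stacked = np.concatenate([vis_enhanced, ir_enhanced], axis=0)
    return channel_shuffle(stacked, groups)


# Random feature maps stand in for backbone outputs of each modality.
rng = np.random.default_rng(0)
visible = rng.standard_normal((4, 8, 8))
infrared = rng.standard_normal((4, 8, 8))
fused = fuse(visible, infrared)
print(fused.shape)  # (8, 8, 8)
```

With `groups=2`, the shuffle alternates visible-derived and infrared-derived channels in the stacked tensor, so a later grouped convolution would see both modalities in every group.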

FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion

CV and Pattern Recognition

Makes blurry night pictures clear and detailed.

12 Jun 2025 1

90%

Towards a Generalizable Fusion Architecture for Multimodal Object Detection

CV and Pattern Recognition

Helps cameras see better in fog and dark.

20 Oct 2025 0

90%

DFIR-DETR: Frequency Domain Enhancement and Dynamic Feature Aggregation for Cross-Scene Small Object Detection

CV and Pattern Recognition

Finds tiny flaws in pictures from drones.

8 Dec 2025 2


Country of Origin

🇨🇳 China

Repos / Data Links

github.com

Page Count

12 pages
