Pyramidal Adaptive Cross-Gating for Multimodal Detection

Published: December 20, 2025 | arXiv ID: 2512.18291v1

By: Zidong Gu, Shoufu Tian

Object detection in aerial imagery is a critical task in applications such as UAV reconnaissance. Although existing methods have extensively explored feature interaction between modalities, they commonly rely on simple fusion strategies for feature aggregation. This introduces two critical flaws: the fusion is prone to cross-modal noise, and it disrupts the hierarchical structure of the feature pyramid, thereby impairing the fine-grained detection of small objects. To address these flaws, we propose the Pyramidal Adaptive Cross-Gating Network (PACGNet), an architecture that performs deep fusion within the backbone. It has two core components: the Symmetrical Cross-Gating (SCG) module and the Pyramidal Feature-aware Multimodal Gating (PFMG) module. The SCG module employs a bidirectional, symmetrical "horizontal" gating mechanism to selectively absorb complementary information, suppress noise, and preserve the semantic integrity of each modality. The PFMG module reconstructs the feature hierarchy via a progressive hierarchical gating mechanism: it uses the detailed features from the preceding, higher-resolution level to guide the fusion at the current, lower-resolution level, preserving fine-grained details as features propagate. In evaluations on the DroneVehicle and VEDAI datasets, PACGNet sets a new state of the art, with mAP50 scores of 81.7% and 82.1%, respectively.
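To make the two gating ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the gate equations, so the sigmoid gates, the residual form, the average-pool downsampling, and every name (`cross_gate`, `pyramid_gated_fuse`, `w_rgb`, `w_ir`) are assumptions made for illustration. Features are flat lists of floats rather than image tensors.

```python
import math


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cross_gate(rgb, ir, w_rgb=1.0, w_ir=1.0):
    """Symmetric ("horizontal") cross-gating of two equal-length feature vectors.

    Each stream computes a gate from its own features and uses it to decide
    how much of the other stream to absorb. The residual form keeps each
    stream's original features intact.
    ASSUMPTION: the gate form and the scalar weights are hypothetical
    stand-ins for learned layers.
    """
    if len(rgb) != len(ir):
        raise ValueError("both modalities must have the same length")
    rgb_out = [r + sigmoid(w_rgb * r) * i for r, i in zip(rgb, ir)]
    ir_out = [i + sigmoid(w_ir * i) * r for r, i in zip(rgb, ir)]
    return rgb_out, ir_out


def downsample2(x):
    """Average-pool a 1D feature vector by a factor of 2."""
    return [(x[k] + x[k + 1]) / 2.0 for k in range(0, len(x) - 1, 2)]


def pyramid_gated_fuse(rgb, ir, finer_fused=None):
    """Fuse two modalities at one pyramid level.

    If the fused features of the preceding, higher-resolution level are
    given, they are downsampled and added to the gate input, so finer
    detail steers the fusion at the current level.
    ASSUMPTION: additive guidance and a convex-combination gate are
    illustrative choices, not taken from the paper.
    """
    if len(rgb) != len(ir):
        raise ValueError("both modalities must have the same length")
    gate_in = [r + i for r, i in zip(rgb, ir)]
    if finer_fused is not None:
        if len(finer_fused) != 2 * len(rgb):
            raise ValueError("finer level must have twice the resolution")
        guide = downsample2(finer_fused)
        gate_in = [g + d for g, d in zip(gate_in, guide)]
    gates = [sigmoid(g) for g in gate_in]
    # Each output element is a convex combination of the two modalities.
    return [g * r + (1.0 - g) * i for g, r, i in zip(gates, rgb, ir)]


# Usage: cross-gate two levels, then fuse from fine to coarse.
rgb_l1, ir_l1 = cross_gate([0.2, -0.1, 0.5, 0.3], [0.4, 0.0, -0.2, 0.1])
fused_l1 = pyramid_gated_fuse(rgb_l1, ir_l1)
rgb_l2, ir_l2 = cross_gate([0.3, 0.1], [-0.2, 0.6])
fused_l2 = pyramid_gated_fuse(rgb_l2, ir_l2, finer_fused=fused_l1)
```

In a real detector the scalar weights would be replaced by learned convolutions over multi-channel feature maps, and the gates would be computed per spatial location and channel. The sketch only shows the data flow: a symmetric exchange between the two streams at each level, followed by a fusion whose gate is informed by the finer level.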

GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection

CV and Pattern Recognition

Helps robots see and understand 3D objects better.

2 Dec 2025 1

88%

AG-Fusion: adaptive gated multimodal fusion for 3d object detection in complex scenes

CV and Pattern Recognition

Helps self-driving cars see better in bad weather.

27 Oct 2025 0

88%

PGF-Net: A Progressive Gated-Fusion Framework for Efficient Multimodal Sentiment Analysis

Machine Learning (CS)

Helps computers understand feelings from words, sound, and pictures.

20 Aug 2025 1

