AG-Fusion: Adaptive Gated Multimodal Fusion for 3D Object Detection in Complex Scenes
By: Sixian Liu, Chen Xu, Qiang Wang, and more
Potential Business Impact:
Helps self-driving cars see better in bad weather.
Multimodal camera-LiDAR fusion has found extensive application in 3D object detection, demonstrating encouraging performance. However, existing methods degrade significantly in challenging scenarios characterized by sensor degradation or environmental disturbances. We propose a novel Adaptive Gated Fusion (AG-Fusion) approach that selectively integrates cross-modal knowledge by identifying reliable patterns for robust detection in complex scenes. Specifically, we first project features from each modality into a unified BEV space and enhance them with a window-based attention mechanism. An adaptive gated fusion module based on cross-modal attention then integrates these features into reliable BEV representations that remain robust in challenging environments. Furthermore, we construct a new dataset named Excavator3D (E3D), focused on challenging excavator operation scenarios, to benchmark performance in complex conditions. Our method not only achieves competitive performance on the standard KITTI dataset with 93.92% accuracy, but also outperforms the baseline by 24.88% on the challenging E3D dataset, demonstrating superior robustness to unreliable modal information in complex industrial scenes.
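The abstract's central mechanism, per-location gating driven by cross-modal attention between camera and LiDAR BEV features, can be sketched roughly as below. This is a minimal illustrative sketch inferred from the abstract, not the authors' implementation: the class name AdaptiveGatedFusion, the layer sizes, and the sigmoid-gated convex combination of the two modal streams are all assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveGatedFusion(nn.Module):
    """Hypothetical sketch of adaptive gated fusion: cross-modal attention
    between camera and LiDAR BEV features yields per-location gates that
    weight each modality before fusion. Not the paper's actual code."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Cross-modal attention: each modality attends to the other.
        self.cam_to_lidar = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.lidar_to_cam = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Gate predicts a per-BEV-cell reliability weight in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV feature maps from each modality.
        b, c, h, w = cam_bev.shape
        cam = cam_bev.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar = lidar_bev.flatten(2).transpose(1, 2)  # (B, H*W, C)

        # Each modality queries the other to exchange cross-modal context.
        cam_ctx, _ = self.cam_to_lidar(cam, lidar, lidar)
        lidar_ctx, _ = self.lidar_to_cam(lidar, cam, cam)

        # Gate decides, per BEV cell, how much to trust each modality,
        # so a degraded sensor can be down-weighted locally.
        g = self.gate(torch.cat([cam_ctx, lidar_ctx], dim=-1))  # (B, H*W, 1)
        fused = g * cam_ctx + (1.0 - g) * lidar_ctx

        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    fusion = AdaptiveGatedFusion(channels=64)
    cam = torch.randn(2, 64, 32, 32)
    lidar = torch.randn(2, 64, 32, 32)
    print(fusion(cam, lidar).shape)  # torch.Size([2, 64, 32, 32])
```

The convex combination g * cam + (1 - g) * lidar is one plausible reading of "adaptive gated fusion": when one modality is unreliable (e.g., camera in fog), the learned gate can push its weight toward zero at the affected locations.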
Similar Papers
DGFusion: Dual-guided Fusion for Robust Multi-Modal 3D Object Detection
CV and Pattern Recognition
Helps self-driving cars see far-away objects better.
GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection
CV and Pattern Recognition
Helps robots see and understand 3D objects better.
DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception
CV and Pattern Recognition
Helps self-driving cars see better in bad weather.