Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection

Published: December 14, 2025 | arXiv ID: 2512.12884v1

By: Xiangzhong Liu, Jiajie Zhang, Hao Shen

Potential Business Impact:

Helps cars see better by combining different sensors.

Business Areas:

Image Recognition, Data and Analytics, Software

Automotive sensor fusion systems commonly use smart sensors and Vehicle-to-Everything (V2X) modules. Data from these sources are typically available only as processed object lists, not as the raw data that traditional sensors provide. Instead of processing the remaining raw data separately and then fusing the results at the object level, we propose an end-to-end cross-level fusion concept based on a Transformer, which integrates highly abstract object list information with raw camera images for 3D object detection. Object lists are fed into the Transformer as denoising queries and propagated together with learnable queries through the subsequent feature aggregation process. Additionally, a deformable Gaussian mask, derived from the position and size priors in the object lists, is explicitly integrated into the Transformer decoder. This directs attention toward the target regions of interest and accelerates training convergence. Furthermore, because no public dataset contains object lists as a standalone modality, we propose an approach for generating pseudo object lists from ground-truth bounding boxes by simulating state noise, false positives, and false negatives. As the first work to conduct cross-level fusion, our approach shows substantial performance improvements over the vision-based baseline on the nuScenes dataset, and it generalizes across diverse noise levels of simulated object lists and across real detectors.
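The pseudo object list idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Box3D` type, the `make_pseudo_object_list` function, the Gaussian noise model, and every default value (noise standard deviations, drop probability, number of spurious objects, scene extent) are assumptions chosen only for illustration. The sketch perturbs each ground-truth box (state noise), randomly drops some boxes (false negatives), and appends randomly placed boxes (false positives).

```python
import math
import random
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box: center and size in metres, yaw in radians."""
    x: float
    y: float
    z: float
    w: float
    l: float
    h: float
    yaw: float
    label: str


def make_pseudo_object_list(
    gt_boxes: List[Box3D],
    pos_sigma: float = 0.5,    # assumed std-dev of center noise (m)
    size_sigma: float = 0.1,   # assumed relative std-dev of size noise
    yaw_sigma: float = 0.05,   # assumed std-dev of heading noise (rad)
    drop_prob: float = 0.1,    # assumed false-negative probability
    num_false_pos: int = 1,    # assumed number of spurious objects
    area: float = 50.0,        # assumed half-extent of the scene (m)
    rng: Optional[random.Random] = None,
) -> List[Box3D]:
    """Simulate a noisy object list from ground-truth boxes (illustrative only)."""
    rng = rng or random.Random()
    out: List[Box3D] = []

    for box in gt_boxes:
        # False negative: the simulated sensor misses this object.
        if rng.random() < drop_prob:
            continue
        # State noise: perturb position, size, and heading.
        out.append(replace(
            box,
            x=box.x + rng.gauss(0, pos_sigma),
            y=box.y + rng.gauss(0, pos_sigma),
            z=box.z + rng.gauss(0, pos_sigma),
            w=max(0.1, box.w * (1 + rng.gauss(0, size_sigma))),
            l=max(0.1, box.l * (1 + rng.gauss(0, size_sigma))),
            h=max(0.1, box.h * (1 + rng.gauss(0, size_sigma))),
            yaw=box.yaw + rng.gauss(0, yaw_sigma),
        ))

    # False positives: spurious boxes at random locations in the scene.
    for _ in range(num_false_pos):
        label = rng.choice(gt_boxes).label if gt_boxes else "car"
        out.append(Box3D(
            x=rng.uniform(-area, area),
            y=rng.uniform(-area, area),
            z=0.0,
            w=rng.uniform(1.5, 2.5),
            l=rng.uniform(3.5, 5.5),
            h=rng.uniform(1.4, 2.0),
            yaw=rng.uniform(-math.pi, math.pi),
            label=label,
        ))
    return out


if __name__ == "__main__":
    gt = [Box3D(10.0, 2.0, 0.5, 1.9, 4.5, 1.6, 0.1, "car")]
    for obj in make_pseudo_object_list(gt, rng=random.Random(0)):
        print(obj)
```

Passing a seeded `random.Random` makes the simulated list reproducible, which is convenient when comparing a detector across several assumed noise levels.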

All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles

CV and Pattern Recognition

Helps self-driving cars see and understand everything.

30 Oct 2025

87%

Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection

CV and Pattern Recognition

Helps self-driving cars see better without labels.

28 Aug 2025

87%

GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection

CV and Pattern Recognition

Helps robots see and understand 3D objects better.

2 Dec 2025

Country of Origin

🇩🇪 Germany

Page Count

6 pages
