SliceSemOcc: Vertical Slice Based Multimodal 3D Semantic Occupancy Representation
By: Han Huang, Han Sun, Ningzhong Liu and more
Potential Business Impact:
Helps self-driving cars see in 3D better.
Driven by autonomous driving's demands for precise 3D perception, 3D semantic occupancy prediction has become a pivotal research topic. Unlike bird's-eye-view (BEV) methods, which restrict scene representation to a 2D plane, occupancy prediction leverages a complete 3D voxel grid to model spatial structures in all dimensions, thereby capturing semantic variations along the vertical axis. However, most existing approaches overlook height-axis information when processing voxel features. Moreover, conventional SENet-style channel attention assigns the same channel weights to all height layers, limiting its ability to emphasize features at different heights. To address these limitations, we propose SliceSemOcc, a novel vertical-slice-based multimodal framework for 3D semantic occupancy representation. Specifically, we extract voxel features along the height axis using both global and local vertical slices. Then, a global-local fusion module adaptively reconciles fine-grained spatial details with holistic contextual information. Furthermore, we propose the SEAttention3D module, which preserves height-wise resolution through average pooling and assigns dynamic channel attention weights to each height layer. Extensive experiments on the nuScenes-SurroundOcc and nuScenes-OpenOccupancy datasets verify that our method significantly improves mean IoU, with especially pronounced gains on most small-object categories. Detailed ablation studies further validate the effectiveness of the proposed SliceSemOcc framework.
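To make the height-wise attention idea concrete, the following is a minimal NumPy sketch of channel attention that keeps the height axis during pooling, so each height layer receives its own channel weights. It is not the authors' SEAttention3D implementation: the `(C, Z, H, W)` tensor layout, the two-layer bottleneck shared across height layers, and the names `se_attention_3d`, `w1`, and `w2` are all assumptions made for illustration.

```python
import numpy as np


def se_attention_3d(x, w1, w2):
    """Height-aware squeeze-and-excitation sketch (illustrative, hypothetical API).

    x  : voxel features, assumed layout (C, Z, H, W) with Z the height axis
    w1 : bottleneck weights, shape (C // r, C)  (r is an assumed reduction ratio)
    w2 : expansion weights, shape (C, C // r)
    Returns the reweighted features and the (C, Z) gate matrix.
    """
    # Squeeze: average over the horizontal plane only, keeping height -> (C, Z).
    # Classic SE would average over (Z, H, W) and get a single weight per channel.
    pooled = x.mean(axis=(2, 3))

    # Excite: the same small MLP is applied to every height layer's descriptor.
    hidden = np.maximum(w1 @ pooled, 0.0)            # ReLU, shape (C // r, Z)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid, shape (C, Z)

    # Scale: broadcast each (channel, height) gate over the horizontal plane.
    return x * gates[:, :, None, None], gates
```

Because the gates have shape `(C, Z)` rather than `(C,)`, a channel can be emphasized near the ground and suppressed higher up, which is the behavior the abstract contrasts with SENet-style attention.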
Similar Papers
Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
CV and Pattern Recognition
Helps robots understand 3D spaces from pictures.
QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy
CV and Pattern Recognition
Teaches cars to see and understand 3D worlds.
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
CV and Pattern Recognition
Lets self-driving cars spot exact 3D object shapes.