ShelfOcc: Native 3D Supervision beyond LiDAR for Vision-Based Occupancy Estimation
By: Simon Boeder, Fabian Gigengack, Simon Roesler and more
Potential Business Impact:
Lets cars see in 3D without special sensors.
Recent progress in self- and weakly supervised occupancy estimation has largely relied on 2D projection or rendering-based supervision, which suffers from geometric inconsistencies and severe depth bleeding. We therefore introduce ShelfOcc, a vision-only method that overcomes these limitations without relying on LiDAR. ShelfOcc brings supervision into native 3D space by generating metrically consistent semantic voxel labels from video, enabling true 3D supervision without any additional sensors or manual 3D annotations. While recent vision-based 3D geometry foundation models provide a promising source of prior knowledge, their outputs cannot be used directly as predictions because the geometry is sparse or noisy and inconsistent, especially in dynamic driving scenes. Our method introduces a dedicated framework that mitigates these issues by filtering and accumulating static geometry consistently across frames, handling dynamic content, and propagating semantic information into a stable voxel representation. This data-centric shift in supervision for weakly/shelf-supervised occupancy estimation allows the use of essentially any state-of-the-art (SOTA) occupancy model architecture without relying on LiDAR data. We argue that such high-quality supervision is essential for robust occupancy learning and constitutes an important complementary avenue to architectural innovation. On the Occ3D-nuScenes benchmark, ShelfOcc substantially outperforms all previous weakly/shelf-supervised methods (up to a 34% relative improvement), establishing a new data-driven direction for LiDAR-free 3D scene understanding.
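To make the idea of accumulating static geometry into semantic voxel labels more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each frame already provides 3D points with a semantic label and a dynamic/static flag, and that ego motion is a pure translation; the function name `voxelize_static_points`, the `offset` field, the voxel size, and the `min_hits` threshold are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
from math import floor


def voxelize_static_points(frames, voxel_size=0.4, min_hits=2):
    """Accumulate static points from several frames into labeled voxels.

    frames: list of dicts with
        "offset": (tx, ty, tz) translation into a shared world frame
                  (assumption: rotation is ignored for brevity)
        "points": list of (x, y, z, label, is_dynamic) tuples
    Returns a dict mapping voxel index (i, j, k) to a semantic label.
    """
    votes = defaultdict(Counter)
    for frame in frames:
        tx, ty, tz = frame["offset"]
        for x, y, z, label, is_dynamic in frame["points"]:
            if is_dynamic:
                # Moving objects would smear across frames, so skip them here.
                continue
            key = (
                floor((x + tx) / voxel_size),
                floor((y + ty) / voxel_size),
                floor((z + tz) / voxel_size),
            )
            votes[key][label] += 1

    labels = {}
    for key, counter in votes.items():
        # Keep only voxels observed at least min_hits times in total,
        # then assign the most frequently observed label.
        if sum(counter.values()) >= min_hits:
            labels[key] = counter.most_common(1)[0][0]
    return labels


# Hypothetical toy input: two frames, the second shifted 1 m along x.
frames = [
    {"offset": (0.0, 0.0, 0.0),
     "points": [(1.0, 0.1, 0.0, "road", False),
                (5.0, 0.0, 0.5, "car", True)]},
    {"offset": (1.0, 0.0, 0.0),
     "points": [(0.1, 0.1, 0.0, "road", False),
                (3.0, 2.0, 0.0, "sidewalk", False)]},
]
labels = voxelize_static_points(frames)
print(labels)
```

In this toy input, the two "road" points fall into the same voxel once the second frame is shifted into the shared frame, so that voxel survives the `min_hits` filter, while the single "sidewalk" observation and the dynamic "car" point are dropped. A real pipeline would also need full 6-DoF poses and a separate treatment of dynamic objects, which the abstract mentions but does not detail.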
Similar Papers
QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy
CV and Pattern Recognition
Teaches cars to see and understand 3D worlds.
MinkOcc: Towards real-time label-efficient semantic occupancy prediction
CV and Pattern Recognition
Teaches cars to see without lots of human help.
ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding
CV and Pattern Recognition
Lets computers build 3D worlds from pictures.