DetAny4D: Detect Anything 4D Temporally in a Streaming RGB Video
By: Jiawei Hou, Shenghao Zhang, Can Wang, and more
Potential Business Impact:
Helps self-driving cars see moving objects better.
Reliable 4D object detection, meaning 3D object detection in streaming video, is crucial for perceiving and understanding the real world. Existing open-set 4D object detection methods either predict frame by frame without modeling temporal consistency, or rely on complex multi-stage pipelines in which errors propagate across cascaded stages. Progress in this area has also been hindered by the lack of large-scale datasets with continuous, reliable 3D bounding box (b-box) annotations. To overcome these challenges, we first introduce DA4D, a large-scale 4D detection dataset containing over 280k sequences with high-quality b-box annotations collected under diverse conditions. Building on DA4D, we propose DetAny4D, an open-set, end-to-end framework that predicts 3D b-boxes directly from sequential inputs. DetAny4D fuses multi-modal features from pre-trained foundation models and uses a geometry-aware spatiotemporal decoder to capture both spatial and temporal dynamics. It further adopts a multi-task learning architecture, coupled with a dedicated training strategy, to maintain global consistency across sequences of varying lengths. Extensive experiments show that DetAny4D achieves competitive detection accuracy and significantly improves temporal stability, addressing the long-standing problems of jitter and inconsistency in 4D object detection. Data and code will be released upon acceptance.
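The abstract reports improved temporal stability and reduced jitter but does not say how jitter is measured. As a purely illustrative sketch, and not the metric used by DetAny4D or DA4D, one simple way to quantify jitter for a single tracked object is the mean magnitude of the second-order finite difference of its predicted box centers. A track moving at constant velocity scores zero, and frame-to-frame wobble raises the score. The box layout `(cx, cy, cz, w, h, l, yaw)` and the function name `center_jitter` are hypothetical choices made here for illustration.

```python
import math
from typing import Sequence

# Hypothetical box layout for illustration only: (cx, cy, cz, w, h, l, yaw).
Box3D = Sequence[float]


def center_jitter(track: Sequence[Box3D]) -> float:
    """Mean magnitude of the second difference of box centers over one track.

    Illustrative jitter measure (an assumption, not the paper's metric):
    constant-velocity motion scores 0.0, while frame-to-frame wobble in the
    predicted centers increases the score.
    """
    if len(track) < 3:
        return 0.0  # at least three frames are needed for a second difference
    total = 0.0
    for prev, cur, nxt in zip(track, track[1:], track[2:]):
        # Discrete acceleration of the center: c[t+1] - 2*c[t] + c[t-1].
        accel = [nxt[i] - 2.0 * cur[i] + prev[i] for i in range(3)]
        total += math.sqrt(sum(a * a for a in accel))
    return total / (len(track) - 2)


if __name__ == "__main__":
    # A car-sized box moving 1 m per frame along x: perfectly smooth.
    smooth = [(float(t), 0.0, 10.0, 1.8, 1.5, 4.2, 0.0) for t in range(5)]
    # Same track with one frame whose center is displaced by 0.5 m.
    noisy = list(smooth)
    noisy[2] = (2.5, 0.0, 10.0, 1.8, 1.5, 4.2, 0.0)
    print(center_jitter(smooth))  # 0.0
    print(center_jitter(noisy))   # 2/3, about 0.667
```

A second difference is used rather than a first difference so that an object moving steadily is not penalized; only deviations from smooth motion count as jitter. A fuller stability measure might also track changes in box size and yaw (with angle wrap-around handled), which this sketch omits.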