Enhancing Steering Estimation with Semantic-Aware GNNs
By: Fouad Makiyeh, Huy-Dung Nguyen, Patrick Chareyre and more
Potential Business Impact:
Cars steer better using 3D pictures, not just 2D.
Steering estimation is a critical task in autonomous driving, traditionally relying on 2D image-based models. In this work, we explore the advantages of incorporating 3D spatial information through hybrid architectures that combine 3D neural network models with recurrent neural networks (RNNs) for temporal modeling, using LiDAR-based point clouds as input. We systematically evaluate four hybrid 3D models, all of which outperform the 2D-only baseline, with the Graph Neural Network (GNN)-RNN model yielding the best results. To reduce reliance on LiDAR, we leverage a pretrained unified model to estimate depth from monocular images, reconstructing pseudo-3D point clouds. We then adapt the GNN-RNN model, originally designed for LiDAR-based point clouds, to work with these pseudo-3D representations, achieving performance comparable or even superior to the LiDAR-based model. Additionally, the unified model provides semantic labels for each point, enabling a more structured scene representation. To further optimize graph construction, we introduce an efficient connectivity strategy in which connections are predominantly formed between points of the same semantic class, with only 20% of inter-class connections retained. This targeted approach reduces graph complexity and computational cost while preserving critical spatial relationships. Finally, we validate our approach on the KITTI dataset, achieving a 71% improvement over 2D-only models. Our findings highlight the advantages of 3D spatial information and efficient graph construction for steering estimation, while maintaining the cost-effectiveness of monocular images and avoiding the expense of LiDAR-based systems.
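The semantic-aware connectivity strategy can be illustrated with a short sketch. The abstract does not give implementation details, so the following is a minimal, hypothetical version under simple assumptions: edges come from a k-nearest-neighbor search over the point cloud, same-class (intra-class) edges are always kept, and each inter-class edge is kept independently with probability 0.2. Function and parameter names (`build_semantic_graph`, `inter_keep`) are illustrative, not from the paper.

```python
import numpy as np

def build_semantic_graph(points, labels, k=8, inter_keep=0.2, rng=None):
    """Sketch of semantic-aware graph construction.

    points: (n, 3) array of 3D coordinates
    labels: (n,) array of per-point semantic class ids
    k: number of nearest neighbors considered per point
    inter_keep: probability of retaining an inter-class edge (paper: 0.2)
    Returns a list of directed (i, j) edges.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(points)
    # Brute-force pairwise distances for clarity (a k-d tree would scale better).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    edges = []
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            if labels[i] == labels[j]:
                edges.append((i, int(j)))            # intra-class: always kept
            elif rng.random() < inter_keep:
                edges.append((i, int(j)))            # inter-class: ~20% retained
    return edges
```

With `inter_keep=0.2`, roughly 80% of cross-class candidate edges are dropped, which is where the reduction in graph size and GNN compute comes from, while dense intra-class connectivity preserves the spatial structure within each semantic region.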
Similar Papers
Image Segmentation: Inducing graph-based learning
CV and Pattern Recognition
Helps computers see and understand images better.
Leveraging Semantic Graphs for Efficient and Robust LiDAR SLAM
Robotics
Helps robots understand where they are and what's around.
Project-and-Fuse: Improving RGB-D Semantic Segmentation via Graph Convolution Networks
CV and Pattern Recognition
Helps computers understand 3D scenes better.