BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving
By: Katharina Winter, Mark Azer, Fabian B. Flohr
Potential Business Impact:
Helps self-driving cars understand and follow directions.
Autonomous driving has the potential to enable more efficient future mobility, which requires the research domain to establish trust through safe, reliable, and transparent driving. Large Language Models (LLMs) possess reasoning capabilities and natural language understanding, giving them the potential to serve as generalized decision-makers for ego-motion planning that can interact with humans and navigate environments designed for human drivers. While this research avenue is promising, current autonomous driving approaches struggle to combine 3D spatial grounding with the reasoning and language capabilities of LLMs. We introduce BEVDriver, an LLM-based model for end-to-end closed-loop driving in CARLA that uses latent bird's-eye-view (BEV) features as perception input. BEVDriver includes a BEV encoder to efficiently process multi-view images and 3D LiDAR point clouds. Within a common latent space, the BEV features are propagated through a Q-Former to align with natural language instructions and passed to the LLM, which predicts and plans precise future trajectories while considering navigation instructions and critical scenarios. On the LangAuto benchmark, our model achieves a Driving Score up to 18.9% higher than state-of-the-art (SoTA) methods.
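The Q-Former step in the abstract can be pictured as a fixed set of query vectors cross-attending over the flattened BEV feature grid, so that the LLM receives a small, fixed number of tokens regardless of grid size. The sketch below is a rough illustration of that idea only, not the authors' implementation: the grid size, feature dimension, number of queries, and all function names are hypothetical, the features are random numbers, and learned projections, multiple heads, self-attention, feed-forward layers, and conditioning on the instruction text are all omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_bev(bev_grid):
    """Turn an H x W grid of feature vectors into a list of H*W tokens."""
    return [cell for row in bev_grid for cell in row]


def cross_attend(queries, bev_tokens):
    """Single-head cross-attention without learned projections (a simplification).

    Each query scores every BEV token by scaled dot product and returns the
    softmax-weighted average of the BEV tokens.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in bev_tokens]
        weights = softmax(scores)
        outputs.append(
            [sum(w * k[d] for w, k in zip(weights, bev_tokens)) for d in range(dim)]
        )
    return outputs


# Hypothetical sizes, chosen only to keep the example small.
H, W, DIM, NUM_QUERIES = 4, 4, 8, 3

rng = random.Random(0)
# Stand-in for the BEV encoder output: random features, not real perception data.
bev_grid = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(W)] for _ in range(H)]
# Stand-in for the learnable queries: in a trained model these would be parameters.
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

bev_tokens = flatten_bev(bev_grid)
llm_tokens = cross_attend(queries, bev_tokens)

print(len(bev_tokens), "BEV tokens ->", len(llm_tokens), "tokens for the LLM")
```

In this toy setup, 16 BEV cells are compressed into 3 tokens. Changing `H` and `W` changes the number of BEV tokens but not the number of tokens handed to the language model, which is the property that makes a query-based bottleneck convenient in front of an LLM.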
Similar Papers
BEV-LLM: Leveraging Multimodal BEV Maps for Scene Captioning in Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars describe what they see.
X-Driver: Explainable Autonomous Driving with Vision-Language Models
Robotics
Makes self-driving cars better at making decisions.
Vehicle-to-Infrastructure Collaborative Spatial Perception via Multimodal Large Language Models
Machine Learning (CS)
Helps cars talk to each other better, even in bad weather.