MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving
By: Bin Sun, Yaoguang Cao, Yan Wang and more
Potential Business Impact:
Helps self-driving cars make smarter, safer choices.
End-to-end autonomous driving (E2E-AD) has emerged as a new paradigm in which trajectory planning plays a crucial role. Existing studies mainly follow two directions: generation-oriented approaches, which focus on producing high-quality trajectories but rely on simple decision mechanisms, and selection-oriented approaches, which perform multi-dimensional evaluation to select the best trajectory but lack sufficient generative capability. In this work, we propose MindDrive, a harmonized framework that integrates high-quality trajectory generation with comprehensive decision reasoning. It establishes a structured reasoning paradigm of "context simulation - candidate generation - multi-objective trade-off". In particular, the proposed Future-aware Trajectory Generator (FaTG), based on a World Action Model (WaM), performs ego-conditioned "what-if" simulations to predict potential future scenes and generate foresighted trajectory candidates. Building on these candidates, the VLM-oriented Evaluator (VLoE) leverages the reasoning capability of a large vision-language model to conduct multi-objective evaluations across the safety, comfort, and efficiency dimensions, leading to reasoned and human-aligned decision making. Extensive experiments on the NAVSIM-v1 and NAVSIM-v2 benchmarks demonstrate that MindDrive achieves state-of-the-art performance across multi-dimensional driving metrics, significantly enhancing safety, compliance, and generalization. This work provides a promising path toward interpretable and cognitively guided autonomous driving.
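The "candidate generation - multi-objective trade-off" part of the paradigm can be illustrated with a toy sketch. This is not the authors' implementation: MindDrive generates candidates with a World Action Model and scores them with a vision-language model, whereas the sketch below substitutes hand-written stand-ins. All names (`Candidate`, `generate_candidates`, `score`, `select`), the constant-speed rollouts, the scoring formulas, and the weights are hypothetical and chosen only to make the structure concrete.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Candidate:
    speed: float            # constant speed of the rollout in m/s (assumption)
    waypoints: List[Point]  # predicted future ego positions


def generate_candidates(start: Point, speeds: Sequence[float],
                        steps: int = 4, dt: float = 0.5) -> List[Candidate]:
    """Toy stand-in for candidate generation: straight rollouts along x."""
    x0, y0 = start
    return [
        Candidate(v, [(x0 + v * dt * k, y0) for k in range(1, steps + 1)])
        for v in speeds
    ]


def score(c: Candidate, obstacle: Point, safe_distance: float = 5.0,
          preferred_speed: float = 6.0, max_progress: float = 20.0) -> Dict[str, float]:
    """Hypothetical hand-written scores in [0, 1]; the paper uses a VLM instead."""
    min_dist = min(hypot(x - obstacle[0], y - obstacle[1]) for x, y in c.waypoints)
    return {
        # Closer than safe_distance to the obstacle lowers the safety score.
        "safety": min(1.0, min_dist / safe_distance),
        # Deviating from a preferred cruising speed lowers the comfort score.
        "comfort": max(0.0, 1.0 - abs(c.speed - preferred_speed) / preferred_speed),
        # Progress along x, assuming the ego starts at x = 0.
        "efficiency": min(1.0, c.waypoints[-1][0] / max_progress),
    }


def select(candidates: Sequence[Candidate], obstacle: Point,
           weights: Dict[str, float]) -> Candidate:
    """Pick the candidate with the highest weighted sum of the three scores."""
    def total(c: Candidate) -> float:
        s = score(c, obstacle)
        return sum(weights[name] * s[name] for name in weights)
    return max(candidates, key=total)


# Illustrative weights only; they are not taken from the paper.
WEIGHTS = {"safety": 0.5, "comfort": 0.2, "efficiency": 0.3}

candidates = generate_candidates(start=(0.0, 0.0), speeds=[2.0, 6.0, 10.0])
best = select(candidates, obstacle=(20.0, 0.0), weights=WEIGHTS)
print(best.speed)  # 6.0
```

In this toy scene the fastest rollout ends on the obstacle and scores zero on safety, while the slowest makes little progress, so the weighted trade-off picks the middle candidate. A fixed weighted sum is only one simple way to combine objectives; the abstract describes a reasoning-based evaluation rather than fixed weights.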
Similar Papers
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving
Robotics
Makes self-driving cars safer and smarter.
ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars imagine and plan better.
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
CV and Pattern Recognition
Helps self-driving cars understand spoken directions better.