X-Driver: Explainable Autonomous Driving with Vision-Language Models
By: Wei Liu, Jiyuan Zhang, Binxiong Zheng, and others
Potential Business Impact:
Makes self-driving cars' decisions better and easier to explain.
End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance than conventional pipelines in both open-loop and closed-loop settings. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations for real-world deployment. In this paper, we introduce X-Driver, a unified multi-modal large language model (MLLM) framework designed for closed-loop autonomous driving, leveraging Chain-of-Thought (CoT) reasoning and autoregressive modeling to enhance perception and decision-making. We validate X-Driver across multiple autonomous driving tasks using public benchmarks in the CARLA simulation environment, including Bench2Drive [6]. Our experimental results demonstrate superior closed-loop performance, surpassing the current state-of-the-art (SOTA) while improving the interpretability of driving decisions. These findings underscore the importance of structured reasoning in end-to-end driving and establish X-Driver as a strong baseline for future research in closed-loop autonomous driving.
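The abstract's core idea, prompting an MLLM to reason step by step before committing to a driving action, can be illustrated with a minimal sketch. The prompt structure, stage names, and action set below are illustrative assumptions, not X-Driver's actual interface.

```python
# Hypothetical sketch of chain-of-thought prompting for a driving MLLM.
# The model is asked to reason through perception -> prediction -> planning
# before emitting a final action; all wording here is an assumption.

def build_cot_prompt(scene_description: str, instruction: str) -> str:
    """Compose a structured chain-of-thought prompt for a driving MLLM."""
    return (
        "You are an autonomous driving assistant.\n"
        f"Scene: {scene_description}\n"
        f"Instruction: {instruction}\n"
        "Reason step by step:\n"
        "1. Perception: list the key objects and their states.\n"
        "2. Prediction: describe how each object is likely to move.\n"
        "3. Planning: choose a safe action and justify it.\n"
        "Final answer: one of {ACCELERATE, BRAKE, TURN_LEFT, "
        "TURN_RIGHT, KEEP_LANE}."
    )

prompt = build_cot_prompt(
    "A pedestrian is entering the crosswalk 10 m ahead; the light is green.",
    "Proceed to the next intersection.",
)
```

Structuring the prompt this way exposes each reasoning stage in the model's output, which is what makes the final driving decision inspectable rather than a black-box token prediction.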
Similar Papers
BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving
Robotics
Helps self-driving cars understand and follow directions.
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment
CV and Pattern Recognition
Car sees, understands, and drives safely.
CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
Robotics
Teaches self-driving cars to handle tricky situations.