CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model
By: Dapeng Zhang, Fei Shen, Rui Zhao, and others
Potential Business Impact:
Teaches self-driving cars to handle rare, tricky situations.
Autonomous driving is a prominent application of artificial intelligence. Recent approaches have shifted from focusing solely on common scenarios to addressing complex, long-tail situations such as subtle human behaviors, traffic accidents, and non-compliant driving patterns. Given the demonstrated capabilities of large language models (LLMs) in understanding visual and natural-language inputs and following instructions, recent methods have integrated LLMs into autonomous driving systems to improve reasoning, interpretability, and performance across diverse scenarios. However, existing methods typically rely either on real-world data, which suits industrial deployment, or on simulation data tailored to rare, hard-case scenarios; few effectively combine the complementary advantages of both data sources. To address this limitation, we propose CoC-VLA, a novel VLM-guided, end-to-end adversarial transfer framework for autonomous driving that transfers long-tail handling capabilities from simulation to real-world deployment. The framework comprises a teacher VLM, a student VLM, and a discriminator. The teacher and student share a base architecture, termed the Chain-of-Causality Visual-Language Model (CoC VLM), which integrates temporal information via an end-to-end text adapter and supports chain-of-thought reasoning to infer complex driving logic. The teacher and student are pre-trained separately on simulated and real-world datasets, respectively. The discriminator is then trained adversarially, using a novel backpropagation strategy, so that the student VLM acquires the teacher's long-tail handling capabilities in real-world environments.
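The abstract describes the teacher-student adversarial transfer only at a high level and gives no implementation details. Purely as an illustrative sketch of the generic pattern it names (not the paper's actual method or backpropagation strategy), the PyTorch snippet below shows one common way such a setup is wired: a discriminator learns to distinguish teacher (simulation) features from student (real-world) features, while the student is updated to fool it. The `teacher` and `student` modules stand in for the CoC VLM backbones; all names, feature dimensions, layer sizes, and loss choices here are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM = 512  # assumed feature dimension; the paper does not specify one


class Discriminator(nn.Module):
    """Scores whether a feature came from the teacher (sim) or student (real)."""

    def __init__(self, dim: int = FEAT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # single logit: 1 = teacher-like, 0 = student-like
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def adversarial_step(teacher, student, disc, sim_batch, real_batch,
                     opt_student, opt_disc):
    """One generic adversarial transfer step (illustrative, not the paper's)."""
    # Teacher is frozen: it only provides target features from simulation data.
    with torch.no_grad():
        t_feat = teacher(sim_batch)
    s_feat = student(real_batch)

    # 1) Discriminator update: push teacher features toward label 1,
    #    (detached) student features toward label 0.
    d_loss = (
        F.binary_cross_entropy_with_logits(
            disc(t_feat), torch.ones(t_feat.size(0), 1))
        + F.binary_cross_entropy_with_logits(
            disc(s_feat.detach()), torch.zeros(s_feat.size(0), 1))
    )
    opt_disc.zero_grad()
    d_loss.backward()
    opt_disc.step()

    # 2) Student update: fool the discriminator, so real-world features
    #    become indistinguishable from the teacher's simulation features.
    g_loss = F.binary_cross_entropy_with_logits(
        disc(s_feat), torch.ones(s_feat.size(0), 1))
    opt_student.zero_grad()
    g_loss.backward()
    opt_student.step()

    return d_loss.item(), g_loss.item()
```

In this generic scheme, the adversarial game gradually aligns the student's real-world feature distribution with the teacher's simulation features, which is one plausible mechanism for the simulation-to-real capability transfer the abstract describes; the paper's chain-of-causality reasoning, text adapter, and novel backpropagation strategy go beyond this sketch.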
Similar Papers
CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars think step-by-step to drive safely.
Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars drive smarter and faster.
dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning
CV and Pattern Recognition
Makes self-driving cars better at tricky situations.