Unlock the Power of Unlabeled Data in Language Driving Model

Published: March 13, 2025 | arXiv ID: 2503.10586v2

By: Chaoqun Wang, Jie Yang, Xiaobin Hong, and more

Potential Business Impact:

Trains self-driving-car question-answering models with far less labeled data, cutting annotation costs.

Business Areas:
Autonomous Vehicles, Transportation

Recent vision-based large language models (VisionLLMs) for autonomous driving have advanced rapidly. However, this progress depends heavily on large-scale, high-quality annotated data, which is costly and labor-intensive to obtain. To address this issue, we propose unlocking the value of abundant yet unlabeled data to improve the language driving model in a semi-supervised learning manner. Specifically, we first introduce a series of template-based prompts to extract scene information, generating questions whose pseudo-answers for the unlabeled data come from a model trained with limited labeled data. Next, we propose a Self-Consistency Refinement method to improve the quality of these pseudo-annotations, which are later used for further training. By utilizing a pre-trained VisionLLM (e.g., InternVL), we build a strong Language Driving Model (LDM) for driving-scene question answering, outperforming previous state-of-the-art methods. Extensive experiments on the DriveLM benchmark show that our approach performs well with just 5% labeled data, achieving competitive performance against models trained on the full dataset. In particular, our LDM reaches 44.85% with limited labeled data alone, rising to 54.27% once the unlabeled data is added, while models trained with the full dataset reach 60.68% on the DriveLM benchmark.
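
The pipeline the abstract describes, generating questions from templates, pseudo-answering them with a model trained on the small labeled set, and filtering the results for consistency, can be sketched roughly as below. This is a minimal illustration under assumptions: `ask_model`, the question templates, and the majority-vote agreement filter are hypothetical stand-ins, since the abstract does not specify how Self-Consistency Refinement is actually implemented.

```python
# Hypothetical sketch of the semi-supervised pseudo-labeling loop from the
# abstract. `ask_model` stands in for any VisionLLM inference call (e.g., an
# InternVL wrapper); its signature, the templates, and the thresholds are
# assumptions, not the paper's actual implementation.

from collections import Counter

# Template-based prompts for extracting scene information (illustrative only).
QUESTION_TEMPLATES = [
    "What objects are in front of the ego vehicle in <image>?",
    "What is the safe next action for the ego vehicle in <image>?",
    "Are there pedestrians near the planned route in <image>?",
]

def ask_model(model, image, question, temperature=0.7):
    """Placeholder for a VisionLLM call returning a free-form answer."""
    raise NotImplementedError("wire up your own VisionLLM inference here")

def pseudo_label(model, image, question, n_samples=5, min_agreement=0.6):
    """Sample several answers and keep the majority one only if agreement is
    high enough -- a generic stand-in for Self-Consistency Refinement."""
    answers = [ask_model(model, image, question) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    if count / n_samples >= min_agreement:
        return best   # confident pseudo-answer, kept for further training
    return None       # inconsistent annotation, discarded

def build_pseudo_dataset(model, unlabeled_images):
    """Turn unlabeled images into (image, question, pseudo-answer) triples."""
    dataset = []
    for image in unlabeled_images:
        for template in QUESTION_TEMPLATES:
            answer = pseudo_label(model, image, template)
            if answer is not None:
                dataset.append((image, template, answer))
    return dataset
```

The resulting triples would then be mixed with the 5% labeled data for another round of fine-tuning; sampling several answers and keeping only high-agreement ones is one plausible way to trade pseudo-label coverage for quality.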

Country of Origin
🇭🇰 🇨🇳 China, Hong Kong

Page Count
7 pages

Category
Computer Science:
CV and Pattern Recognition