Unlock the Power of Unlabeled Data in Language Driving Model
By: Chaoqun Wang, Jie Yang, Xiaobin Hong, and more
Potential Business Impact:
Teaches self-driving cars using less labeled data.
Recent Vision-based Large Language Models (VisionLLMs) for autonomous driving have advanced rapidly. However, this progress depends heavily on large-scale, high-quality annotated data, which is costly and labor-intensive to produce. To address this issue, we propose unlocking the value of abundant yet unlabeled data to improve the language driving model in a semi-supervised manner. Specifically, we first introduce a series of template-based prompts that extract scene information and generate questions for the unlabeled data, with pseudo-answers produced by a model trained on the limited labeled data. Next, we propose a Self-Consistency Refinement method to improve the quality of these pseudo-annotations, which are then used for further training. By building on a pre-trained VisionLLM (e.g., InternVL), we obtain a strong Language Driving Model (LDM) for driving scene question-answering that outperforms previous state-of-the-art methods. Extensive experiments on the DriveLM benchmark show that our approach performs well with just 5% labeled data, achieving competitive performance against models trained on the full dataset. In particular, our LDM reaches 44.85% with limited labeled data alone and 54.27% once the unlabeled data is used, while models trained on the full dataset reach 60.68% on the DriveLM benchmark.
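The abstract outlines a concrete pipeline: template-based question generation, pseudo-answering with a model trained on the small labeled split, and a Self-Consistency Refinement step that filters the pseudo-annotations before further training. The sketch below is one plausible reading of that loop, not the authors' released code; all names (QUESTION_TEMPLATES, pseudo_label, n_drafts, the agreement threshold) are illustrative assumptions.

```python
# Hedged sketch of the semi-supervised pipeline described in the abstract.
# Everything here is an assumption for illustration, not the paper's implementation.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    scene_id: str
    question: str
    answer: Optional[str] = None  # None until a pseudo-answer is accepted


# Template-based prompts that turn raw scene metadata into questions
# (illustrative; the paper's templates target driving-scene QA such as DriveLM).
QUESTION_TEMPLATES = [
    "What are the important objects in the current scene?",
    "What is the moving status of object <{obj}>?",
    "What actions should the ego vehicle take next?",
]


def build_questions(scene_id: str, objects: List[str]) -> List[Sample]:
    """Instantiate the template prompts for one unlabeled scene."""
    samples: List[Sample] = []
    for template in QUESTION_TEMPLATES:
        if "{obj}" in template:
            samples += [Sample(scene_id, template.format(obj=o)) for o in objects]
        else:
            samples.append(Sample(scene_id, template))
    return samples


def _naive_agreement(drafts: List[str]) -> float:
    """Toy agreement score: fraction of drafts identical to the first one."""
    return sum(d == drafts[0] for d in drafts) / len(drafts)


def pseudo_label(
    samples: List[Sample],
    generate: Callable[[str], str],  # LDM trained on the small labeled split
    n_drafts: int = 3,
    agreement: Optional[Callable[[List[str]], float]] = None,
    min_agreement: float = 0.7,
) -> List[Sample]:
    """Self-Consistency Refinement (sketch): sample several answers per question
    and keep a pseudo-label only when the drafts agree strongly enough."""
    score_fn = agreement or _naive_agreement
    kept: List[Sample] = []
    for s in samples:
        drafts = [generate(s.question) for _ in range(n_drafts)]
        if score_fn(drafts) >= min_agreement:
            kept.append(Sample(s.scene_id, s.question, answer=drafts[0]))
    return kept
```

In this reading, the retained pseudo-labeled samples would be mixed back with the original 5% labeled split for a further round of fine-tuning; the agreement metric and threshold stand in for whatever consistency criterion the paper actually uses.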
Similar Papers
Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving
CV and Pattern Recognition
Makes self-driving cars faster and smarter.
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars see and understand everything.
V3LMA: Visual 3D-enhanced Language Model for Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars see in 3D.