Research on Driving Scenario Technology Based on Multimodal Large Language Model Optimization

Published: May 28, 2025 | arXiv ID: 2506.02014v1

By: Wang Mengjie, Zhu Huiping, Li Jian, and more

Potential Business Impact:

Helps self-driving cars see and react better.

Business Areas:

Autonomous Vehicles, Transportation

As autonomous and assisted driving technologies advance, they place higher demands on the ability to understand complex driving scenarios. General-purpose multimodal large models have emerged as a solution to this challenge. However, applying these models in vertical domains raises difficulties in data collection, model training, and deployment optimization. This paper proposes a comprehensive method for optimizing multimodal models for driving scenarios, targeting tasks such as cone detection, traffic light recognition, speed limit recommendation, and intersection alerts. The method covers dynamic prompt optimization, dataset construction, model training, and deployment. Specifically, dynamic prompt optimization adjusts the prompt based on the input image content so that the model focuses on objects affecting the ego vehicle, which sharpens its task-specific focus and judgment. The dataset combines real and synthetic data into a high-quality, diverse multimodal training set, improving the model's generalization in complex driving environments. In model training, knowledge distillation, dynamic fine-tuning, and quantization are integrated to reduce storage and computational costs while boosting performance. Experimental results show that this systematic optimization method significantly improves the model's accuracy on the key tasks while using resources efficiently, supporting the practical application of driving scenario perception technologies.
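
The abstract does not spell out how the dynamic prompt optimization is implemented. As a rough illustration of the general idea only, the Python sketch below assembles a prompt from whatever objects an upstream detector reports in the image. Everything in it is an assumption made for illustration: the label names, the `TASK_HINTS` table, the `BASE_PROMPT` text, and the `build_prompt` function are hypothetical and are not taken from the paper.

```python
# Hypothetical mapping from detected object labels to task-specific instructions.
# The labels mirror the four tasks named in the abstract; the wording is invented.
TASK_HINTS = {
    "traffic_cone": "Report any traffic cones in the ego lane.",
    "traffic_light": "State the color of the traffic light governing the ego lane.",
    "speed_limit_sign": "Read the posted speed limit and recommend a target speed.",
    "intersection": "Warn if the ego vehicle is approaching an intersection.",
}

# Assumed base instruction that is always sent to the model.
BASE_PROMPT = (
    "You are a driving assistant. "
    "Describe only objects that affect the ego vehicle."
)


def build_prompt(detected_labels):
    """Return a prompt whose task hints depend on the labels found in the image.

    `detected_labels` is assumed to come from some upstream detector;
    labels without an entry in TASK_HINTS are ignored.
    """
    seen = set(detected_labels)
    # Keep hints in the fixed order of TASK_HINTS so the prompt is deterministic.
    hints = [hint for label, hint in TASK_HINTS.items() if label in seen]
    if not hints:
        return BASE_PROMPT
    return BASE_PROMPT + "\n" + "\n".join(f"- {hint}" for hint in hints)


if __name__ == "__main__":
    print(build_prompt(["traffic_light", "pedestrian"]))
```

In this sketch an image with only a traffic light yields a prompt with a single traffic-light instruction, while an image with no relevant objects falls back to the base prompt. A real system would likely condition on richer image content than a list of labels.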

Multimodal Large Language Models for Enhanced Traffic Safety: A Comprehensive Review and Future Trends

CV and Pattern Recognition

Makes cars see and understand everything around them.

21 Apr 2025

90%

A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving

CV and Pattern Recognition

Helps self-driving cars understand traffic better.

14 Mar 2025

89%

Investigating Traffic Accident Detection Using Multimodal Large Language Models

CV and Pattern Recognition

Finds car crashes from camera pictures.

23 Sep 2025

Page Count

22 pages
