Zero-Shot Vehicle Model Recognition via Text-Based Retrieval-Augmented Generation
By: Wei-Chia Chang, Yan-Ann Chen
Potential Business Impact:
Identifies car makes and models without retraining.
Vehicle make and model recognition (VMMR) is an important task in intelligent transportation systems, but existing approaches struggle to adapt to newly released models. Contrastive Language-Image Pretraining (CLIP) provides strong visual-text alignment, yet its fixed pretrained weights limit performance without costly image-specific fine-tuning. We propose a pipeline that integrates vision-language models (VLMs) with Retrieval-Augmented Generation (RAG) to support zero-shot recognition through text-based reasoning. A VLM converts vehicle images into descriptive attributes, which are compared against a database of textual features. Relevant entries are retrieved and combined with the description to form a prompt, and a language model (LM) infers the make and model. This design avoids large-scale retraining and enables rapid updates by adding textual descriptions of new vehicles. Experiments show that the proposed method improves recognition by nearly 20% over the CLIP baseline, demonstrating the potential of RAG-enhanced LM reasoning for scalable VMMR in smart-city applications.
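The retrieval-and-prompt stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the database entries, the bag-of-words similarity, and the prompt wording are all placeholder assumptions standing in for the paper's textual feature database and retriever, and the VLM's image-to-description step is represented by a hand-written description string.

```python
from collections import Counter
import math

# Toy database of textual vehicle descriptions (hypothetical entries,
# standing in for the paper's database of textual features).
VEHICLE_DB = {
    "Toyota Corolla 2023": "compact sedan, narrow grille, sleek LED headlights, short rear deck",
    "Ford F-150 2022": "full-size pickup truck, large chrome grille, high ground clearance",
    "Tesla Model 3 2023": "compact sedan, grille-less front fascia, flush door handles, glass roof",
}

def bag_of_words(text):
    """Lowercased token counts, used here as a simple text feature vector."""
    return Counter(text.lower().replace(",", " ").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(description, k=2):
    """Return the k database entries most similar to the VLM's description."""
    q = bag_of_words(description)
    scored = sorted(VEHICLE_DB.items(),
                    key=lambda kv: cosine(q, bag_of_words(kv[1])),
                    reverse=True)
    return scored[:k]

def build_prompt(description, k=2):
    """Combine the description with retrieved entries into an LM prompt."""
    context = "\n".join(f"- {name}: {feats}" for name, feats in retrieve(description, k))
    return (f"Vehicle description: {description}\n"
            f"Candidate vehicles:\n{context}\n"
            "Which make and model best matches the description?")

# Example: a description a VLM might emit for an image of a Tesla Model 3.
desc = "compact sedan with a grille-less front fascia and a glass roof"
print(build_prompt(desc))
```

Because the knowledge lives in `VEHICLE_DB` rather than in model weights, supporting a newly released vehicle only requires adding one textual entry, which is the scalability argument the abstract makes.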
Similar Papers
SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition
CV and Pattern Recognition
Helps cars identify any road sign, even new ones.
Multimodal RAG Enhanced Visual Description
Machine Learning (CS)
Helps computers describe pictures better and faster.
Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval
CV and Pattern Recognition
Lets computers watch long videos and understand them.