Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
By: Antonio Guillen-Perez
Potential Business Impact:
Finds rare car dangers without sending data away.
The development of robust Autonomous Vehicles (AVs) is bottlenecked by the scarcity of "Long-Tail" training data. While fleets collect petabytes of video logs, identifying rare safety-critical events (e.g., erratic jaywalking, construction diversions) remains a manual, cost-prohibitive process. Existing solutions rely on coarse metadata search, which lacks precision, or cloud-based VLMs, which are privacy-invasive and expensive. We introduce Semantic-Drive, a local-first, neuro-symbolic framework for semantic data mining. Our approach decouples perception into two stages: (1) Symbolic Grounding via a real-time open-vocabulary detector (YOLOE) to anchor attention, and (2) Cognitive Analysis via a Reasoning VLM that performs forensic scene analysis. To mitigate hallucination, we implement a "System 2" inference-time alignment strategy, utilizing a multi-model "Judge-Scout" consensus mechanism. Benchmarked on the nuScenes dataset against the Waymo Open Dataset (WOD-E2E) taxonomy, Semantic-Drive achieves a Recall of 0.966 (vs. 0.475 for CLIP) and reduces Risk Assessment Error by 40\% compared to single models. The system runs entirely on consumer hardware (NVIDIA RTX 3090), offering a privacy-preserving alternative to the cloud.
Similar Papers
VLMs Guided Interpretable Decision Making for Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars make safer, clearer choices.
VLM-3D:End-to-End Vision-Language Models for Open-World 3D Perception
CV and Pattern Recognition
Helps self-driving cars see new things safely.
SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
CV and Pattern Recognition
Helps self-driving cars understand where things are.