FSDAM: Few-Shot Driving Attention Modeling via Vision-Language Coupling

Published: November 16, 2025 | arXiv ID: 2511.12708v1

By: Kaiser Hamid, Can Cui, Khandakar Ashrafi Akbar, and more

Potential Business Impact:

Teaches cars to predict where drivers look using far less training data.

Business Areas:
Autonomous Vehicles, Transportation

Understanding where drivers look and why they shift their attention is essential for autonomous systems that read human intent and justify their actions. Most existing models rely on large-scale gaze datasets to learn these patterns; however, such datasets are labor-intensive to collect and time-consuming to curate. We present FSDAM (Few-Shot Driver Attention Modeling), a framework that achieves joint attention prediction and caption generation with approximately 100 annotated examples, two orders of magnitude fewer than existing approaches. Our approach introduces a dual-pathway architecture in which separate modules handle spatial prediction and caption generation while maintaining semantic consistency through cross-modal alignment. Despite minimal supervision, FSDAM achieves competitive performance on attention prediction and generates coherent, context-aware explanations. The model demonstrates robust zero-shot generalization across multiple driving benchmarks. This work shows that effective attention-conditioned generation is achievable with limited supervision, opening new possibilities for practical deployment of explainable driver attention systems in data-constrained scenarios.
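To make the dual-pathway idea concrete, here is a minimal PyTorch sketch of one plausible reading of the abstract: a spatial branch that predicts a saliency map, a caption branch that produces text-side embeddings, and a cosine-similarity alignment term tying the two together. This is an illustrative assumption, not the paper's implementation; names such as DualPathwayAttentionModel, saliency_head, and the alignment loss are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathwayAttentionModel(nn.Module):
    """Sketch of a dual-pathway head: one branch predicts a spatial
    attention (saliency) map, the other produces caption embeddings,
    and a cross-modal alignment loss keeps the two semantically
    consistent. Feature extraction (e.g., a frozen VLM backbone) is
    assumed to happen upstream."""

    def __init__(self, feat_dim=768, embed_dim=256, vocab_size=30522):
        super().__init__()
        # Spatial pathway: project patch features to a 1-channel saliency map.
        self.saliency_head = nn.Sequential(
            nn.Conv2d(feat_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(256, 1, kernel_size=1),
        )
        # Caption pathway: pool features and decode token logits (stub decoder).
        self.caption_proj = nn.Linear(feat_dim, embed_dim)
        self.caption_head = nn.Linear(embed_dim, vocab_size)
        # Projections into a shared space for the cross-modal alignment term.
        self.vis_align = nn.Linear(feat_dim, embed_dim)
        self.txt_align = nn.Linear(embed_dim, embed_dim)

    def forward(self, patch_feats):
        # patch_feats: (B, feat_dim, H, W) features from a frozen encoder.
        sal_map = torch.sigmoid(self.saliency_head(patch_feats))  # (B,1,H,W)
        pooled = patch_feats.mean(dim=(2, 3))                     # (B, feat_dim)
        cap_emb = torch.tanh(self.caption_proj(pooled))           # (B, embed_dim)
        token_logits = self.caption_head(cap_emb)                 # (B, vocab)
        # Saliency-weighted visual summary feeds the alignment loss.
        weighted = (patch_feats * sal_map).sum(dim=(2, 3)) / (
            sal_map.sum(dim=(2, 3)) + 1e-6
        )
        v = F.normalize(self.vis_align(weighted), dim=-1)
        t = F.normalize(self.txt_align(cap_emb), dim=-1)
        align_loss = 1.0 - (v * t).sum(dim=-1).mean()  # cosine alignment
        return sal_map, token_logits, align_loss
```

The alignment term is what couples the two pathways: gradients from the caption embedding flow into the saliency map and vice versa, which is one way few-shot supervision on either task could regularize the other.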

Country of Origin
🇺🇸 United States

Page Count
15 pages

Category
Computer Science:
Computer Vision and Pattern Recognition