Unified Attention Modeling for Efficient Free-Viewing and Visual Search via Shared Representations
By: Fatma Youssef Mohammed, Kostas Alexis
Potential Business Impact:
Lets computers learn where to look faster.
Computational human attention modeling in free-viewing and task-specific settings is often studied separately, with limited exploration of whether a common representation exists between them. This work investigates this question and proposes a neural network architecture that builds upon the Human Attention Transformer (HAT) to test the hypothesis. Our results demonstrate that free-viewing and visual search can efficiently share a common representation, allowing a model trained on free-viewing attention to transfer its knowledge to task-driven visual search with a performance drop of only 3.86% in the predicted fixation scanpaths, measured by the semantic sequence score (SemSS) metric, which reflects the similarity between predicted and human scanpaths. This transfer reduces computational costs by 92.29% in terms of GFLOPs and 31.23% in terms of trainable parameters.
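The sketch below is a minimal, hypothetical illustration of the shared-representation idea described in the abstract, not the authors' HAT implementation: a single visual encoder feeds two lightweight heads, one for free-viewing fixation prediction and one for target-conditioned visual search. Freezing the shared encoder after free-viewing training and fitting only the search head is one plausible way the reported savings in GFLOPs and trainable parameters could arise; all module names, sizes, and the use of PyTorch here are assumptions.

```python
# Hypothetical sketch (not the authors' HAT code): a shared encoder with
# separate heads for free-viewing and visual search. Layer choices and
# dimensions are illustrative only.
import torch
import torch.nn as nn


class SharedAttentionModel(nn.Module):
    def __init__(self, embed_dim: int = 256, num_targets: int = 18):
        super().__init__()
        # Shared representation: a small patchifying conv plus one
        # transformer layer, standing in for a HAT-style encoder.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=8, stride=8),
            nn.ReLU(),
        )
        self.encoder = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True
        )
        # Free-viewing head: per-patch fixation logits.
        self.free_viewing_head = nn.Linear(embed_dim, 1)
        # Visual-search head: conditioned on a target-category embedding.
        self.target_embedding = nn.Embedding(num_targets, embed_dim)
        self.search_head = nn.Linear(embed_dim, 1)

    def shared_features(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images)             # (B, C, H', W')
        feats = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C) token sequence
        return self.encoder(feats)

    def forward(self, images: torch.Tensor, target_ids: torch.Tensor = None):
        tokens = self.shared_features(images)
        if target_ids is None:
            # Free-viewing: fixation logits over image patches.
            return self.free_viewing_head(tokens).squeeze(-1)
        # Visual search: modulate shared tokens with the search target.
        tgt = self.target_embedding(target_ids).unsqueeze(1)  # (B, 1, C)
        return self.search_head(tokens + tgt).squeeze(-1)


model = SharedAttentionModel()
# ... train on free-viewing data first (omitted), then freeze the shared parts
for p in model.backbone.parameters():
    p.requires_grad = False
for p in model.encoder.parameters():
    p.requires_grad = False
# Only the search head and target embedding remain trainable for the
# visual-search task, so far fewer parameters are updated during transfer.
images = torch.randn(2, 3, 224, 224)
search_logits = model(images, target_ids=torch.tensor([0, 3]))
print(search_logits.shape)  # torch.Size([2, 784]): logits over a 28x28 patch grid
```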
Similar Papers
Visual Attention Graph
CV and Pattern Recognition
Shows how brains look at things to understand them.
Learning to Look: Cognitive Attention Alignment with Vision-Language Models
CV and Pattern Recognition
Teaches computers to see like humans.
A Neural Network Model of Spatial and Feature-Based Attention
CV and Pattern Recognition
Helps computers focus on important things.