
Convolutions Need Registers Too: HVS-Inspired Dynamic Attention for Video Quality Assessment

Published: January 16, 2026 | arXiv ID: 2601.11045v1

By: Mayesha Maliha R. Mithila, Mylene C. Q. Farias

Potential Business Impact:

Predicts how good a video looks to viewers without needing the original for comparison, fast enough for real-time streaming.

Business Areas:
Image Recognition, Data and Analytics, Software

No-reference video quality assessment (NR-VQA) estimates perceptual quality without access to a reference video, which makes the task challenging. Recent techniques leverage saliency or transformer attention, but they address the global context of the video signal only through static maps supplied as auxiliary inputs, rather than embedding that context within feature extraction for the video sequence itself. We present Dynamic Attention with Global Registers for Video Quality Assessment (DAGR-VQA), the first framework to integrate register tokens directly into a convolutional backbone for spatio-temporal, dynamic saliency prediction. By embedding learnable register tokens as global context carriers, our model enables dynamic attention inspired by the human visual system (HVS), producing temporally adaptive saliency maps that track salient regions over time without explicit motion estimation. The model combines these dynamic saliency maps with RGB inputs to capture spatial information, then analyzes the result with a temporal transformer to deliver perceptually consistent quality predictions. Comprehensive experiments on the LSVQ, KoNViD-1k, LIVE-VQC, and YouTube-UGC datasets show highly competitive performance, surpassing most top baselines. Ablation studies demonstrate that integrating register tokens promotes stable, temporally consistent attention. Running at 387.7 FPS at 1080p, DAGR-VQA offers computational performance suitable for real-time applications such as multimedia streaming systems.
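The abstract describes the pipeline only at a high level: register tokens attend jointly with convolutional features, a saliency map falls out of that attention, the map is fused with RGB, and a temporal transformer aggregates frames. The toy NumPy sketch below illustrates that data flow under loudly stated assumptions; it is not the DAGR-VQA implementation. The function names (`register_attention`, `fuse_with_rgb`, `temporal_score`), the single-head attention, the choice to read saliency from register-to-patch attention weights, nearest-neighbour upsampling, mean pooling, and the random stand-in "CNN features" are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def register_attention(feat, registers, wq, wk, wv):
    """Hypothetical single-head self-attention over conv patches plus registers.

    feat: (C, H, W) feature map from some convolutional layer.
    registers: (R, C) learnable tokens acting as global context carriers.
    Returns refined features (C, H, W) and a saliency map (H, W) summing to 1.
    """
    c, h, w = feat.shape
    r = registers.shape[0]
    patches = feat.reshape(c, h * w).T                # (H*W, C) spatial tokens
    tokens = np.concatenate([registers, patches], 0)  # (R + H*W, C)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))     # each row sums to 1
    out = attn @ v
    # Assumed saliency proxy: how strongly the registers attend to each location,
    # renormalised over spatial positions, then averaged across registers.
    reg_to_patch = attn[:r, r:]
    reg_to_patch = reg_to_patch / reg_to_patch.sum(axis=1, keepdims=True)
    saliency = reg_to_patch.mean(axis=0).reshape(h, w)
    refined = out[r:].T.reshape(c, h, w)              # register outputs discarded
    return refined, saliency

def fuse_with_rgb(frame, saliency):
    """Nearest-neighbour upsample the saliency map and stack it as a 4th channel."""
    big_h, big_w, _ = frame.shape
    h, w = saliency.shape
    up = saliency.repeat(big_h // h, axis=0).repeat(big_w // w, axis=1)
    return np.concatenate([frame, up[..., None]], axis=-1)  # (H, W, 4)

def temporal_score(frame_embeddings, w_score):
    """Toy temporal stage: self-attention across T frames, mean-pool, linear head."""
    e = frame_embeddings                               # (T, D)
    attn = softmax(e @ e.T / np.sqrt(e.shape[1]))
    mixed = attn @ e
    return float(mixed.mean(axis=0) @ w_score)

# Usage on random toy data (3 frames of 16x16 RGB, 8-channel 4x4 features).
C, h, w, R = 8, 4, 4, 2
registers = rng.standard_normal((R, C))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
embeddings = []
for t in range(3):
    frame = rng.random((16, 16, 3))
    feat = rng.standard_normal((C, h, w))  # stand-in for backbone features
    refined, sal = register_attention(feat, registers, wq, wk, wv)
    fused = fuse_with_rgb(frame, sal)
    embeddings.append(np.concatenate([refined.mean(axis=(1, 2)),
                                      fused.mean(axis=(0, 1))]))
score = temporal_score(np.stack(embeddings), rng.standard_normal(C + 4))
print(refined.shape, sal.shape, fused.shape, score)
```

Because the register tokens sit in the same attention as every spatial patch, their attention weights are a global, per-frame summary that can change from frame to frame. That is one plausible reading of "temporally adaptive saliency without explicit motion estimation"; the paper itself should be consulted for the actual design.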

Country of Origin
🇺🇸 United States

Page Count
12 pages

Category
Electrical Engineering and Systems Science:
Image and Video Processing