WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world Scenarios
By: Eun Chang, Zhuangqun Huang, Yiwei Liao, and more
Potential Business Impact:
Tests whether smart glasses AI can answer questions about what the wearer sees.
We introduce WearVQA, the first benchmark specifically designed to evaluate the Visual Question Answering (VQA) capabilities of multimodal AI assistants on wearable devices such as smart glasses. Unlike prior benchmarks that focus on high-quality, third-person imagery, WearVQA reflects the unique challenges of egocentric interaction, where visual inputs may be occluded, poorly lit, unzoomed, or blurry, and questions are grounded in realistic wearable use cases. The benchmark comprises 2,520 carefully curated image-question-answer triplets spanning 7 diverse image domains, including both text-centric and general scenes, 10 cognitive task types ranging from basic recognition to various forms of reasoning, and 6 common wearables-specific image quality issues. All questions are designed to be answerable using only the visual input and common sense. WearVQA is paired with a rigorous LLM-as-a-judge evaluation framework with 96% labeling accuracy. Open-source and proprietary multimodal LLMs achieve QA accuracies of only 24-52% on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks. These observations position WearVQA as a comprehensive and challenging benchmark for guiding technical progress toward robust, real-world multimodal wearable AI systems.
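To make the evaluation protocol concrete, below is a minimal sketch of how an LLM-as-a-judge scoring loop over image-question-answer triplets might look. The dataset layout, field names, and the vqa_model / judge_model interfaces are illustrative assumptions for this sketch, not the official WearVQA harness.

    # Hypothetical sketch of an LLM-as-a-judge VQA evaluation loop.
    # Dataset path, field names, and the judge prompt are illustrative
    # assumptions, not the released WearVQA evaluation code.
    import json

    def evaluate(triplets_path, vqa_model, judge_model):
        triplets = [json.loads(line) for line in open(triplets_path)]
        correct = 0
        for item in triplets:
            # The candidate multimodal model answers from the egocentric image alone.
            prediction = vqa_model.answer(image=item["image"], question=item["question"])
            # A separate LLM judges whether the prediction matches the reference answer.
            verdict = judge_model.complete(
                f"Question: {item['question']}\n"
                f"Reference answer: {item['answer']}\n"
                f"Model answer: {prediction}\n"
                "Reply CORRECT or INCORRECT."
            )
            correct += verdict.strip().upper().startswith("CORRECT")
        return correct / len(triplets)  # QA accuracy over the benchmark

Such a judge-based scorer accepts semantically equivalent phrasings rather than requiring exact string matches, which is why the paper reports the judge's own labeling accuracy (96%) alongside model QA accuracy.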
Similar Papers
VQ-VA World: Towards High-Quality Visual Question-Visual Answering
CV and Pattern Recognition
Makes computers draw pictures from questions.
Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
CV and Pattern Recognition
Helps smart glasses guess what you want.
VQArt-Bench: A semantically rich VQA Benchmark for Art and Cultural Heritage
CV and Pattern Recognition
Tests if computers truly understand art.