Score: 2

EgoBlind: Towards Egocentric Visual Assistance for the Blind

Published: March 11, 2025 | arXiv ID: 2503.08221v2

By: Junbin Xiao, Nanxin Huang, Hao Qiu, and more

Potential Business Impact:

Benchmarks AI assistants that answer blind users' questions about their surroundings from first-person video.

Business Areas:
Visual Search, Internet Services

We present EgoBlind, the first egocentric VideoQA dataset collected from blind individuals to evaluate the assistive capabilities of contemporary multimodal large language models (MLLMs). EgoBlind comprises 1,392 videos that record the daily lives of real blind users from a first-person perspective. It also features 5,311 questions that were directly posed, or generated and then verified, by blind individuals to reflect their in-situation needs for visual assistance across various scenarios. Each question is paired with an average of 3 reference answers to mitigate the subjectivity of evaluation. Using EgoBlind, we comprehensively evaluate 16 advanced MLLMs and find that all of them struggle: the best performers reach accuracy near 60%, far behind human performance of 87.4%. To guide future advancement, we identify and summarize the major limitations of existing MLLMs in egocentric visual assistance for the blind and explore heuristic solutions for improvement. With these efforts, we hope EgoBlind can serve as a valuable foundation for developing more effective AI assistants that enhance the independence of blind individuals' lives. Data and evaluation code are available at https://github.com/doc-doc/EgoBlind.
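The multi-reference design suggests a simple scoring loop: a prediction counts as correct if it agrees with any of the roughly 3 reference answers for a question. Below is a minimal Python sketch of that idea; the annotation file name and field names (egoblind_annotations.json, question_id, reference_answers) are assumptions for illustration, not the repository's actual schema, and the loose string match stands in for whatever judging protocol the paper actually uses (the official evaluation code is in the linked repo).

```python
import json
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for loose matching."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def is_correct(prediction: str, references: list[str]) -> bool:
    """Count a prediction as correct if it matches any reference answer."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)

# Hypothetical annotation layout: one record per question with the source video,
# the question text, and ~3 human reference answers (field names are assumed).
with open("egoblind_annotations.json") as f:
    records = json.load(f)

# question_id -> model answer; in practice produced by prompting an MLLM
# with the video frames and the question.
predictions: dict[str, str] = {}

correct = sum(
    is_correct(predictions.get(r["question_id"], ""), r["reference_answers"])
    for r in records
)
print(f"Accuracy: {correct / len(records):.1%}")
```

Scoring against several references instead of one reduces the chance of penalizing valid paraphrases, which matters for open-ended assistance questions with no single correct wording.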

Country of Origin
🇨🇳 🇸🇬 Singapore, China

Repos / Data Links
https://github.com/doc-doc/EgoBlind

Page Count
28 pages

Category
Computer Science:
Computer Vision and Pattern Recognition