Towards Understanding the Use of MLLM-Enabled Applications for Visual Interpretation by Blind and Low Vision People
By: Ricardo E. Gonzalez Penuela, Ruiying Hu, Sharon Lin, and more
Potential Business Impact:
Helps blind and low vision people understand their visual surroundings.
Blind and Low Vision (BLV) people have adopted AI-powered visual interpretation applications to address their daily needs. While these applications have been helpful, prior work has found that users remain dissatisfied with their frequent errors. Recently, multimodal large language models (MLLMs) have been integrated into visual interpretation applications, and they show promise for more descriptive visual interpretations. However, it is still unknown how this advancement has changed people's use of these applications. To address this gap, we conducted a two-week diary study in which 20 BLV people used an MLLM-enabled visual interpretation application we developed, and we collected 553 entries. In this paper, we report a preliminary analysis of 60 diary entries from 6 participants. We found that participants considered the application's visual interpretations trustworthy (mean 3.75 out of 5) and satisfying (mean 4.15 out of 5). Moreover, participants trusted our application in high-stakes scenarios, such as receiving medical dosage advice. We discuss our plan to complete our analysis to inform the design of future MLLM-enabled visual interpretation systems.
Similar Papers
Guiding Multimodal Large Language Models with Blind and Low Vision People Visual Questions for Proactive Visual Interpretations
CV and Pattern Recognition
Helps blind people get answers they need faster.
Multimodal LLM Augmented Reasoning for Interpretable Visual Perception Analysis
Human-Computer Interaction
Helps computers understand pictures like people do.
LLM impact on BLV programming
Human-Computer Interaction
Helps blind coders use AI tools better.