Through Their Eyes: User Perceptions on Sensitive Attribute Inference of Social Media Videos by Visual Language Models
By: Shuning Zhang, Gengrui Zhang, Yibo Meng and more
Potential Business Impact:
AI can guess private things about you from the videos you post on social media.
The rapid advancement of Visual Language Models (VLMs) has enabled sophisticated analysis of visual content, raising concerns about the inference of sensitive user attributes and the resulting privacy risks. While the technical capabilities of VLMs are increasingly studied, users' understanding of, perceptions of, and reactions to these inferences remain less explored, especially for videos uploaded to social media. This paper addresses this gap through semi-structured interviews (N=17) investigating user perspectives on VLM-driven sensitive attribute inference from their visual data. Findings reveal that users perceive VLMs as capable of inferring a range of attributes, including location, demographics, and socioeconomic indicators, often with unsettling accuracy. Key concerns include unauthorized identification, misuse of personal information, pervasive surveillance, and harm from inaccurate inferences. Participants reported employing various mitigation strategies, though with skepticism about their ultimate effectiveness against advanced AI. They also articulated clear expectations for platforms and regulators, emphasizing the need for enhanced transparency, user control, and proactive privacy safeguards. These insights are crucial for guiding the development of responsible AI systems, effective privacy-enhancing technologies, and informed policymaking that aligns with user expectations and societal values.
Similar Papers
The Pervasive Blind Spot: Benchmarking VLM Inference Risks on Everyday Personal Videos
Human-Computer Interaction
Tests how well AI can guess private details from your videos.
Trust in Vision-Language Models: Insights from a Participatory User Workshop
Human-Computer Interaction
Helps people know when to trust AI image and video descriptions.
Zero-shot image privacy classification with Vision-Language Models
CV and Pattern Recognition
Helps computers tell which pictures are private.