Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
By: Yanming Xiu, Maria Gorlatova
Potential Business Impact:
Stops deceptive virtual content from misleading AR users.
Virtual content in augmented reality (AR) can introduce misleading or harmful information, leading to semantic misunderstandings or user errors. In this work, we focus on visual information manipulation (VIM) attacks in AR, where virtual content changes the meaning of a real-world scene in subtle but impactful ways. We introduce a taxonomy that categorizes these attacks along three formats (character, phrase, and pattern manipulation) and three purposes (information replacement, information obfuscation, and extra wrong information). Based on this taxonomy, we construct AR-VIM, a dataset of 452 raw-AR video pairs spanning 202 distinct scenes, each simulating a real-world AR scenario. To detect such attacks, we propose VIM-Sense, a multimodal semantic reasoning framework that combines the language and visual understanding capabilities of vision-language models (VLMs) with optical character recognition (OCR)-based textual analysis. VIM-Sense achieves an attack detection accuracy of 88.94% on AR-VIM, consistently outperforming vision-only and text-only baselines. The system reaches an average attack detection latency of 7.07 seconds in a simulated video processing framework and 7.17 seconds in a real-world evaluation on a mobile Android AR application.
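To make the described pipeline concrete, below is a minimal Python sketch of the VIM-Sense idea from the abstract: OCR the raw frame and its AR-augmented counterpart, then defer to a vision-language model for the semantic judgment. The taxonomy enums mirror the paper's categories; `query_vlm`, its signature, and the decision logic are illustrative placeholders under stated assumptions, not the authors' implementation.

```python
from enum import Enum

from PIL import Image
import pytesseract  # thin wrapper around the Tesseract OCR engine


class ManipulationFormat(Enum):
    """Attack formats from the paper's taxonomy."""
    CHARACTER = "character manipulation"
    PHRASE = "phrase manipulation"
    PATTERN = "pattern manipulation"


class ManipulationPurpose(Enum):
    """Attack purposes from the paper's taxonomy."""
    REPLACEMENT = "information replacement"
    OBFUSCATION = "information obfuscation"
    EXTRA_WRONG = "extra wrong information"


def ocr_text(frame_path: str) -> str:
    """Extract visible text from a single video frame with Tesseract."""
    return pytesseract.image_to_string(Image.open(frame_path)).strip()


def query_vlm(raw_frame: str, ar_frame: str, ocr_diff: bool) -> bool:
    """Hypothetical VLM call: prompt a vision-language model with both
    frames (plus the OCR finding) and ask whether the AR overlay changes
    the scene's meaning. Stubbed out; plug in your own VLM client."""
    raise NotImplementedError


def detect_vim(raw_frame: str, ar_frame: str) -> bool:
    """Flag a potential VIM attack for one raw/AR frame pair."""
    text_changed = ocr_text(raw_frame) != ocr_text(ar_frame)
    # The OCR diff localizes character- and phrase-level manipulation;
    # the VLM judges semantics, which also covers pattern manipulation
    # that leaves the OCR output unchanged.
    return query_vlm(raw_frame, ar_frame, ocr_diff=text_changed)
```

In practice the framework operates on video pairs rather than single frames, so per-frame verdicts like the one above would need to be sampled and aggregated over time.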
Similar Papers
Demonstrating Visual Information Manipulation Attacks in Augmented Reality: A Hands-On Miniature City-Based Setup
Human-Computer Interaction
Protects AR users from attacks that trick their eyes.
Toward Safe, Trustworthy and Realistic Augmented Reality User Experience
CV and Pattern Recognition
Keeps AR experiences safe from harmful virtual content.
ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality
CV and Pattern Recognition
Keeps virtual objects from blocking real ones.