Score: 1

Guided Reality: Generating Visually-Enriched AR Task Guidance with LLMs and Vision Models

Published: August 5, 2025 | arXiv ID: 2508.03547v1

By: Ada Yi Zhao, Aditya Gunturu, Ellen Yi-Luen Do, and more

Potential Business Impact:

Demonstrates a fully automated way to generate step-by-step AR instructions with embedded visual cues, which could lower the cost of authoring training and how-to content.

Large language models (LLMs) have enabled the automatic generation of step-by-step augmented reality (AR) instructions for a wide range of physical tasks. However, existing LLM-based AR guidance often lacks rich visual augmentations to effectively embed instructions into spatial context for better user understanding. We present Guided Reality, a fully automated AR system that generates embedded and dynamic visual guidance based on step-by-step instructions. Our system integrates LLMs and vision models to: 1) generate multi-step instructions from user queries, 2) identify appropriate types of visual guidance, 3) extract spatial information about key interaction points in the real world, and 4) embed visual guidance in physical space to support task execution. Drawing from a corpus of user manuals, we define five categories of visual guidance and propose an identification strategy based on the current step. We evaluate the system through a user study (N=16) in which participants completed real-world tasks and explored the system in the wild. Additionally, four instructors shared insights on how Guided Reality could be integrated into their training workflows.
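To make the four-stage pipeline concrete, here is a minimal Python sketch of its control flow. Everything in it is an assumption layered on the abstract: the function names, the GuidanceType labels, and the stubbed model calls are hypothetical stand-ins, since this summary does not name the paper's five guidance categories or any actual APIs.

```python
"""A minimal sketch of the four-stage Guided Reality pipeline, as described
in the abstract. All function bodies are placeholders: a real system would
call an LLM for stages 1-2 and a vision model for stage 3."""
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple


class GuidanceType(Enum):
    # The paper defines five guidance categories; their names are not given
    # in this summary, so these labels are hypothetical placeholders.
    POINT = auto()
    PATH = auto()
    REGION = auto()
    MOTION = auto()
    LABEL = auto()


@dataclass
class GuidedStep:
    instruction: str
    guidance: GuidanceType
    anchor_px: Tuple[int, int]  # key interaction point in the camera frame


def generate_steps(user_query: str) -> List[str]:
    """Stage 1: an LLM expands the query into step-by-step instructions."""
    return [f"Step placeholder for: {user_query}"]  # stubbed LLM call


def identify_guidance(instruction: str) -> GuidanceType:
    """Stage 2: an LLM picks a guidance category for the current step."""
    return GuidanceType.POINT  # stubbed classification


def locate_interaction_point(instruction: str) -> Tuple[int, int]:
    """Stage 3: a vision model grounds the step in the camera frame."""
    return (320, 240)  # dummy pixel coordinate


def embed_guidance(step: GuidedStep) -> None:
    """Stage 4: the AR layer renders the visual at the anchored point."""
    print(f"[{step.guidance.name}] at {step.anchor_px}: {step.instruction}")


def run_pipeline(user_query: str) -> None:
    for instruction in generate_steps(user_query):
        step = GuidedStep(
            instruction=instruction,
            guidance=identify_guidance(instruction),
            anchor_px=locate_interaction_point(instruction),
        )
        embed_guidance(step)


if __name__ == "__main__":
    run_pipeline("replace the filter in a coffee machine")
```

The staging mirrors the abstract's numbered list: text generation, guidance-type selection, spatial grounding, and rendering are kept as separate functions so each model component could be swapped independently.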

Country of Origin
πŸ‡¨πŸ‡¦ πŸ‡ΊπŸ‡Έ Canada, United States

Page Count
15 pages

Category
Computer Science:
Human-Computer Interaction