GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations
By: Qing Chang, Zhiming Hu
Potential Business Impact:
Explains what people are looking at and doing.
Comprehensively interpreting human behavior is a core challenge in human-aware artificial intelligence. However, prior works have typically focused on body behavior, neglecting the crucial role of eye gaze and its synergy with body motion. We present GazeInterpreter, a novel large language model (LLM)-based approach that parses eye gaze data to generate eye-body-coordinated narrations. Specifically, our method features 1) a symbolic gaze parser that translates raw gaze signals into symbolic gaze events; 2) a hierarchical structure that first uses an LLM to generate eye gaze narration at the semantic level and then integrates gaze with body motion within the same observation window to produce an integrated narration; and 3) a self-correcting loop that iteratively refines the modality match, temporal coherence, and completeness of the integrated narration. This hierarchical and iterative processing effectively aligns physical values and semantic text in both the temporal and spatial domains. We validated the effectiveness of our eye-body-coordinated narrations on the text-driven motion generation task in the large-scale Nymeria benchmark. Moreover, we report significant performance improvements on the downstream tasks of action anticipation and behavior summarization. Taken together, these results reveal the significant potential of parsing eye gaze for interpreting human behavior and open up a new direction for human behavior understanding.
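The abstract does not give implementation details, but a rough sketch of how such a pipeline could be wired together is shown below. All function names, the dispersion threshold, the event vocabulary, and the prompt/check conventions are assumptions for illustration, not the authors' code, and the LLM call is stubbed out; it only mirrors the three described stages (symbolic gaze parsing, hierarchical narration, self-correction).

```python
# Illustrative sketch of the pipeline described in the abstract.
# All names, thresholds, and conventions below are hypothetical.
from dataclasses import dataclass

@dataclass
class GazeEvent:
    kind: str        # e.g. "fixation" (assumed event vocabulary)
    start: float     # seconds
    end: float
    target: str      # object label the gaze lands on, if available

def parse_gaze(samples, dispersion_thresh=1.5):
    """Symbolic gaze parser (assumed dispersion-based): groups raw
    (t, x, y, label) samples into symbolic fixation events."""
    events, window = [], []
    for t, x, y, label in samples:
        window.append((t, x, y, label))
        xs, ys = [s[1] for s in window], [s[2] for s in window]
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > dispersion_thresh:
            if len(window) > 1:
                events.append(GazeEvent("fixation", window[0][0],
                                        window[-2][0], window[-2][3]))
            window = [window[-1]]
    if window:
        events.append(GazeEvent("fixation", window[0][0],
                                window[-1][0], window[-1][3]))
    return events

def llm(prompt: str) -> str:
    """Placeholder for an LLM call; swap in any chat-completion API."""
    return f"[narration for: {prompt[:60]}...]"

def narrate(gaze_events, body_motion_text, max_iters=3):
    # Level 1: gaze-only narration at the semantic level.
    gaze_desc = "; ".join(f"{e.kind} on {e.target} ({e.start:.1f}-{e.end:.1f}s)"
                          for e in gaze_events)
    gaze_narration = llm(f"Describe the gaze behavior: {gaze_desc}")

    # Level 2: integrate gaze and body motion over the same window.
    integrated = llm(f"Combine gaze ({gaze_narration}) with body motion "
                     f"({body_motion_text}) into one narration.")

    # Self-correcting loop: critique modality match, temporal coherence,
    # and completeness, then refine until the check passes.
    for _ in range(max_iters):
        verdict = llm("Check modality match, temporal coherence, and "
                      f"completeness of: {integrated}")
        if "OK" in verdict:   # assumed convention for a passing check
            break
        integrated = llm(f"Revise the narration given this critique: {verdict}")
    return integrated

if __name__ == "__main__":
    samples = [(0.0, 10.0, 10.0, "cup"), (0.1, 10.2, 10.1, "cup"),
               (0.2, 25.0, 30.0, "door"), (0.3, 25.1, 30.2, "door")]
    print(narrate(parse_gaze(samples), "the person walks toward the table"))
```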
Similar Papers
Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning
Human-Computer Interaction
Helps computers understand how people look at things.
SemanticScanpath: Combining Gaze and Speech for Situated Human-Robot Interaction Using LLMs
Human-Computer Interaction
Robots understand what you mean by looking.
MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution
Robotics
Robots understand what you want by watching your eyes.