SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where

Published: September 28, 2025 | arXiv ID: 2509.23852v1

By: Yiheng Huang, Junran Peng, Silei Shen, and more

Potential Business Impact:

Humanoid robots and game characters can gesture and point at real objects while speaking, with human-like timing.

Business Areas:
Motion Capture, Media and Entertainment, Video

Actions and gestures that accompany dialogue are often closely tied to the surrounding environment, such as looking toward the interlocutor or pointing to a described target at the appropriate moment. Speech and semantics guide gesture production by determining its timing (WHEN) and style (HOW), while the spatial locations of interactive objects dictate its directional execution (WHERE). Existing approaches either rely solely on descriptive language to generate motions or use audio to produce non-interactive gestures, and thus fail to capture interactive timing and spatial intent. This significantly limits the applicability of conversational gesture generation, whether in robotics or in game and animation production. To address this gap, we present a full-stack solution. We first establish a data collection method that simultaneously captures high-precision human motion and spatial intent. We then develop a generation model driven by audio, language, and spatial data, alongside dedicated metrics for evaluating interaction timing and spatial accuracy. Finally, we deploy the solution on a humanoid robot, enabling rich, context-aware physical interactions.
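
To make the three-way conditioning concrete, below is a minimal, hypothetical PyTorch-style sketch of a generator that fuses the signal types the abstract names: per-frame audio features (WHEN), an utterance-level text embedding (HOW), and a 3D target position (WHERE). All module names, dimensions, and the recurrent backbone are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch (not the paper's implementation): a gesture
# generator conditioned on audio (WHEN), language (HOW), and a 3D
# target location (WHERE), producing a sequence of gesture poses.
import torch
import torch.nn as nn

class SpatialIntentGestureGenerator(nn.Module):
    def __init__(self, audio_dim=128, text_dim=256, pose_dim=66, hidden=512):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)  # per-frame audio features
        self.text_proj = nn.Linear(text_dim, hidden)    # utterance-level semantics
        self.where_proj = nn.Linear(3, hidden)          # xyz of the interaction target
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, pose_dim)    # joint rotations per frame

    def forward(self, audio_feats, text_emb, target_xyz):
        # audio_feats: (B, T, audio_dim); text_emb: (B, text_dim); target_xyz: (B, 3)
        T = audio_feats.shape[1]
        # Static HOW/WHERE conditioning, broadcast across the time axis;
        # the audio stream alone carries per-frame timing information.
        cond = self.text_proj(text_emb) + self.where_proj(target_xyz)  # (B, hidden)
        x = self.audio_proj(audio_feats) + cond.unsqueeze(1).expand(-1, T, -1)
        h, _ = self.backbone(x)
        return self.pose_head(h)  # (B, T, pose_dim) gesture sequence

# Usage: 2 clips, 90 audio frames each
gen = SpatialIntentGestureGenerator()
poses = gen(torch.randn(2, 90, 128), torch.randn(2, 256), torch.randn(2, 3))
print(poses.shape)  # torch.Size([2, 90, 66])
```

The design choice worth noting is that WHERE and HOW enter as sequence-wide conditioning while audio drives the frame-by-frame dynamics, mirroring the abstract's split between timing, style, and directional execution.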

Country of Origin
🇨🇳 China

Page Count
17 pages

Category
Computer Science: Graphics