Talking Spell: A Wearable System Enabling Real-Time Anthropomorphic Voice Interaction with Everyday Objects
By: Xuetong Wang, Ching Christie Pang, Pan Hui
Potential Business Impact:
Makes any object talk and become your friend.
Virtual assistants (VAs) have become ubiquitous in daily life, integrated into smartphones and smart devices, sparking interest in AI companions that enhance user experiences and foster emotional connections. However, existing companions are often embedded in specific objects, such as glasses, home assistants, or dolls, requiring users to form emotional bonds with unfamiliar items, which can lead to reduced engagement and feelings of detachment. To address this, we introduce Talking Spell, a wearable system that empowers users to imbue any everyday object with speech and an anthropomorphic persona through a user-centric radiative network. Leveraging advanced computer vision (e.g., YOLOv11 for object detection), large vision-language models (e.g., QWEN-VL for persona generation), and speech-to-text and text-to-speech technologies, Talking Spell guides users through three stages of emotional connection: acquaintance, familiarization, and bonding. We validated our system through a user study in which 12 participants used Talking Spell to explore four interaction intentions: entertainment, companionship, utility, and creativity. The results demonstrate its effectiveness in fostering meaningful interactions and emotional significance with everyday objects. Our findings indicate that Talking Spell creates engaging and personalized experiences across a range of objects, from accessories to essential wearables.
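The pipeline described above can be sketched as a simple chain of components. This is a hypothetical illustration only: the class and function names (`detect_object`, `generate_persona`, `reply`) are stand-ins for the paper's actual YOLOv11, QWEN-VL, and speech components, not the authors' API.

```python
# Illustrative sketch of the Talking Spell pipeline stages; all names
# here are hypothetical stand-ins, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class Persona:
    object_label: str   # label from the object-detection stage
    name: str           # anthropomorphic name from the persona stage
    traits: list[str]   # personality traits used to style replies

def detect_object(image) -> str:
    """Stand-in for YOLOv11 object detection on the wearable's camera feed."""
    return "coffee mug"  # placeholder detection result

def generate_persona(label: str) -> Persona:
    """Stand-in for QWEN-VL persona generation from the detected object."""
    return Persona(label, f"Mr. {label.title()}", ["cheerful", "warm"])

def reply(persona: Persona, user_utterance: str) -> str:
    """Stand-in for the speech-to-text -> dialogue -> text-to-speech loop."""
    return f"{persona.name} the {persona.object_label}: I heard '{user_utterance}'!"

# Acquaintance stage: detect the object and give it a persona,
# then exchange a first utterance with it.
persona = generate_persona(detect_object(None))
print(reply(persona, "Good morning!"))
```

In the real system each stub would be replaced by the corresponding model call, and the persona would persist across sessions to support the familiarization and bonding stages.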
Similar Papers
SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation
Human-Computer Interaction
Makes talking videos sound like you want.
InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
CV and Pattern Recognition
Makes digital characters move and talk realistically.
Look and Talk: Seamless AI Assistant Interaction with Gaze-Triggered Activation
Human-Computer Interaction
Lets you talk to AI just by looking.