Talking Spell: A Wearable System Enabling Real-Time Anthropomorphic Voice Interaction with Everyday Objects
By: Xuetong Wang, Ching Christie Pang, Pan Hui
Potential Business Impact:
Makes any object talk and become your friend.
Virtual assistants (VAs) have become ubiquitous in daily life, integrated into smartphones and smart devices, sparking interest in AI companions that enhance user experiences and foster emotional connections. However, existing companions are often embedded in specific objects, such as glasses, home assistants, or dolls, requiring users to form emotional bonds with unfamiliar items, which can lead to reduced engagement and feelings of detachment. To address this, we introduce Talking Spell, a wearable system that empowers users to imbue any everyday object with speech and an anthropomorphic persona through a user-centric radiative network. Leveraging advanced computer vision (e.g., YOLOv11 for object detection), large vision-language models (e.g., QWEN-VL for persona generation), and speech-to-text and text-to-speech technologies, Talking Spell guides users through three stages of emotional connection: acquaintance, familiarization, and bonding. We validated our system through a user study in which 12 participants used Talking Spell to explore four interaction intentions: entertainment, companionship, utility, and creativity. The results demonstrate its effectiveness in fostering meaningful interactions and emotional significance with everyday objects. Our findings indicate that Talking Spell creates engaging and personalized experiences across a range of objects, from accessories to essential wearables.
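The pipeline described above can be sketched as a simple chain of components. This is a hypothetical illustration only: the class and function names (`detect_object`, `generate_persona`, `reply`) are stand-ins for the paper's actual YOLOv11, QWEN-VL, and speech components, not the authors' API.

```python
# Illustrative sketch of the Talking Spell pipeline stages; all names
# here are hypothetical stand-ins, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class Persona:
    object_label: str   # label from the object-detection stage
    name: str           # anthropomorphic name from the persona stage
    traits: list[str]   # personality traits used to style replies

def detect_object(image) -> str:
    """Stand-in for YOLOv11 object detection on the wearable's camera feed."""
    return "coffee mug"  # placeholder detection result

def generate_persona(label: str) -> Persona:
    """Stand-in for QWEN-VL persona generation from the detected object."""
    return Persona(label, f"Mr. {label.title()}", ["cheerful", "warm"])

def reply(persona: Persona, user_utterance: str) -> str:
    """Stand-in for the speech-to-text -> dialogue -> text-to-speech loop."""
    return f"{persona.name} the {persona.object_label}: I heard '{user_utterance}'!"

# Acquaintance stage: detect the object and give it a persona,
# then exchange a first utterance with it.
persona = generate_persona(detect_object(None))
print(reply(persona, "Good morning!"))
```

In the real system each stub would be replaced by the corresponding model call, and the persona would persist across sessions to support the familiarization and bonding stages.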
Similar Papers
SpeakEasy: Enhancing Text-to-Speech Interactions for Expressive Content Creation
Human-Computer Interaction
Makes talking videos sound like you want.
InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
CV and Pattern Recognition
Makes digital characters move and talk realistically.
Look and Talk: Seamless AI Assistant Interaction with Gaze-Triggered Activation
Human-Computer Interaction
Lets you talk to AI just by looking.