Facilitating Video Story Interaction with Multi-Agent Collaborative System

Published: May 2, 2025 | arXiv ID: 2505.03807v1

By: Yiwen Zhang, Jianing Hao, Zhan Wang and more

Potential Business Impact:

Makes stories change based on what you want.

Business Areas:

Virtual World Community and Lifestyle, Media and Entertainment, Software

Video story interaction lets viewers engage with and explore narrative content for a personalized experience. However, existing methods are limited to user selection among predefined options, depend on specially designed narratives, and offer little customization. To address this, we propose an interactive system driven by user intent. Our system uses a Vision Language Model (VLM) to enable machines to understand video stories, and combines Retrieval-Augmented Generation (RAG) with a Multi-Agent System (MAS) to create evolving characters and scene experiences. It includes three stages: 1) Video story processing, which uses a VLM and prior knowledge to simulate human understanding of stories across three modalities. 2) Multi-space chat, which creates growth-oriented characters through MAS interactions based on user queries and story stages. 3) Scene customization, which expands and visualizes the story scenes mentioned in dialogue. Applied to the Harry Potter series, our study shows that the system effectively portrays emergent character social behavior and growth, enhancing the interactive experience in the video story world.
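To make the retrieval idea in stage 2 concrete, here is a minimal Python sketch of how a character agent's context might be restricted to what that character knows at a given story stage. It is not the authors' implementation: the `StoryEvent` record, the `retrieve` and `build_prompt` helpers, the integer stage indexing, and the word-overlap ranking (a stand-in for embedding search) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoryEvent:
    stage: int       # story stage in which the event occurs (hypothetical indexing)
    character: str   # character who experienced the event
    text: str        # textual description, e.g. from a video-understanding step


def retrieve(events, character, current_stage, query, k=2):
    """Return up to k events the character may know at current_stage,
    ranked by naive word overlap with the query (stand-in for embedding search)."""
    query_words = set(query.lower().split())
    # A character only remembers its own events up to the current stage.
    visible = [e for e in events
               if e.character == character and e.stage <= current_stage]
    scored = [(len(query_words & set(e.text.lower().split())), e) for e in visible]
    scored = [(score, e) for score, e in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def build_prompt(character, current_stage, query, events):
    """Assemble the prompt a character agent would send to a language model."""
    context = retrieve(events, character, current_stage, query)
    memory = "\n".join(f"- {e.text}" for e in context) or "- (no relevant memory)"
    return (f"You are {character} at story stage {current_stage}.\n"
            f"Relevant memories:\n{memory}\n"
            f"User: {query}")
```

Filtering by stage before ranking is one simple way to model character growth: the same question asked at a later stage draws on a larger memory, so the agent's answer can change as the story advances.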

MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio

Computation and Language

Makes AI create amazing video stories for kids.

7 Mar 2025

88%

Object-Driven Narrative in AR: A Scenario-Metaphor Framework with VLM Integration

Human-Computer Interaction

Makes stories come alive from your surroundings.

17 Apr 2025

88%

InterChat: Enhancing Generative Visual Analytics using Multimodal Interactions

Human-Computer Interaction

Lets computers understand your data questions better.

6 Mar 2025

Page Count

29 pages
