Node-Based Editing for Multimodal Generation of Text, Audio, Image, and Video
By: Alexander Htet Kyaw, Lenin Ravindranath Sivalingam
Potential Business Impact:
Creates stories with pictures, sound, and video.
We present a node-based storytelling system for multimodal content generation. The system represents stories as graphs of nodes that can be expanded, edited, and iteratively refined through direct user edits and natural-language prompts. Each node can integrate text, images, audio, and video, allowing creators to compose multimodal narratives. A task selection agent routes requests among specialized generative tasks that handle story generation, node structure reasoning, node diagram formatting, and context generation. The interface supports targeted editing of individual nodes, automatic branching for parallel storylines, and node-based iterative refinement. Our results demonstrate that node-based editing supports control over narrative structure and iterative generation of text, images, audio, and video. We report quantitative outcomes on automatic story outline generation and qualitative observations of editing workflows. Finally, we discuss current limitations such as scalability to longer narratives and consistency across multiple nodes, and outline future work toward human-in-the-loop and user-centered creative AI tools.
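The abstract does not include implementation details, but the two core ideas it names lend themselves to a minimal sketch: a story node that holds optional slots for each modality and can branch into parallel storylines, and a task selection agent that routes a prompt to one of the four specialized tasks. Everything below is hypothetical illustration — the names (StoryNode, branch, route_task, TASKS) and the keyword-matching router are assumptions, not the paper's method; the actual agent presumably uses a language model rather than keyword rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StoryNode:
    """One story beat; each modality slot is optional (hypothetical schema)."""
    node_id: str
    text: str = ""
    image_path: Optional[str] = None
    audio_path: Optional[str] = None
    video_path: Optional[str] = None
    children: List["StoryNode"] = field(default_factory=list)

    def branch(self, node_id: str, text: str = "") -> "StoryNode":
        """Attach a child node, creating a parallel storyline."""
        child = StoryNode(node_id=node_id, text=text)
        self.children.append(child)
        return child

# Keyword triggers per task -- a stand-in for the paper's LLM-based agent.
TASKS = {
    "story_generation": ("write", "story", "continue"),
    "node_structure_reasoning": ("branch", "split", "reorder"),
    "node_diagram_formatting": ("layout", "diagram", "format"),
    "context_generation": ("context", "background", "summarize"),
}

def route_task(prompt: str) -> str:
    """Pick the specialized task whose keywords match the prompt;
    fall back to story generation."""
    lowered = prompt.lower()
    for task, keywords in TASKS.items():
        if any(k in lowered for k in keywords):
            return task
    return "story_generation"

if __name__ == "__main__":
    root = StoryNode("root", text="A lighthouse keeper finds a map.")
    root.branch("a1", text="She sails toward the marked island.")
    root.branch("a2", text="She burns the map and stays ashore.")
    print(route_task("Branch this scene into two endings"))
    # -> node_structure_reasoning
```

In this reading, targeted editing means mutating one StoryNode's fields in place, while automatic branching means appending children so alternative continuations coexist in the graph.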
Similar Papers
Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media
Artificial Intelligence
Lets you edit long videos easily with words.
Natural Language Interaction for Editing Visual Knowledge Graphs
Human-Computer Interaction
Lets you change computer maps by talking.