A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration

Published: May 8, 2025 | arXiv ID: 2505.04885v1

By: Shaja Arul Selvamani, Nia D'Souza Ganapathy

Potential Business Impact:

Makes audiobooks sound like real life.

Business Areas:
Audiobooks, Media and Entertainment, Music and Audio

This research introduces an AI-driven multi-agent framework designed for producing immersive audiobooks. It leverages neural text-to-speech synthesis with FastSpeech 2 and VALL-E for expressive narration and character-specific voices, and employs advanced language models to automatically interpret textual narratives and generate matching spatial audio effects. These effects are synchronized with the storyline through temporal integration methods, including Dynamic Time Warping (DTW) and recurrent neural networks (RNNs). Diffusion-based generative models, combined with higher-order ambisonics (HOA) and scattering delay networks (SDN), render realistic 3D soundscapes that substantially enhance listener immersion and narrative realism. The technology advances audiobook applications by providing richer experiences for educational content, storytelling platforms, and accessibility solutions for visually impaired audiences. Future work will address personalization, ethical management of synthesized voices, and integration with multi-sensory platforms.
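To give a feel for the DTW-based synchronization the summary mentions, here is a minimal, self-contained sketch of Dynamic Time Warping aligning a sound-effect cue's energy envelope to a narration timeline. This is a toy illustration, not the paper's implementation; the sequences and function names are invented for the example.

```python
def dtw_align(narration, effect):
    """Dynamic Time Warping between two 1-D feature sequences.

    Returns the total alignment cost and the warping path as (i, j)
    index pairs mapping narration frames to effect frames.
    """
    n, m = len(narration), len(effect)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning prefixes of length i and j
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(narration[i - 1] - effect[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # effect frame repeated
                                 cost[i][j - 1],       # narration frame repeated
                                 cost[i - 1][j - 1])   # frames matched
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        best = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if best == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif best == cost[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return cost[n][m], path[::-1]

# Toy energy envelopes: a narration track and a shorter sound-effect cue.
narration = [0.0, 0.1, 0.9, 1.0, 0.2, 0.0]
effect = [0.0, 0.8, 1.0, 0.1]
total_cost, path = dtw_align(narration, effect)
```

The warping path stretches or compresses the effect's timeline so its peaks land on the corresponding narration frames, which is the core idea behind DTW-driven effect placement.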
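For the ambisonics side, a minimal sketch of encoding a mono source into a first-order ambisonic signal may help; the paper uses higher-order ambisonics, and the ACN channel ordering and SN3D normalization chosen here are assumptions for illustration.

```python
import math

def encode_foa(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order ambisonics (ACN order, SN3D norm).

    Returns the four channel values [W, Y, Z, X] for a source at the
    given azimuth (counter-clockwise from front) and elevation.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)    # left-right axis
    z = sample * math.sin(el)                   # up-down axis
    x = sample * math.cos(az) * math.cos(el)    # front-back axis
    return [w, y, z, x]

# A source hard left (azimuth 90 degrees, elevation 0) excites W and Y only.
channels = encode_foa(1.0, 90.0, 0.0)
```

Higher orders add more spherical-harmonic channels to the same scheme, sharpening spatial resolution; a decoder then maps these channels to loudspeakers or binaural output.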

Page Count
14 pages

Category
Computer Science:
Sound