Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene
By: Donggeun Lim, Jinseok Bae, Inwoo Hwang and more
Potential Business Impact:
Makes computer characters move together in stories.
In this work, we propose a framework that creates a lively virtual dynamic scene with contextual motions of multiple humans. Generating multi-human contextual motion requires holistic reasoning over dynamic relationships among human-human and human-scene interactions. We adapt the power of a large language model (LLM) to digest the contextual complexity within textual input and convert the task into tangible subproblems, so that we can generate multi-agent behavior at a scale not considered before. Specifically, our event generator formulates the temporal progression of a dynamic scene into a sequence of small events. Each event calls for a well-defined motion involving relevant characters and objects. Next, we synthesize the motions of characters at positions sampled based on spatial guidance. We employ a high-level module to deliver scalable yet comprehensive context, translating events into relative descriptions that enable the retrieval of precise coordinates. As the first to address this problem at scale and with diversity, we offer a benchmark to assess diverse aspects of contextual reasoning. Benchmark results and user studies show that our framework effectively captures scene context with high scalability. The code and benchmark, along with result videos, are available at our project page: https://rms0329.github.io/Event-Driven-Storytelling/.
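To make the event-then-position idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Event` fields, the `SCENE` layout, and the `sample_position` and `place_events` helpers are all hypothetical names introduced for illustration. The events are written by hand rather than produced by an LLM, and the "relative description" is reduced to an anchor object plus a maximum distance.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    """One small story event (hypothetical structure, not the paper's schema)."""
    characters: list[str]   # characters taking part in the event
    action: str             # motion the characters should perform
    anchor: str             # scene object the event is described relative to
    max_distance: float     # how far from the anchor characters may stand (m)


# Assumed toy scene: object name -> (x, y) floor position in meters.
SCENE = {"sofa": (1.0, 2.0), "table": (4.0, 0.5)}


def sample_position(event, scene, rng):
    """Sample a floor position within max_distance of the event's anchor."""
    ax, ay = scene[event.anchor]
    r = event.max_distance
    # Rejection sampling: draw from the bounding square, keep points in the disc.
    while True:
        dx = rng.uniform(-r, r)
        dy = rng.uniform(-r, r)
        if dx * dx + dy * dy <= r * r:
            return (ax + dx, ay + dy)


def place_events(events, scene, seed=0):
    """Turn a sequence of events into (character, action, position) tuples."""
    rng = random.Random(seed)
    placements = []
    for event in events:
        for name in event.characters:
            position = sample_position(event, scene, rng)
            placements.append((name, event.action, position))
    return placements


# Hand-written stand-in for the output of an event generator.
story = [
    Event(["Alice", "Bob"], "sit and chat", "sofa", 0.5),
    Event(["Carol"], "set the table", "table", 1.0),
]
placements = place_events(story, SCENE)
for name, action, (x, y) in placements:
    print(f"{name}: {action} at ({x:.2f}, {y:.2f})")
```

In this sketch, each character would then be handed to a motion synthesizer at the sampled position. That step is omitted here because the abstract does not describe its interface.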
Similar Papers
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
CV and Pattern Recognition
Lets cars understand spoken commands about surroundings.
Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models
CV and Pattern Recognition
Helps robots understand what many people will do.
SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion
CV and Pattern Recognition
Makes computer-made people move realistically in scenes.