SceneAdapt: Scene-aware Adaptation of Human Motion Diffusion
By: Jungbin Cho, Minsu Kim, Jisoo Kim, and more
Potential Business Impact:
Makes computer-generated people move realistically within 3D scenes.
Human motion is inherently diverse and semantically rich, while also shaped by the surrounding scene. However, existing motion generation approaches address either motion semantics or scene awareness in isolation, since constructing large-scale datasets with both rich text–motion coverage and precise scene interactions is extremely challenging. In this work, we introduce SceneAdapt, a framework that injects scene awareness into text-conditioned motion models by leveraging disjoint scene–motion and text–motion datasets through two adaptation stages: inbetweening and scene-aware inbetweening. The key idea is to use motion inbetweening, which is learnable without text, as a proxy task that bridges the two distinct datasets and thereby injects scene awareness into text-to-motion models. In the first stage, we introduce keyframing layers that modulate motion latents for inbetweening while preserving the latent manifold. In the second stage, we add a scene-conditioning layer that injects scene geometry by adaptively querying local context through cross-attention. Experimental results show that SceneAdapt effectively injects scene awareness into text-to-motion models, and we further analyze the mechanisms through which this awareness emerges. Code and models will be released.
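As a rough illustration of the two adaptation layers the abstract describes, the sketch below shows what a keyframing layer (latent modulation) and a scene-conditioning layer (cross-attention over scene tokens) could look like in PyTorch. This is not the authors' released code: the class names, FiLM-style scale/shift modulation, zero initialization, and all dimensions are assumptions made for illustration.

```python
# Hypothetical sketch of SceneAdapt-style adaptation layers.
# Names, modulation scheme, and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn


class KeyframingLayer(nn.Module):
    """Stage 1 (assumed form): modulate motion latents for inbetweening.

    Uses a FiLM-style scale/shift driven by keyframe features. Zero
    initialization makes the layer an identity at the start of training,
    which is one way to preserve the pretrained latent manifold.
    """

    def __init__(self, latent_dim: int, keyframe_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(keyframe_dim, 2 * latent_dim)
        nn.init.zeros_(self.to_scale_shift.weight)
        nn.init.zeros_(self.to_scale_shift.bias)

    def forward(self, latents: torch.Tensor, keyframe_feat: torch.Tensor) -> torch.Tensor:
        # latents: (B, T, D) motion latents; keyframe_feat: (B, T, K)
        scale, shift = self.to_scale_shift(keyframe_feat).chunk(2, dim=-1)
        return latents * (1.0 + scale) + shift


class SceneConditioningLayer(nn.Module):
    """Stage 2 (assumed form): inject scene geometry by cross-attending
    from motion latents (queries) to scene tokens, e.g. point-cloud
    features (keys/values)."""

    def __init__(self, latent_dim: int, scene_dim: int, num_heads: int = 4):
        super().__init__()
        self.scene_proj = nn.Linear(scene_dim, latent_dim)
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, latents: torch.Tensor, scene_tokens: torch.Tensor) -> torch.Tensor:
        # latents: (B, T, D); scene_tokens: (B, S, scene_dim)
        kv = self.scene_proj(scene_tokens)
        attended, _ = self.cross_attn(query=self.norm(latents), key=kv, value=kv)
        # Residual connection keeps the base text-to-motion behavior intact.
        return latents + attended


if __name__ == "__main__":
    B, T, D, K, S = 2, 16, 256, 64, 128
    latents = torch.randn(B, T, D)
    latents = KeyframingLayer(D, K)(latents, torch.randn(B, T, K))
    latents = SceneConditioningLayer(D, scene_dim=32)(latents, torch.randn(B, S, 32))
    print(latents.shape)  # torch.Size([2, 16, 256])
```

The residual, near-identity design is what would let such layers be bolted onto a frozen text-to-motion model without disturbing its learned latent space, matching the paper's stated goal of preserving the latent manifold while adding scene awareness.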
Similar Papers
Object-Aware 4D Human Motion Generation
CV and Pattern Recognition
Makes videos of people moving realistically with objects.
Jointly Understand Your Command and Intention: Reciprocal Co-Evolution between Scene-Aware 3D Human Motion Synthesis and Analysis
CV and Pattern Recognition
Makes robots move realistically in 3D scenes.
AnimateScene: Camera-controllable Animation in Any Scene
CV and Pattern Recognition
Makes animated people fit perfectly into real scenes.