DeMoGen: Towards Decompositional Human Motion Generation with Energy-Based Diffusion Models

Published: December 26, 2025 | arXiv ID: 2512.22324v1

By: Jianrong Zhang, Hehe Fan, Yi Yang

Potential Business Impact:

Breaks down complex movements into simple parts.

Business Areas:

Motion Capture, Media and Entertainment, Video

Human motions are compositional: complex behaviors can be described as combinations of simpler primitives. However, existing approaches primarily focus on forward modeling, e.g., learning holistic mappings from text to motion or composing a complex motion from a set of motion concepts. In this paper, we consider the inverse perspective: decomposing a holistic motion into semantically meaningful sub-components. We propose DeMoGen, a compositional training paradigm for decompositional learning that employs an energy-based diffusion model. This energy formulation directly captures the composed distribution of multiple motion concepts, enabling the model to discover them without relying on ground-truth motions for individual concepts. Within this paradigm, we introduce three training variants to encourage a decompositional understanding of motion: 1. DeMoGen-Exp explicitly trains on decomposed text prompts; 2. DeMoGen-OSS performs orthogonal self-supervised decomposition; 3. DeMoGen-SC enforces semantic consistency between original and decomposed text embeddings. These variants enable our approach to disentangle reusable motion primitives from complex motion sequences. We also demonstrate that the decomposed motion concepts can be flexibly recombined to generate diverse and novel motions, generalizing beyond the training distribution. Additionally, we construct a text-decomposed dataset to support compositional training, serving as an extended resource to facilitate text-to-motion generation and motion composition.
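The abstract says the energy formulation captures the composed distribution of several motion concepts. The sketch below illustrates that idea. It is not the authors' code. It borrows the composition rule popularized by Composable Diffusion (Liu et al., ECCV 2022).

Suppose each concept c_i defines an energy E(x | c_i). Then the product of the concept distributions has energy E(x) + sum_i (E(x | c_i) - E(x)). A diffusion model's noise prediction is proportional to the energy gradient, so the same weighted sum can be applied to the noise predictions.

Several pieces of the sketch are hypothetical illustrations: the denoiser signature, the function name `composed_noise`, the per-concept weights, and the toy lookup-table denoiser. DeMoGen's actual formulation and training losses may differ.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]
# Hypothetical denoiser: (noisy motion x_t, timestep t, concept text or None) -> predicted noise.
Denoiser = Callable[[Vector, int, Optional[str]], Vector]


def composed_noise(
    denoiser: Denoiser,
    x_t: Vector,
    t: int,
    concepts: Sequence[str],
    weights: Optional[Sequence[float]] = None,
) -> Vector:
    """Compose per-concept noise predictions by summing energy differences.

    Target energy: E(x) + sum_i w_i * (E(x | c_i) - E(x)). Predicted noise is
    proportional to the energy gradient, so the same weighted sum is applied
    to the noise predictions (concept None = unconditional / empty prompt).
    """
    if weights is None:
        weights = [1.0] * len(concepts)
    if len(weights) != len(concepts):
        raise ValueError("expected one weight per concept")

    eps_uncond = denoiser(x_t, t, None)
    result = list(eps_uncond)  # start from the unconditional prediction
    for concept, w in zip(concepts, weights):
        eps_c = denoiser(x_t, t, concept)
        # Add this concept's weighted deviation from the unconditional prediction.
        for j in range(len(result)):
            result[j] += w * (eps_c[j] - eps_uncond[j])
    return result


# Toy stand-in for a trained model: a fixed noise vector per prompt (hypothetical).
_TOY_TABLE: Dict[Optional[str], Vector] = {
    None: [1.0, 1.0],
    "walk forward": [2.0, 1.0],
    "wave right hand": [1.0, 3.0],
}


def toy_denoiser(x_t: Vector, t: int, concept: Optional[str]) -> Vector:
    return list(_TOY_TABLE[concept])


if __name__ == "__main__":
    x_t = [0.0, 0.0]
    print(composed_noise(toy_denoiser, x_t, 10, ["walk forward", "wave right hand"]))
    # prints [2.0, 3.0]
```

Under this sketch, recombining decomposed concepts into a new motion amounts to passing a different concept list to the same denoiser. A weight above 1 strengthens a concept, much like a classifier-free guidance scale.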

RealisMotion: Decomposed Human Motion Control and Video Generation in the World Space

CV and Pattern Recognition

Lets you make videos of anyone doing anything.

12 Aug 2025

GENMO: A GENeralist Model for Human MOtion

Graphics

Makes one computer program create and fix body movements.

2 May 2025

EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer

CV and Pattern Recognition

Makes videos of people move more realistically.

21 Dec 2025

Page Count

18 pages
