No MoCap Needed: Post-Training Motion Diffusion Models with Reinforcement Learning using Only Textual Prompts
By: Girolamo Macaluso, Lorenzo Mandelli, Mirko Bicchierai, and more
Potential Business Impact:
Teaches computers new movements from text alone, without motion-capture data.
Diffusion models have recently advanced human motion generation, producing realistic and diverse animations from textual prompts. However, adapting these models to unseen actions or styles typically requires additional motion capture data and full retraining, which is costly and difficult to scale. We propose a post-training framework based on Reinforcement Learning that fine-tunes pretrained motion diffusion models using only textual prompts, without requiring any motion ground truth. Our approach employs a pretrained text-motion retrieval network as a reward signal and optimizes the diffusion policy with Denoising Diffusion Policy Optimization, effectively shifting the model's generative distribution toward the target domain without relying on paired motion data. We evaluate our method on cross-dataset adaptation and leave-one-out motion experiments using the HumanML3D and KIT-ML datasets across both latent- and joint-space diffusion architectures. Results from quantitative metrics and user studies show that our approach consistently improves the quality and diversity of generated motions, while preserving performance on the original distribution. Our approach is a flexible, data-efficient, and privacy-preserving solution for motion adaptation.
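The core idea in the abstract — fine-tuning a generative policy with a policy-gradient objective where a text-motion similarity score replaces ground-truth motion data — can be illustrated with a toy sketch. The snippet below is not the paper's implementation: the "retrieval network" is stood in for by a cosine similarity between fixed embedding vectors, and the motion diffusion model is reduced to a Gaussian policy over a small vector, so that a REINFORCE-style update (the simplest relative of Denoising Diffusion Policy Optimization) can be shown end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def retrieval_reward(text_emb, motion_emb):
    # Cosine similarity stands in for the pretrained text-motion
    # retrieval network that scores how well a motion matches a prompt.
    return float(
        text_emb @ motion_emb
        / (np.linalg.norm(text_emb) * np.linalg.norm(motion_emb))
    )

def policy_gradient_step(theta, text_emb, n_samples=8, sigma=0.1, lr=0.5):
    """One REINFORCE-style update in the spirit of DDPO:
    sample "motions" from the current Gaussian policy, score each with
    the retrieval reward, and shift the policy mean toward
    high-reward samples. No ground-truth motion is ever used."""
    samples = theta + sigma * rng.standard_normal((n_samples, theta.size))
    rewards = np.array([retrieval_reward(text_emb, s) for s in samples])
    advantages = rewards - rewards.mean()  # baseline for variance reduction
    # grad of log N(s; theta, sigma^2 I) w.r.t. theta is (s - theta) / sigma^2
    grad = (advantages[:, None] * (samples - theta)).mean(axis=0) / sigma**2
    return theta + lr * grad

text_emb = np.array([1.0, 0.0, 0.0])  # stand-in embedding of the target prompt
theta = np.array([0.0, 1.0, 0.0])     # stand-in policy mean (initial "motion")

for _ in range(200):
    theta = policy_gradient_step(theta, text_emb)

# The reward (prompt-motion alignment) rises from 0 toward 1 as the
# policy's generative distribution shifts toward the target domain.
print(retrieval_reward(text_emb, theta))
```

In the actual method, the policy is the full denoising chain of a pretrained motion diffusion model and the update is applied per denoising step, but the optimization signal has the same shape: a differentiable-free reward computed only from the prompt and the generated sample.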
Similar Papers
Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion
CV and Pattern Recognition
Makes computer animations move like real people.
LLM-Driven Policy Diffusion: Enhancing Generalization in Offline Reinforcement Learning
Machine Learning (CS)
Teaches robots new jobs from examples.
MixerMDM: Learnable Composition of Human Motion Diffusion Models
CV and Pattern Recognition
Mixes AI motion models for better control.