Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation

Published: December 30, 2025 | arXiv ID: 2512.24100v1

By: Yijie Qian, Juncheng Wang, Yuxiang Feng and more

Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effective for simple actions, this System 1 approach faces a fundamental theoretical bottleneck we identify as the Semantic-Kinematic Impedance Mismatch: the inherent difficulty of grounding semantically dense, discrete linguistic intent into kinematically dense, high-frequency motion data in a single shot. In this paper, we argue that the solution lies in an architectural shift towards Latent System 2 Reasoning. Drawing inspiration from Hierarchical Motor Control in cognitive science, we propose Latent Motion Reasoning (LMR), which reformulates generation as a two-stage Think-then-Act decision process. Central to LMR is a novel Dual-Granularity Tokenizer that disentangles motion into two distinct manifolds: a compressed, semantically rich Reasoning Latent for planning global topology, and a high-frequency Execution Latent for preserving physical fidelity. By forcing the model to autoregressively reason (plan the coarse trajectory) before it moves (instantiates the frames), we effectively bridge the ineffability gap between language and physics. We demonstrate LMR's versatility by implementing it for two representative baselines: T2M-GPT (discrete) and MotionStreamer (continuous). Extensive experiments show that LMR yields non-trivial improvements in both semantic alignment and physical plausibility, validating that the optimal substrate for motion planning is not natural language, but a learned, motion-aligned concept space. Code and demos can be found at https://chenhaoqcdyq.github.io/LMR/
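
The Think-then-Act ordering described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes a learned Dual-Granularity Tokenizer, whereas the sketch below swaps in plain average pooling at two temporal strides as a stand-in. The function names (`pool`, `dual_granularity_tokenize`, `think_then_act_order`) and the stride values are hypothetical, chosen only to show how a coarse "reasoning" sequence could be placed before a fine "execution" sequence in an autoregressive token stream.

```python
from typing import List, Tuple

Frame = List[float]  # one pose vector per frame (toy representation)


def pool(frames: List[Frame], stride: int) -> List[Frame]:
    """Average consecutive groups of `stride` frames (the last group may be shorter)."""
    pooled = []
    for start in range(0, len(frames), stride):
        chunk = frames[start:start + stride]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def dual_granularity_tokenize(
    frames: List[Frame], coarse_stride: int = 8, fine_stride: int = 2
) -> Tuple[List[Frame], List[Frame]]:
    """Assumption: pooling stands in for the learned tokenizer.

    Returns (reasoning, execution): a heavily compressed sequence for
    planning and a lightly compressed one that keeps more temporal detail.
    """
    reasoning = pool(frames, coarse_stride)
    execution = pool(frames, fine_stride)
    return reasoning, execution


def think_then_act_order(text_tokens, reasoning, execution):
    """Build the sequence order: text first, then the coarse plan, then the fine frames."""
    return (
        [("text", t) for t in text_tokens]
        + [("reason", r) for r in reasoning]
        + [("exec", e) for e in execution]
    )


# Toy usage: 16 frames of 1-D "motion" with values 0.0 .. 15.0
frames = [[float(i)] for i in range(16)]
reasoning, execution = dual_granularity_tokenize(frames)
sequence = think_then_act_order(["walk", "forward"], reasoning, execution)
print(len(reasoning), len(execution), len(sequence))
```

In this toy setting the coarse sequence is four times shorter than the fine one, so a model generating `sequence` left to right would commit to the global trajectory before producing any frame-level detail.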

MoLingo: Motion-Language Alignment for Text-to-Motion Generation

CV and Pattern Recognition

Makes characters move like real people from words.

15 Dec 2025 1

91%

Motion-R1: Chain-of-Thought Reasoning and Reinforcement Learning for Human Motion Generation

CV and Pattern Recognition

Makes characters move realistically from text descriptions.

12 Jun 2025 0

90%

MoReGen: Multi-Agent Motion-Reasoning Engine for Code-based Text-to-Video Synthesis

CV and Pattern Recognition

Makes videos follow real-world physics rules.

3 Dec 2025 4
