RigMo: Unifying Rig and Motion Learning for Generative Animation
By: Hao Zhang, Jiahao Luo, Bohui Wan, and more
Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones show that RigMo learns smooth, interpretable, and physically plausible rigs while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.
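To make the rig-motion decomposition concrete: decoded skinning weights and per-bone SE(3) transforms combine via standard linear blend skinning to produce per-vertex deformations. The sketch below is illustrative only, assuming hypothetical array shapes (the abstract does not specify RigMo's decoder interfaces or bone count), and shows the generic blending step, not the paper's actual implementation.

```python
import numpy as np

def lbs_deform(vertices, weights, rotations, translations):
    """Linear blend skinning: blend per-bone SE(3) transforms
    with per-vertex skinning weights (illustrative sketch).

    vertices:     (V, 3) rest-pose vertex positions
    weights:      (V, B) skinning weights, each row sums to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3)    per-bone translations
    returns:      (V, 3) deformed vertex positions
    """
    # Apply every bone's rigid transform to every vertex: (B, V, 3)
    per_bone = np.einsum('bij,vj->bvi', rotations, vertices) + translations[:, None, :]
    # Weighted blend over bones: (V, 3)
    return np.einsum('vb,bvi->vi', weights, per_bone)
```

Evaluating this at every frame's SE(3) transforms yields the time-varying deformation of the animatable mesh.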
Similar Papers
X-MoGen: Unified Motion Generation across Humans and Animals
CV and Pattern Recognition
Makes computers create human and animal movements from words.
UniMoGen: Universal Motion Generation
CV and Pattern Recognition
Makes any character move realistically without special rules.
UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework
CV and Pattern Recognition
Creates matching 3D moves from videos.