Making Pose Representations More Expressive and Disentangled via Residual Vector Quantization
By: Sukhyun Jeong, Hong-Gi Shin, Yong-Hoon Choi
Potential Business Impact:
Makes computer-made people move more realistically.
Recent progress in text-to-motion generation has advanced both 3D human motion synthesis and text-based motion control. Controllable motion generation (CoMo), which enables intuitive control, typically relies on pose code representations, but discrete pose codes alone cannot capture fine-grained motion details, limiting expressiveness. To overcome this, we propose a method that augments pose code-based latent representations with continuous motion features using residual vector quantization (RVQ). This design preserves the interpretability and manipulability of pose codes while effectively capturing subtle motion characteristics such as high-frequency details. Experiments on the HumanML3D dataset show that our model reduces Fréchet inception distance (FID) from 0.041 to 0.015 and improves Top-1 R-Precision from 0.508 to 0.510. Qualitative analysis of pairwise direction similarity between pose codes further confirms the model's controllability for motion editing.
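The core mechanism here is residual vector quantization: each stage quantizes whatever the previous stages failed to capture, so coarse pose codes and fine residual detail live in separate codebooks. A minimal sketch of that encode/decode loop follows; the random codebooks, sizes, and the zero codeword (which lets a stage pass its residual through unchanged) are illustrative assumptions, not the paper's learned setup.

```python
import numpy as np

def build_codebooks(rng, num_stages, codebook_size, dim):
    # Random codebooks stand in for learned ones (hypothetical; real RVQ trains these).
    cbs = [rng.normal(size=(codebook_size, dim)) for _ in range(num_stages)]
    for cb in cbs:
        cb[0] = 0.0  # zero codeword: a stage may leave the residual unchanged
    return cbs

def rvq_encode(x, codebooks):
    """Encode x as one codebook index per stage, quantizing the running residual."""
    residual = x.copy()
    indices = []
    for cb in codebooks:
        # Pick the codeword nearest to what is still unexplained.
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        indices.append(idx)
        residual = residual - cb[idx]  # pass leftover detail to the next stage
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codewords across stages."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

rng = np.random.default_rng(0)
dim = 8
codebooks = build_codebooks(rng, num_stages=4, codebook_size=64, dim=dim)
x = rng.normal(size=dim)
idx = rvq_encode(x, codebooks)
x_hat = rvq_decode(idx, codebooks)
```

Because each extra stage only quantizes the remaining residual, the multi-stage reconstruction can only match or improve on the first stage alone, which is why residual codes can add high-frequency detail without disturbing the coarse, interpretable pose code.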
Similar Papers
Towards Consistent Long-Term Pose Generation
CV and Pattern Recognition
Makes computer animations move smoothly and realistically.
Spatial-Temporal Multi-Scale Quantization for Flexible Motion Generation
CV and Pattern Recognition
Makes computer-made movements look more real.
VAEVQ: Enhancing Discrete Visual Tokenization through Variational Modeling
CV and Pattern Recognition
Makes AI create better, more realistic pictures.