Bi-modal Prediction and Transformation Coding for Compressing Complex Human Dynamics
By: Huong Hoang, Keito Suzuki, Truong Nguyen, and more
Potential Business Impact:
Compresses realistic animated human characters so they take less data to store and stream.
For dynamic human motion sequences, the original KeyNode-Driven codec often struggles to retain compression efficiency when confronted with rapid movements or strong non-rigid deformations. This paper proposes a novel Bi-modal coding framework that enhances the flexibility of motion representation by integrating semantic segmentation and region-specific transformation modeling. The rigid transformation model (rotation and translation) is extended with a hybrid scheme that selectively applies affine transformations (rotation, translation, scaling, and shearing) only to deformation-rich regions (e.g., the torso, where loose clothing induces high variability), while retaining rigid models elsewhere. The affine model is decomposed into minimal parameter sets for efficient coding, and the components are combined through a selection strategy guided by Lagrangian Rate-Distortion optimization. The results show that the Bi-modal method achieves more accurate mesh deformation, especially in sequences involving complex non-rigid motion, without compromising compression efficiency in simpler regions, yielding an average bit-rate saving of 33.81% over the baseline.
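To make the per-region selection concrete, below is a minimal Python sketch of how a rigid fit and an affine fit might be compared under a Lagrangian Rate-Distortion cost J = D + lambda * R. The function names and the parameter-count rate proxy are illustrative assumptions, not the paper's implementation; in particular, the paper decomposes the affine model into minimal parameter sets for coding, which is not reproduced here.

import numpy as np

def fit_rigid(src, dst):
    # Least-squares rigid fit (rotation R, translation t) via the
    # Kabsch/Procrustes method, so that dst ~ src @ R.T + t.
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    R = (U @ np.diag([1.0, 1.0, d]) @ Vt).T
    t = dst.mean(0) - src.mean(0) @ R.T
    return R, t

def fit_affine(src, dst):
    # Least-squares affine fit; A folds in rotation, scaling, and shearing.
    src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords
    M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)    # solve src_h @ M ~ dst
    return M[:3].T, M[3]                               # A (3x3), t (3,)

def select_transform(src, dst, lam, bits_rigid=6, bits_affine=12):
    # Per-region mode decision by Lagrangian cost J = D + lam * R.
    # The rate R is approximated here by parameter counts (an assumption);
    # a real encoder would use actual coded bit counts.
    R_rot, t_r = fit_rigid(src, dst)
    A, t_a = fit_affine(src, dst)
    d_rigid = np.mean(np.sum((src @ R_rot.T + t_r - dst) ** 2, axis=1))
    d_affine = np.mean(np.sum((src @ A.T + t_a - dst) ** 2, axis=1))
    j_rigid = d_rigid + lam * bits_rigid
    j_affine = d_affine + lam * bits_affine
    return ("rigid", R_rot, t_r) if j_rigid <= j_affine else ("affine", A, t_a)

In this sketch, a near-rigid region (e.g., a limb) yields a small rigid-fit distortion, so the cheaper rigid mode wins, while a deformation-rich region (e.g., a torso with loose clothing) justifies the extra affine parameters; sweeping lam would trace out the rate-distortion trade-off.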
Similar Papers
KeyNode-Driven Geometry Coding for Real-World Scanned Human Dynamic Mesh Compression
CV and Pattern Recognition
Makes 3D people in games look real with less data.
Rethinking Generative Human Video Coding with Implicit Motion Transformation
CV and Pattern Recognition
Makes videos of people move more smoothly.
Fine-Grained Motion Compression and Selective Temporal Fusion for Neural B-Frame Video Coding
Image and Video Processing
Makes videos load faster and look better.