InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE
By: Lipeng Wang, Hongxing Fan, Haohua Chen, and more
Potential Business Impact:
Creates realistic virtual people that act like real ones.
Generating high-quality human interactions holds significant value for applications like virtual reality and robotics. However, existing methods often fail to preserve unique individual characteristics or fully adhere to textual descriptions. To address these challenges, we introduce InterMoE, a novel framework built on a Dynamic Temporal-Selective Mixture of Experts. The core of InterMoE is a routing mechanism that synergistically uses both high-level text semantics and low-level motion context to dispatch temporal motion features to specialized experts. This allows the experts to dynamically determine their selection capacity and focus on critical temporal features, preserving specific individual characteristics while ensuring high semantic fidelity. Extensive experiments show that InterMoE achieves state-of-the-art performance in individual-specific, high-fidelity 3D human interaction generation, reducing FID scores by 9% on the InterHuman dataset and by 22% on InterX.
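To make the routing idea concrete, here is a minimal NumPy sketch of one plausible reading of the abstract: a router conditioned on both a pooled text embedding and per-timestep motion features assigns each timestep to a variable-sized set of experts via a gate threshold rather than a fixed top-k. All names, dimensions, and the threshold rule are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D, E = 8, 16, 4  # timesteps, feature dim, number of experts (illustrative)

# Hypothetical inputs: per-timestep motion features and one pooled text embedding.
motion = rng.standard_normal((T, D))
text = rng.standard_normal(D)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Router conditions jointly on text semantics and motion context (here, by
# simple concatenation) to produce per-timestep gate probabilities.
W_route = rng.standard_normal((2 * D, E)) * 0.1
cond = np.concatenate([motion, np.broadcast_to(text, (T, D))], axis=1)  # (T, 2D)
gates = softmax(cond @ W_route)  # (T, E)

# "Dynamic selection capacity" modeled as thresholding: each timestep keeps
# however many experts exceed the gate threshold, instead of a fixed top-k.
threshold = 0.2
mask = gates >= threshold
mask[np.arange(T), gates.argmax(axis=1)] = True  # always keep the best expert

# Renormalize the kept gates and mix expert outputs (experts are linear maps here).
experts = rng.standard_normal((E, D, D)) * 0.1
kept = np.where(mask, gates, 0.0)
kept = kept / kept.sum(axis=1, keepdims=True)
out = np.einsum('te,edk,td->tk', kept, experts, motion)  # (T, D)

print(out.shape, mask.sum(axis=1))  # active-expert count varies per timestep
```

The threshold-based mask is the key difference from standard top-k MoE routing: timesteps with concentrated gate mass use few experts, while ambiguous timesteps engage more, which is one way the "dynamic" capacity described in the abstract could behave.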
Similar Papers
Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction
CV and Pattern Recognition
Predicts human movement better, faster, and cheaper.
3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow
Computation and Language
Helps robots understand and plan tasks in 3D.
TrajMoE: Spatially-Aware Mixture of Experts for Unified Human Mobility Modeling
Artificial Intelligence
Helps predict city travel patterns anywhere.