Score: 2

Efficient 3D Full-Body Motion Generation from Sparse Tracking Inputs with Temporal Windows

Published: May 3, 2025 | arXiv ID: 2505.01802v1

By: Georgios Fotios Angelis, Savas Ozkan, Sinan Mutlu and more

BigTech Affiliations: Samsung

Potential Business Impact:

Makes virtual bodies move more realistically and faster.

Business Areas:

Motion Capture, Media and Entertainment, Video

For a seamless user experience in immersive AR/VR applications, efficient and effective Neural Network (NN) models are essential, since body parts that cannot be captured by limited sensors must be generated by these models for a complete 3D full-body reconstruction in a virtual environment. However, state-of-the-art NN models are typically computationally expensive, and they leverage longer sequences of sparse tracking inputs to generate full-body movements by capturing temporal context. Inevitably, longer sequences increase the computational overhead and introduce noise in longer temporal dependencies, which adversely affects generation performance. In this paper, we propose a novel Multi-Layer Perceptron (MLP)-based method that enhances overall performance while balancing computational cost and memory overhead for efficient 3D full-body generation. Specifically, we introduce an NN mechanism that divides the longer input sequence into smaller temporal windows. The current motion is then merged with the information from these windows through latent representations to utilize past context for generation. Our experiments demonstrate that the generation accuracy of our method with this NN mechanism is significantly improved compared to state-of-the-art methods while greatly reducing computational cost and memory overhead, making our method suitable for resource-constrained devices.
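The windowing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the abstract gives no layer sizes or merge rule, so the window size, latent dimension, the single random (untrained) linear layer, the concatenation used as the "merge" step, and every function name below are assumptions chosen for illustration only.

```python
import random


def split_into_windows(frames, window_size):
    """Split a list of per-frame feature vectors into consecutive,
    non-overlapping temporal windows of equal length."""
    if window_size <= 0 or len(frames) % window_size != 0:
        raise ValueError("sequence length must be a positive multiple of window_size")
    return [frames[i:i + window_size] for i in range(0, len(frames), window_size)]


def make_linear(in_dim, out_dim, rng):
    """Random weights for a hypothetical linear layer (out_dim rows of in_dim values)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def linear_relu(weights, x):
    """Apply the linear layer followed by a ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def encode_window(window, weights):
    """Flatten one window and project it to a small latent vector."""
    flat = [value for frame in window for value in frame]
    return linear_relu(weights, flat)


def fuse(current_frame, window_latents):
    """Merge the current frame with past context.
    Concatenation is an assumed merge rule, not taken from the paper."""
    fused = list(current_frame)
    for latent in window_latents:
        fused.extend(latent)
    return fused


def demo():
    rng = random.Random(0)
    feature_dim, window_size, latent_dim = 6, 4, 3  # illustrative sizes only
    # Eight past frames of sparse tracking features, plus the current frame.
    past = [[rng.uniform(-1, 1) for _ in range(feature_dim)] for _ in range(8)]
    current = [rng.uniform(-1, 1) for _ in range(feature_dim)]
    weights = make_linear(window_size * feature_dim, latent_dim, rng)
    latents = [encode_window(w, weights) for w in split_into_windows(past, window_size)]
    return fuse(current, latents)


if __name__ == "__main__":
    print(len(demo()))
```

In this sketch the fused vector holds the 6 current-frame features followed by one 3-dimensional latent per past window. A downstream MLP would therefore see a fixed, compact input rather than the full history, which is the cost trade-off the abstract points to.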

Mem-MLP: Real-Time 3D Human Motion Generation from Sparse Inputs

CV and Pattern Recognition

Makes virtual bodies move like real ones.

20 Nov 2025 2

88%

Efficient Multi-Person Motion Prediction by Lightweight Spatial and Temporal Interactions

CV and Pattern Recognition

Predicts how people move together, faster.

13 Jul 2025 2

87%

An Efficient 3D Convolutional Neural Network with Channel-wise, Spatial-grouped, and Temporal Convolutions

CV and Pattern Recognition

Makes computers understand videos much better, faster.

2 Mar 2025 1


Country of Origin

🇰🇷 South Korea

Page Count

9 pages
