
RUMPL: Ray-Based Transformers for Universal Multi-View 2D to 3D Human Pose Lifting

Published: December 17, 2025 | arXiv ID: 2512.15488v1

By: Seyed Abolfazl Ghasemzadeh, Alexandre Alahi, Christophe De Vleeschouwer

Potential Business Impact:

Enables 3D body tracking from any camera setup.

Business Areas:

Image Recognition, Data and Analytics, Software

Estimating 3D human poses from 2D images remains challenging due to occlusions and projective ambiguity. Multi-view learning-based approaches mitigate these issues but often fail to generalize to real-world scenarios, as large-scale multi-view datasets with 3D ground truth are scarce and captured under constrained conditions. To overcome this limitation, recent methods rely on 2D pose estimation combined with 2D-to-3D pose lifting trained on synthetic data. Building on our previous MPL framework, we propose RUMPL, a transformer-based 3D pose lifter that introduces a 3D ray-based representation of 2D keypoints. This formulation makes the model independent of camera calibration and the number of views, enabling universal deployment across arbitrary multi-view configurations without retraining or fine-tuning. A new View Fusion Transformer leverages learned fused-ray tokens to aggregate information along rays, further improving multi-view consistency. Extensive experiments demonstrate that RUMPL reduces MPJPE by up to 53% compared to triangulation and over 60% compared to transformer-based image-representation baselines. Results on new benchmarks, including in-the-wild multi-view and multi-person datasets, confirm its robustness and scalability. The framework's source code is available at https://github.com/aghasemzadeh/OpenRUMPL
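The abstract does not spell out how RUMPL builds its rays, so the following is only a minimal sketch of one common construction: pinhole back-projection of a 2D keypoint into a world-space ray, plus the MPJPE metric quoted above. The function names, the camera convention (x_cam = R · x_world + t, no lens distortion), and the plain-list data layout are assumptions made for illustration, not the authors' implementation.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy, R, t):
    """Back-project pixel (u, v) to a world-space ray (origin, unit direction).

    Assumed convention (hypothetical, not from the paper):
    x_cam = R @ x_world + t, ideal pinhole camera without distortion.
    R is a 3x3 nested list, t a length-3 sequence.
    """
    # Direction of the viewing ray in the camera frame (depth normalised to 1).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame: d_world = R^T @ d_cam.
    d = [sum(R[k][i] * d_cam[k] for k in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d))
    direction = [c / norm for c in d]
    # Camera centre in the world frame: C = -R^T @ t.
    origin = [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]
    return origin, direction

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
```

Because each ray is expressed in a shared world frame, the per-view input has the same form regardless of which camera produced it, which is one plausible reading of why a ray-based input can be reused across camera setups. MPJPE is reported in the same unit as the joint coordinates (typically millimetres).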

PriorFormer: A Transformer for Real-time Monocular 3D Human Pose Estimation with Versatile Geometric Priors

CV and Pattern Recognition

Turns 2D camera video into 3D body moves.

21 Aug 2025 2

87%

Pose-RFT: Enhancing MLLMs for 3D Pose Generation via Hybrid Action Reinforcement Fine-Tuning

CV and Pattern Recognition

Makes computers create 3D body poses from pictures.

11 Aug 2025 0

87%

HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers

Graphics

Makes 3D people from few pictures.

3 Jun 2025 1

Page Count

12 pages
