Score: 0

MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism

Published: May 28, 2025 | arXiv ID: 2505.22555v2

By: Yanyi Qu, Haoyang Ma, Wenhui Xiong

Potential Business Impact:

Tracks people's movements using Wi-Fi signals.

Business Areas:
Facial Recognition Data and Analytics, Software

Human pose estimation based on Channel State Information (CSI) has emerged as a promising approach for non-intrusive and precise human activity monitoring, yet faces challenges including accurate multi-person pose recognition and effective CSI feature learning. This paper presents MultiFormer, a wireless sensing system that accurately estimates human pose through CSI. The proposed system adopts a Transformer based time-frequency dual-token feature extractor with multi-head self-attention. This feature extractor is able to model inter-subcarrier correlations and temporal dependencies of the CSI. The extracted CSI features and the pose probability heatmaps are then fused by Multi-Stage Feature Fusion Network (MSFN) to enforce the anatomical constraints. Extensive experiments conducted on on the public MM-Fi dataset and our self-collected dataset show that the MultiFormer achieves higher accuracy over state-of-the-art approaches, especially for high-mobility keypoints (wrists, elbows) that are particularly difficult for previous methods to accurately estimate.

Country of Origin
🇨🇳 China

Page Count
13 pages

Category
Computer Science:
CV and Pattern Recognition