MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism
By: Yanyi Qu, Haoyang Ma, Wenhui Xiong
Potential Business Impact:
Tracks people's movements using Wi-Fi signals.
Human pose estimation based on Channel State Information (CSI) has emerged as a promising approach for non-intrusive and precise human activity monitoring, yet it faces challenges including accurate multi-person pose recognition and effective CSI feature learning. This paper presents MultiFormer, a wireless sensing system that accurately estimates human pose from CSI. The proposed system adopts a Transformer-based time-frequency dual-token feature extractor with multi-head self-attention, which models both the inter-subcarrier correlations and the temporal dependencies of the CSI. The extracted CSI features and the pose probability heatmaps are then fused by a Multi-Stage Feature Fusion Network (MSFN) to enforce anatomical constraints. Extensive experiments on the public MM-Fi dataset and on our self-collected dataset show that MultiFormer achieves higher accuracy than state-of-the-art approaches, especially for high-mobility keypoints (wrists, elbows) that previous methods find particularly difficult to estimate accurately.
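The abstract does not give the architecture in detail, so the following is only a minimal NumPy sketch of what a time-frequency dual-token scheme with multi-head self-attention could look like. It assumes the CSI is a real-valued matrix of shape (time samples, subcarriers), treats each time sample as one "time token" and each subcarrier as one "frequency token", and uses random projection matrices in place of learned weights. The function names (`multi_head_self_attention`, `dual_token_features`), the shapes, and the tokenization itself are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, num_heads, rng):
    """Standard scaled dot-product self-attention over `tokens` of shape (n, d).

    Random matrices stand in for learned projections (assumption for illustration).
    Returns the attended tokens (n, d) and the attention weights (heads, n, n).
    """
    n, d = tokens.shape
    assert d % num_heads == 0, "token dimension must be divisible by num_heads"
    head_dim = d // num_heads
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

    def split(x):
        # (n, d) -> (heads, n, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, n, n)
    weights = softmax(scores, axis=-1)
    out = (weights @ v).transpose(1, 0, 2).reshape(n, d)   # merge heads back
    return out @ w_o, weights


def dual_token_features(csi, num_heads=2, seed=0):
    """Hypothetical dual-token extractor: attention over time tokens and frequency tokens."""
    rng = np.random.default_rng(seed)
    time_tokens = csi    # (T, S): one token per time sample -> temporal dependencies
    freq_tokens = csi.T  # (S, T): one token per subcarrier -> inter-subcarrier correlations
    time_out, time_attn = multi_head_self_attention(time_tokens, num_heads, rng)
    freq_out, freq_attn = multi_head_self_attention(freq_tokens, num_heads, rng)
    return time_out, freq_out, time_attn, freq_attn


# Toy CSI amplitude matrix: 8 time samples x 6 subcarriers (synthetic data).
csi = np.random.default_rng(1).standard_normal((8, 6))
time_out, freq_out, time_attn, freq_attn = dual_token_features(csi)
print(time_out.shape, freq_out.shape, time_attn.shape, freq_attn.shape)
```

In this sketch the attention weights over time tokens form a (time x time) map per head and the weights over frequency tokens form a (subcarrier x subcarrier) map per head, which is one way the two kinds of dependency named in the abstract could be captured separately before fusion.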