SignX: The Foundation Model for Sign Recognition

Published: April 22, 2025 | arXiv ID: 2504.16315v1

By: Sen Fang, Chunyu Sui, Hongwei Yi and more

Potential Business Impact:

Translates sign language videos into text accurately.

Business Areas:

Image Recognition Data and Analytics, Software

The complexity of sign language data processing brings many challenges. The current approach to recognition of ASL signs aims to translate RGB sign language videos through pose information into English-based ID glosses, which serve to uniquely identify ASL signs. Note that there is no shared convention for assigning such glosses to ASL signs, so it is essential that the same glossing conventions are used for all of the data in the datasets that are employed. This paper proposes SignX, a foundation model framework for sign recognition. It is a concise yet powerful framework applicable to multiple human activity recognition scenarios. First, we developed a Pose2Gloss component based on an inverse diffusion model, which contains a multi-track pose fusion layer that unifies five of the most powerful pose information sources (SMPLer-X, DWPose, Mediapipe, PrimeDepth, and Sapiens Segmentation) into a single latent pose representation. Second, we trained a Video2Pose module based on ViT that can directly convert raw video into signer pose representation. Through this two-stage training framework, we enable sign language recognition models to be compatible with existing pose formats, laying the foundation for the common pose estimation necessary for sign recognition. Experimental results show that SignX can recognize signs from sign language video, producing predicted gloss representations with greater accuracy than has been reported in prior work.
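
As a rough illustration of what "unifying several pose sources into a single latent pose representation" could look like, here is a minimal sketch. It is not the SignX fusion layer: the abstract does not describe the layer's internals, so the sketch simply concatenates per-source feature vectors and applies one fixed linear projection. The feature sizes, the latent size, and the names `SOURCE_DIMS`, `make_projection`, and `fuse_tracks` are all hypothetical.

```python
import random

# Hypothetical per-frame feature sizes for each pose source. These numbers are
# placeholders; the abstract does not state the real dimensions.
SOURCE_DIMS = {
    "smpler_x": 6,
    "dwpose": 4,
    "mediapipe": 4,
    "primedepth": 3,
    "sapiens_seg": 3,
}
LATENT_DIM = 8  # assumed size of the shared latent pose vector


def make_projection(in_dim, out_dim, seed=0):
    """Build a fixed random matrix (a stand-in for learned fusion weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def fuse_tracks(tracks, projection):
    """Concatenate the per-source features in a fixed order, then project them."""
    concat = []
    for name, dim in SOURCE_DIMS.items():
        feats = tracks[name]
        if len(feats) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(feats)}")
        concat.extend(feats)
    # One dot product per latent dimension.
    return [sum(w * x for w, x in zip(row, concat)) for row in projection]


projection = make_projection(sum(SOURCE_DIMS.values()), LATENT_DIM)
frame = {name: [0.1] * dim for name, dim in SOURCE_DIMS.items()}
latent = fuse_tracks(frame, projection)
print(len(latent))  # 8
```

Whatever the actual design, the point of a shared latent is that a downstream recognizer only needs to accept one input format, regardless of which pose estimator produced the original features.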

A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations

Machine Learning (CS)

Translates spoken words into sign language videos.

4 Mar 2025 0

87%

Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition

CV and Pattern Recognition

Helps computers understand sign language better.

26 Mar 2025 2

87%

Sign Language Translation using Frame and Event Stream: Benchmark Dataset and Algorithms

CV and Pattern Recognition

Helps computers understand sign language better.

9 Mar 2025 1

Page Count

12 pages
