A multitask transformer to sign language translation using motion gesture primitives

Published: March 25, 2025 | arXiv ID: 2503.19668v1

By: Fredy Alejandro Mendoza López, Jefferson Rodriguez, Fabio Martínez

Potential Business Impact:

Translates sign language into written words.

Business Areas:

Translation Service, Professional Services

The lack of effective communication with the deaf population represents the main social gap for this community. Furthermore, sign language, the main communication tool of deaf people, is unlettered, i.e., it has no formal written representation. Consequently, a main challenge today is automatic translation between spatiotemporal sign representations and natural text language. Recent approaches are based on encoder-decoder architectures, where the most relevant strategies integrate attention modules to enhance non-linear correspondences. Many of these approaches require complex training and architectural schemes to achieve reasonable predictions because they lack intermediate text projections. Moreover, they are still limited by the redundant background information in the video sequences. This work introduces a multitask transformer architecture that includes a gloss learning representation to achieve a more suitable translation. The proposed approach also includes a dense motion representation that enhances gestures and captures kinematic information, a key component in sign language. This representation makes it possible to avoid background information and exploit the geometry of the signs. In addition, it includes spatiotemporal representations that facilitate the alignment between gestures and glosses as an intermediate textual representation. The proposed approach outperforms the state of the art on the CoL-SLTD dataset, achieving a BLEU-4 of 72.64% in split 1 and a BLEU-4 of 14.64% in split 2. Additionally, the strategy was validated on the RWTH-PHOENIX-Weather 2014 T dataset, achieving a competitive BLEU-4 of 11.58%.
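The results above are reported as BLEU-4 percentages. As a rough illustration of what that metric measures, the following is a minimal sketch of corpus-level BLEU-4 with a single reference per sentence, no smoothing, and whitespace tokenization. These choices, and the function names `ngram_counts` and `corpus_bleu4`, are assumptions made for illustration; the paper's actual evaluation tooling is not stated in the abstract and may differ.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(references, hypotheses):
    """Corpus-level BLEU-4 in percent (hypothetical helper, one reference each).

    references, hypotheses: parallel lists of token lists.
    Returns 0.0 when any n-gram order has no matches (no smoothing).
    """
    matches = [0, 0, 0, 0]   # clipped n-gram matches for n = 1..4
    totals = [0, 0, 0, 0]    # hypothesis n-gram counts for n = 1..4
    ref_len = 0
    hyp_len = 0

    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, 5):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the four modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat".split()]
    hyps = ["the cat sat on a mat".split()]
    print(f"BLEU-4: {corpus_bleu4(refs, hyps):.2f}%")
```

A hypothesis identical to its reference scores 100, and a hypothesis sharing no 4-gram with its reference scores 0 under this unsmoothed variant. Published scores usually come from a standard toolkit with fixed tokenization, so numbers from this sketch are not directly comparable to those reported above.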

A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations

Machine Learning (CS)

Translates spoken words into sign language videos.

4 Mar 2025

89%

Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition

CV and Pattern Recognition

Helps computers understand sign language faster.

10 Apr 2025

89%

Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks

CV and Pattern Recognition

Helps computers understand sign language for deaf people.

4 Jun 2025

Country of Origin

🇨🇴 Colombia

Page Count

32 pages
