Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment
By: Youngjoon Jang, Liliane Momeni, Zifan Jiang and more
Our aim is to develop a unified model for sign language understanding that performs sign language translation (SLT) and sign-subtitle alignment (SSA). Together, these two tasks enable the conversion of continuous signing videos into spoken language text and the temporal alignment of signing with subtitles, both of which are essential for practical communication, large-scale corpus construction, and educational applications. To achieve this, our approach is built upon three components: (i) a lightweight visual backbone that captures manual and non-manual cues from human keypoints and lip-region images while preserving signer privacy; (ii) a Sliding Perceiver mapping network that aggregates consecutive visual features into word-level embeddings to bridge the vision-text gap; and (iii) a scalable multi-task training strategy that jointly optimises SLT and SSA, reinforcing both linguistic and temporal alignment. To promote cross-linguistic generalisation, we pretrain our model on large-scale sign-text corpora covering British Sign Language (BSL) and American Sign Language (ASL) from the BOBSL and YouTube-SL-25 datasets. With this multilingual pretraining and strong model design, we achieve state-of-the-art results on the challenging BOBSL (BSL) dataset for both SLT and SSA. Our model also demonstrates robust zero-shot generalisation and finetuned SLT performance on How2Sign (ASL), highlighting the potential of scalable translation across different sign languages.
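As a rough illustration of component (ii), the sketch below shows one way consecutive frame features could be pooled into a small, fixed number of embeddings per window. It is not the paper's implementation: the abstract gives no architectural details, so the Perceiver-style latent queries, the single-head attention without learned projections, and the names `sliding_aggregate`, `window_size`, and `stride` are all assumptions made for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(latents, window):
    """Single-head cross-attention: each latent query pools the frames of one window.

    Assumption: no learned key/value projections, to keep the sketch short.
    """
    dim = len(latents[0])
    pooled = []
    for query in latents:
        weights = softmax([dot(query, frame) / math.sqrt(dim) for frame in window])
        pooled.append(
            [sum(w * frame[d] for w, frame in zip(weights, window)) for d in range(dim)]
        )
    return pooled


def sliding_aggregate(frames, latents, window_size=8, stride=4):
    """Slide a window over per-frame features; emit len(latents) embeddings per window.

    window_size and stride are hypothetical values, not taken from the paper.
    """
    if not frames:
        return []
    outputs = []
    last_start = max(len(frames) - window_size, 0)
    for start in range(0, last_start + 1, stride):
        outputs.extend(cross_attend(latents, frames[start:start + window_size]))
    return outputs


random.seed(0)
dim = 4
frames = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(20)]   # stand-in visual features
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]   # stand-in latent queries
embeddings = sliding_aggregate(frames, latents)
```

With 20 frames, a window of 8, and a stride of 4, the loop visits four windows and produces two embeddings per window. In a trained model the latent queries would be learned and the outputs passed to a text decoder; here both inputs are random placeholders.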
Similar Papers
Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing
Computation and Language
Helps translate sign language videos into text.
Deep Understanding of Sign Language for Sign to Subtitle Alignment
CV and Pattern Recognition
Makes sign language videos match spoken words better.
Sign Language Translation with Sentence Embedding Supervision
Computation and Language
Teaches computers to translate sign language without labels.