RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation

Published: December 8, 2025 | arXiv ID: 2512.07273v1

By: Zhi Rao, Yucheng Zhou, Benjia Zhou and more

Potential Business Impact:

Translates sign language into words better.

Business Areas:

Translation Service, Professional Services

Gloss-free sign language translation (SLT) is hindered by two key challenges: **inadequate sign representation** that fails to capture nuanced visual cues, and **sentence-level semantic misalignment** in current LLM-based methods, which limits translation quality. To address these issues, we propose a three-stage **r**einforcing **v**ision-**l**anguage **f**ramework (**RVLF**). We build a large vision-language model (LVLM) specifically designed for sign language, and then combine it with reinforcement learning (RL) to adaptively enhance translation performance. First, to represent sign language adequately, RVLF introduces a semantic representation learning mechanism that fuses skeleton-based motion cues with semantically rich visual features extracted via DINOv2, followed by instruction tuning to obtain a strong SLT-SFT baseline. Then, to reduce sentence-level semantic misalignment, we introduce a GRPO-based optimization strategy that fine-tunes the SLT-SFT model with a reward function combining translation fidelity (BLEU) and sentence completeness (ROUGE), yielding the optimized model termed SLT-GRPO. Our conceptually simple framework yields substantial gains under the gloss-free SLT setting without pre-training on any external large-scale sign language datasets, improving BLEU-4 scores by +5.1, +1.11, +1.4, and +1.61 on the CSL-Daily, PHOENIX-2014T, How2Sign, and OpenASL datasets, respectively. To the best of our knowledge, this is the first work to incorporate GRPO into SLT. Extensive experiments and ablation studies validate the effectiveness of GRPO-based optimization in enhancing both translation quality and semantic consistency.
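The reward described in the abstract (translation fidelity via BLEU plus sentence completeness via ROUGE) and the group-relative scoring that GRPO relies on can be illustrated with a minimal sketch. This is not the authors' implementation. The equal weights, the add-one smoothing in the sentence-level BLEU, the choice of ROUGE-L F1 as the ROUGE variant, whitespace tokenization, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing (an assumed variant)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ng & ref_ng).values())  # clipped n-gram matches
        total = sum(hyp_ng.values())
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty discourages hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec)


def lcs_len(a, b):
    """Length of the longest common subsequence (1-D dynamic programming)."""
    dp = [0] * (len(b) + 1)
    for x in a:
        prev = 0  # value of dp[j - 1] from the previous row
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if x == y else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]


def rouge_l(hyp, ref):
    """ROUGE-L F1 score (assumed to stand in for the paper's ROUGE term)."""
    if not hyp or not ref:
        return 0.0
    lcs = lcs_len(hyp, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(hyp), lcs / len(ref)
    return 2 * p * r / (p + r)


def reward(hyp, ref, w_bleu=0.5, w_rouge=0.5):
    """Weighted BLEU + ROUGE-L reward; the weights are hypothetical."""
    h, r = hyp.split(), ref.split()
    return w_bleu * sentence_bleu(h, r) + w_rouge * rouge_l(h, r)


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward within its sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((x - mean) ** 2 for x in rewards) / len(rewards))
    return [(x - mean) / (std + eps) for x in rewards]


if __name__ == "__main__":
    reference = "the weather is nice today"
    candidates = ["the weather is nice today", "weather nice", "it rains"]
    rewards = [reward(c, reference) for c in candidates]
    for c, a in zip(candidates, group_advantages(rewards)):
        print(f"{a:+.3f}  {c}")
```

In this sketch, candidate translations sampled for the same sign video form one group, and each candidate's advantage is its reward relative to the group mean, scaled by the group standard deviation. A policy update would then raise the likelihood of candidates with positive advantage and lower it for the rest.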

VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training

Computation and Language

Teaches AI to see and talk better.

16 Jun 2025 2

90%

UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning

CV and Pattern Recognition

Helps drones understand pictures faster and better.

15 Aug 2025 1

90%

SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models

Computation and Language

Teaches computers to think better, not just copy.

10 Apr 2025 2

Country of Origin

🇲🇴 Macao

Page Count

16 pages
