Sign language translation systems traditionally rely on intermediate gloss representations to bridge the gap between visual input and written language output. However, manual gloss annotation is costly, language-dependent, and often lossy, prompting growing interest in gloss-free alternatives. This paper introduces Handscribe, a novel two-stage framework for gloss-free sign language translation and gloss sequence generation. Handscribe first translates continuous sign language videos into written language sentences using a lightweight decoder built atop SlowFast-based spatiotemporal features and a frozen mBART model. Then, in the second stage, it generates gloss sequences from these sentences using a Large Language Model (LLaMa3.1-8B-Instruct) that has been fine-tuned with weak supervision. Our experiments on PHOENIX-2014-T and Wav2Gloss Fieldwork demonstrate strong translation performance and state-of-the-art multilingual gloss generation, even in zero-shot settings. The proposed framework reduces annotation bottlenecks while maintaining flexibility and interpretability, paving the way for scalable and inclusive sign language technologies. The code and fine-tuning scripts are available at https://github.com/colonnaemanuele/Handscribe.

• We propose a gloss-free framework for sign language translation and gloss sequence generation.
• Our method leverages SlowFast features and a frozen mBART decoder.
• Glosses are inferred post-translation using a fine-tuned Large Language Model.
• The approach eliminates the need for gloss-level supervision during training.
• We report strong results on PHOENIX-2014-T and Wav2Gloss benchmarks.
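
The two-stage design described in the abstract (video to written sentence, then sentence to gloss sequence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `TwoStagePipeline`, `toy_translator`, and `toy_glosser` are hypothetical, and the toy functions merely stand in for the SlowFast/mBART translation stage and the fine-tuned LLM glossing stage so that the data flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical type aliases; the abstract does not specify the real interfaces.
Video = Sequence[Sequence[float]]      # placeholder for per-frame features
Translator = Callable[[Video], str]    # stage 1: video -> written sentence
Glosser = Callable[[str], List[str]]   # stage 2: sentence -> gloss sequence


@dataclass
class TwoStagePipeline:
    """Chains a translation stage and a gloss-generation stage."""
    translate: Translator
    gloss: Glosser

    def run(self, video: Video) -> Tuple[str, List[str]]:
        # Stage 1 needs no gloss supervision: it maps video directly to text.
        sentence = self.translate(video)
        # Stage 2 derives glosses from the sentence alone, after translation.
        glosses = self.gloss(sentence)
        return sentence, glosses


def toy_translator(video: Video) -> str:
    # Assumption: stand-in for the video-to-text model; ignores its input.
    return "tomorrow it will rain in the north"


def toy_glosser(sentence: str) -> List[str]:
    # Assumption: stand-in for the LLM; drops function words and upper-cases.
    function_words = {"it", "will", "in", "the"}
    return [w.upper() for w in sentence.split() if w not in function_words]


pipeline = TwoStagePipeline(translate=toy_translator, gloss=toy_glosser)
sentence, glosses = pipeline.run([[0.0, 0.0]])
print(sentence)
print(glosses)
```

Because the second stage only consumes text, either stage could in principle be swapped out independently, which is one way to read the flexibility the abstract claims.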
Emanuele Colonna
Ivan Rinaldi
David Landi
Computer Vision and Image Understanding
University of Bari Aldo Moro
University of Siena
Colonna et al. studied this question.
synapsesocial.com/papers/69994ba9873532290d01fc58 — DOI: https://doi.org/10.1016/j.cviu.2026.104674