
March 8, 2026

Open Access

Modality Matters: Training and Tokenization Effects in Sign-to-Text Translation


Key Points

The study aims to explore how input modality and sequence length impact the efficiency and quality of sign language translation.
Compared three input types: raw video, pose keypoints, pretrained features.
Used a shared encoder-decoder architecture.
Applied uniform downsampling across input modalities.
Domain-adapted features yielded the best translation performance.
Raw video surpassed zero-shot features and poses in the absence of domain adaptation.
Downsampling significantly improved training speed and memory efficiency without quality loss.
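The summary does not say exactly how the uniform downsampling is implemented. The Python below is a minimal sketch that assumes it means keeping every k-th time step along the temporal axis, with the same factor applied to every modality. The function name `uniform_downsample`, the factor of 3, and the toy inputs are hypothetical illustrations and are not taken from the released code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def uniform_downsample(sequence: Sequence[T], factor: int) -> List[T]:
    """Keep every `factor`-th time step, starting from the first one.

    Hypothetical helper: assumes the first axis is time, so the same
    function applies to raw video frames, per-frame pose keypoints,
    or per-frame feature vectors.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return list(sequence[::factor])


# Hypothetical toy inputs: 12 time steps per modality.
num_frames = 12
video = [f"frame_{t}" for t in range(num_frames)]        # stand-in for RGB frames
pose = [[(0.0, 0.0)] * 3 for _ in range(num_frames)]    # 3 (x, y) keypoints per frame
features = [[float(t)] * 4 for t in range(num_frames)]  # 4-dim feature vector per frame

FACTOR = 3  # hypothetical factor, shared by all modalities
for name, seq in [("video", video), ("pose", pose), ("features", features)]:
    reduced = uniform_downsample(seq, FACTOR)
    print(name, len(seq), "->", len(reduced))  # each modality: 12 -> 4
```

Because every modality is indexed by time first, a single stride applies uniformly to video frames, pose keypoints, and feature vectors. Other schemes, such as averaging neighboring frames, would also count as uniform downsampling; the exact method used in the study should be checked in the released repository.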

Abstract

Despite rapid progress in Sign Language Translation (SLT), it remains unclear how input modality and sequence length affect translation quality and efficiency. We conducted a controlled comparison of three commonly used input types (raw video, pose keypoints, and pretrained features) under a shared encoder–decoder architecture and a standardized training setup. We show that domain-adapted features perform best overall, while raw video outperforms zero-shot features and poses when domain adaptation is unavailable. By uniformly downsampling input sequences across modalities, we observe substantial gains in training speed and memory efficiency with no degradation in translation quality. This indicates that SLT systems can safely operate on significantly fewer input tokens, which enables faster experimentation, lowers compute requirements, broadens accessibility, and points to a promising direction for reducing training time and resource demands. Moreover, we show that all models maintain competitive performance under downsampling, supporting the viability of fully end-to-end SLT pipelines that do not rely on intermediate representations. We release all code, trained models, and preprocessing scripts at https://github.com/GerrySant/multimodalhugs/tree/modality_matters-sltat2025
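A rough estimate helps show why fewer input tokens can reduce training time and memory. The encoder's internals are not described here; assuming a standard Transformer encoder with full self-attention, each attention matrix has L × L entries for a length-L input, so downsampling by a factor k shrinks it by roughly k². The clip length and factors below are hypothetical, and the estimate covers attention only; it does not reproduce the study's measured gains.

```python
import math


def downsampled_length(num_frames: int, factor: int) -> int:
    """Length after keeping every `factor`-th frame (same as len(seq[::factor]))."""
    return math.ceil(num_frames / factor)


def attention_scores(seq_len: int) -> int:
    """Entries in one full self-attention matrix (per head, per layer).

    Assumes full, quadratic self-attention; other encoder types scale differently.
    """
    return seq_len * seq_len


# Hypothetical clip: 10 seconds at 30 fps.
original_len = 300
for factor in (1, 2, 4):
    reduced = downsampled_length(original_len, factor)
    ratio = attention_scores(original_len) / attention_scores(reduced)
    print(f"factor={factor}: {reduced} tokens, {ratio:.1f}x fewer attention scores")
```

Under these assumptions, halving the sequence to 150 tokens cuts attention entries by 4x, and a factor of 4 (75 tokens) cuts them by 16x. Real savings also depend on feature extraction, the decoder, and batch composition.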

Authors

Gerard Sant

University of Zurich

Amit Moryossef

University of Zurich

Mathias Müller