Multilingual neural machine translation (MNMT) aims to support arbitrary translation directions across multiple languages. MNMT has recently seen dramatic improvements with large language models (LLMs), yet LLMs often require substantial computational resources and may still deliver insufficient translation quality for MNMT. In this work, we present MITRE (multilingual translation with registers), a series of pre-trained MNMT-specific models designed to compete with commercial LLMs and trained on 9.3 billion sentence pairs across 24 languages collected from public corpora. Built on the decoder-only architecture, MITRE integrates a novel mechanism called registering, which inserts a sequence of artificial tokens, namely registers, between the source and target tokens and modifies the attention mask so that generation attends exclusively to the activated registers. Through experiments on EC-40, a large-scale training set that enables fair methodological comparison, we first demonstrate that registering advances the state of the art among MNMT-specific methods. Second, we show that one of our models, MITRE-913M, outperforms NLLB-3.3B in most cases and, with fewer than 1 billion parameters, achieves performance close to GPT-4o mini as measured by spBLEU, chrF++, and COMET. Third, fine-tuning experiments in various scenarios show that our models have stronger fine-tuning adaptability than the NLLB series. Finally, our analysis shows that registers reflect the semantics of the corresponding source tokens in the target language space.
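To make the registering idea concrete, here is a minimal sketch of a boolean attention mask for a single sequence laid out as source tokens, then registers, then target tokens. It is an illustration under stated assumptions, not MITRE's implementation: the function name `build_register_mask`, causal masking at every position, treating every inserted register as activated, and letting target tokens still attend to earlier target tokens are all our assumptions. The only property taken from the description above is that target positions cannot see source tokens directly and must read the source through the registers.

```python
from typing import List


def build_register_mask(n_src: int, n_reg: int, n_tgt: int) -> List[List[bool]]:
    """Return mask[i][j] == True if position i may attend to position j.

    Hypothetical helper. Assumed layout: [source | registers | target].
    Assumptions (not from the paper): causal attention everywhere, all
    registers are "activated", and target tokens see earlier target tokens.
    """
    total = n_src + n_reg + n_tgt
    reg_start = n_src            # first register position
    tgt_start = n_src + n_reg    # first target position
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: only the current and earlier positions
            if i >= tgt_start and j < reg_start:
                # Target positions are cut off from the source tokens,
                # so generation reads the source only through the registers.
                continue
            mask[i][j] = True
    return mask


if __name__ == "__main__":
    for row in build_register_mask(n_src=3, n_reg=2, n_tgt=2):
        print("".join("1" if allowed else "." for allowed in row))
```

For example, `build_register_mask(3, 2, 2)` gives the first target position (index 5) the row `[False, False, False, True, True, True, False]`: no source tokens, both registers, and itself. In a real model this boolean matrix would typically be converted into an additive mask (0 for allowed, a large negative value for blocked) before the softmax.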
Qu et al. studied this question.