What question did this study set out to answer?

The research aims to advance multilingual neural machine translation quality using a unique framework called MITRE.

June 17, 2026 · Open Access

MITRE: Efficient Pre-trained Models for Multilingual Neural Machine Translation with Registering

Key Points

Developed MITRE, a series of pre-trained multilingual translation models trained on 9.3 billion sentence pairs across 24 languages.
Integrated a registering mechanism that inserts artificial tokens (registers) between source and target tokens and restricts generation to attend only to the activated registers.
Ran experiments on EC-40, a large-scale training set, for fair methodological comparison.
MITRE-913M outperforms NLLB-3.3B in most cases.
Achieved results close to GPT-4o mini using under 1 billion parameters, based on spBLEU, chrF++, and COMET scores.
Demonstrated robust adaptability in fine-tuning across various translation scenarios compared to the NLLB series.

Abstract

Multilingual neural machine translation (MNMT) aims to support arbitrary translations across multiple languages. MNMT has recently seen dramatic improvements with large language models (LLMs), yet LLMs often require substantial computational resources and may be insufficient for MNMT in terms of quality. In this work, we present MITRE (multilingual translation with registers), a series of pre-trained MNMT-specific models trained on 9.3 billion sentence pairs across 24 languages collected from public corpora to compete with commercial LLMs. Built on the decoder-only architecture, MITRE integrates a novel mechanism called registering, which inserts a sequence of artificial tokens, namely registers, between source and target tokens and modifies the attention mask such that generation pays attention exclusively to the activated registers. Through experiments on EC-40, a large-scale training set that enables fair methodological comparison, we first demonstrate that registering advances the state-of-the-art in MNMT-specific methods. Second, we show that one of our models, MITRE-913M, outperforms NLLB-3.3B in most cases, and achieves performance close to GPT-4o mini with fewer than 1 billion parameters, as measured by spBLEU, chrF++, and COMET. Third, fine-tuning experiments in various scenarios show that our models have strong fine-tuning adaptability compared to the NLLB series. Finally, based on analysis, we show that registers reflect the semantics of corresponding source tokens in the target language space.
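The attention-mask change described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a sequence laid out as [source | registers | target], ordinary causal attention everywhere else, and that every register counts as "activated". The function name `register_attention_mask` and its parameters are hypothetical and chosen only for illustration.

```python
def register_attention_mask(n_src, n_reg, n_tgt):
    """Build mask[q][k]: True if query position q may attend to key position k.

    Assumed layout (not confirmed by the source): [source | registers | target].
    """
    total = n_src + n_reg + n_tgt
    tgt_start = n_src + n_reg  # index of the first target position
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: current and earlier positions only
            q_is_target = q >= tgt_start
            k_is_source = k < n_src
            # Assumption: target queries are blocked from source keys, so
            # generation sees only the registers and earlier target tokens.
            if q_is_target and k_is_source:
                continue
            mask[q][k] = True
    return mask


def render(mask):
    """Render the mask as rows of '1' (may attend) and '.' (blocked)."""
    return "\n".join("".join("1" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    # 3 source tokens, 3 registers, 2 target tokens.
    print(render(register_attention_mask(3, 3, 2)))
```

In this sketch the registers can still read the source tokens, so they act as the only channel through which source information reaches the generated target tokens.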


Qu et al. studied this question.

synapsesocial.com/papers/6a323957d50b63ecad204e5d https://doi.org/10.5715/jnlp.33.809
