What question did this study set out to answer?

The study evaluates how effectively auxiliary parallel data can improve neural machine translation (NMT) for low-resource languages.

March 12, 2026 | Open Access

Exploiting Domain-Specific Parallel Data on Multilingual Language Models for Low-Resource Language Translation

Key Points

Evaluated techniques for fine-tuning and pre-training multilingual sequence-to-sequence language models with domain-specific data.
Analyzed the impact of domain divergence on the performance of NMT models.
Recommended strategies for effectively utilizing auxiliary parallel data.
Using auxiliary parallel data significantly improved NMT performance for low-resource languages.
The techniques demonstrated potential for better translation outcomes in domain-specific contexts.
Domain divergence was found to negatively affect model performance.

Abstract

Neural Machine Translation (NMT) systems built on multilingual sequence-to-sequence Language Models (msLMs) fail to deliver expected results when both the amount of parallel data for a language and the language's representation in the model are limited. This restricts the capabilities of domain-specific NMT systems for low-resource languages (LRLs). As a solution, parallel data from auxiliary domains can be used either to fine-tune or to further pre-train the msLM. We present an evaluation of the effectiveness of these two techniques in the context of domain-specific LRL-NMT. We also explore the impact of domain divergence on NMT model performance. We recommend several strategies for utilizing auxiliary parallel data in building domain-specific NMT models for LRLs.

Cite This Study

Ranathunga et al. studied this question.

synapsesocial.com/papers/69b25aca96eeacc4fcec8d9a https://doi.org/10.1145/3800681
