This study explores fine-tuning methods and dataset structures for multilingual neural machine translation using the No Language Left Behind (NLLB) model, with a case study on Kazakh, English, and Russian. We compare single-stage and two-stage fine-tuning approaches, as well as triplet and non-triplet dataset configurations, with the aim of improving translation quality. A high-quality dataset of 50,000 triplets in the information technology domain, manually translated and expert-validated, serves as the in-domain benchmark and is complemented by out-of-domain corpora such as KazParC. Evaluation with BLEU, chrF, METEOR, and TER shows that single-stage fine-tuning performs best for low-resource pairs (e.g., 0.48 BLEU and 0.77 chrF for Kazakh → Russian), while two-stage fine-tuning benefits high-resource pairs (Russian → English). Triplet datasets improve cross-linguistic consistency compared with non-triplet structures. Our reproducible framework offers practical guidance for adapting neural machine translation to technical domains and low-resource languages.
Kozhirbayev et al. studied this question.
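To make the triplet configuration concrete, the following minimal sketch shows one way an aligned Kazakh, English, and Russian triplet could be expanded into directed training pairs. It assumes that a "triplet" means the same sentence aligned across all three languages; the language codes, the field names, the helper `triplet_to_pairs`, and the sample words are hypothetical illustrations, not the authors' data format or pipeline.

```python
from itertools import permutations

# Assumed short language codes for Kazakh, English, and Russian.
LANGS = ("kk", "en", "ru")


def triplet_to_pairs(triplet):
    """Expand one aligned triplet into all six directed translation pairs."""
    return [
        {"src_lang": s, "tgt_lang": t, "src": triplet[s], "tgt": triplet[t]}
        for s, t in permutations(LANGS, 2)
    ]


# Illustrative triplet (not taken from the paper's dataset).
example = {"kk": "желі", "en": "network", "ru": "сеть"}
pairs = triplet_to_pairs(example)

for p in pairs:
    print(f'{p["src_lang"]} -> {p["tgt_lang"]}: {p["src"]} -> {p["tgt"]}')
```

Under this assumption, every sentence appears in all six translation directions, whereas a non-triplet corpus would contain independently collected pairs whose sentences need not overlap across directions.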