With the rapid growth of global multilingual communication, the demand for high-quality machine translation into English from diverse source languages has become increasingly critical. This study addresses the persistent challenges of severe parallel data scarcity, domain mismatch, and poor generalization in medium- and low-resource language-to-English translation. We use state-of-the-art cross-lingual pre-trained models (mBART-50, mT5-large, and NLLB-200 distilled variants) to establish a robust transfer learning framework. Through supervised fine-tuning, zero-shot, and few-shot adaptation strategies, the proposed approach improves translation quality in data-scarce settings. Experimental results on WMT high-resource benchmarks and the Flores-200 low-resource test set show that the NLLB-200-distilled-600M model achieves an average SacreBLEU improvement of 13.6 points over Transformer baselines trained from scratch and 4.6 points over mBART-50 in medium- and low-resource settings. In zero-shot settings, NLLB-200 maintains usable performance (average 24.8 BLEU), significantly outperforming the other models. These findings offer guidance for low-resource English translation research and point to a promising technical pathway for building inclusive, scalable, and efficient multilingual translation systems.

• A knowledge-driven framework leveraging state-of-the-art cross-lingual pre-trained models is proposed for English-centric machine translation.
• The NLLB-200-distilled-600M model achieves a 13.6 SacreBLEU point improvement over traditional Transformer baselines in low-resource settings.
• Superior zero-shot translation capability is demonstrated, with NLLB-200 maintaining an average of 24.8 BLEU without parallel data.
• Ablation studies confirm that increasing pre-training language coverage and using joint translation objectives are critical for cross-lingual transfer.
• The study provides a scalable technical pathway for high-quality English translation from under-resourced and typologically diverse languages.
Yingting Zhang (Wed,) studied this question.
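
The BLEU figures quoted in the abstract are corpus-level scores, and the reported gains are differences between such scores for two systems on the same test set. As a rough illustration of what the metric measures, the following is a minimal sketch of a simplified corpus-level BLEU. It is not the SacreBLEU implementation used in the study: the function names `ngram_counts` and `corpus_bleu` are hypothetical, and the sketch assumes whitespace tokenization, a single reference per sentence, and no smoothing, so its scores will not match SacreBLEU exactly.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not from the source): whitespace tokenization, one reference
    per hypothesis, no smoothing. SacreBLEU applies its own tokenization.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Under these assumptions, a score difference such as `corpus_bleu(system_a, refs) - corpus_bleu(system_b, refs)` corresponds to the kind of point improvement the abstract reports, where `system_a`, `system_b`, and `refs` are placeholder lists of sentences.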