Neural Machine Translation (NMT) systems have achieved strong performance for high-resource languages; however, many African languages remain underrepresented due to the scarcity of high-quality parallel data. Kimbundu, a Bantu language spoken in Angola, is one such low-resource language with limited machine translation support. In this work, we introduce a manually curated and human-reviewed Kimbundu–Portuguese parallel corpus and investigate its use for fine-tuning multilingual NMT models. Starting from the NLLB-200 (600M) model, we apply parameter-efficient fine-tuning with QLoRA to adapt it to the Kimbundu→Portuguese direction. Experimental results on a professionally reviewed test set of 1,000 sentence pairs show substantial improvements over strong multilingual baselines, with gains of +10.1 BLEU and +13.2 chrF. Furthermore, semantic metrics (COMET, AfriCOMET, and BERTScore) improve consistently, and qualitative analysis indicates better handling of Kimbundu’s complex morphology. These findings suggest that high-quality human-reviewed data, combined with efficient fine-tuning, is a viable path to bridging the digital divide for low-resource African languages.
Ramalheira et al. studied this question.
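To make the chrF figure reported above more concrete, the following is a minimal sketch of a character n-gram F-score. It is a simplified illustration and not the evaluation code used in this work: the function names `char_ngrams` and `simple_chrf`, the defaults `max_n=6` and `beta=2.0`, and the example sentences are all assumptions chosen for illustration, and scores from this sketch should not be expected to match those of a standard evaluation toolkit.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score on a 0-100 scale.

    Precision and recall are averaged over n-gram orders 1..max_n,
    then combined into an F-beta score (beta > 1 favours recall).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        # Clipped overlap: each n-gram counts at most as often as in both.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Hypothetical system output and reference, for illustration only.
    print(simple_chrf("eu falo portugues", "eu falo português"))
```

Because the score is computed over characters and not whole words, a hypothesis that differs from the reference in a single affix or diacritic still receives partial credit, which is one reason character-level metrics are often reported alongside BLEU for morphologically rich languages.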