Neural machine translation based on bilingual text with limited training data suffers from limited lexical diversity, which lowers rare word translation accuracy and reduces the generalizability of the translation system. In this work, we utilise the multiple captions from the Multi-30K dataset to increase lexical diversity, aided by the crosslingual transfer of information among the languages in a multilingual setup. In this multilingual and multimodal setting, the inclusion of visual features boosts translation quality by a significant margin. An empirical study affirms that our proposed multimodal approach achieves substantial gains in terms of automatic scores and shows robustness in handling rare word translation in the context of English to/from Hindi and Telugu translation tasks.
Singh et al. (Fri,) studied this question.