January 1, 2021 · Open Access

Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain

Abstract

Incorporating multiple input modalities into a machine translation (MT) system is gaining popularity among MT researchers. Unlike the publicly available datasets for Multimodal Machine Translation (MMT) tasks, where the captions are short image descriptions, news captions provide a more detailed description of the contents of the images. As a result, they contain numerous named entities referring to specific persons, locations, etc. In this paper, we acquire two monolingual news datasets, reported in English and Hindi and paired with images, to generate a synthetic English-Hindi parallel corpus. The parallel corpus is used to train an English-Hindi Neural Machine Translation (NMT) system and an English-Hindi MMT system, the latter incorporating the image features paired with the corresponding parallel corpus. We also conduct a systematic analysis to evaluate the English-Hindi MT systems 1) with more synthetic data and 2) with added back-translated data. Our findings show BLEU score improvements for both the NMT (+8.05) and MMT (+11.03) systems.
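
The abstract reports gains from adding back-translated data but does not describe the procedure. What follows is a minimal sketch of the generic back-translation recipe for an English-to-Hindi system, not the authors' implementation. It assumes monolingual Hindi captions, optionally keyed to their news images, and some Hindi-to-English model. The names `Example`, `back_translate`, `augment`, and `image_id` are hypothetical, and the toy word-lookup translator only stands in for a trained reverse model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Example:
    """One English->Hindi training pair, optionally linked to a news image."""
    src: str                        # English source sentence
    tgt: str                        # Hindi target sentence
    image_id: Optional[str] = None  # hypothetical key of the paired news image
    back_translated: bool = False   # True if `src` was produced by a reverse model


def back_translate(
    hindi_mono: Iterable[Tuple[str, Optional[str]]],
    hi_to_en: Callable[[str], str],
) -> List[Example]:
    """Turn monolingual Hindi (sentence, image_id) items into synthetic pairs.

    The Hindi target side stays authentic; only the English source is
    machine-generated, which is the standard back-translation recipe.
    """
    return [
        Example(src=hi_to_en(hi), tgt=hi, image_id=img, back_translated=True)
        for hi, img in hindi_mono
    ]


def augment(parallel: List[Example], synthetic: List[Example]) -> List[Example]:
    """Concatenate parallel and back-translated data, dropping exact duplicates.

    Parallel data is listed first, so on a (src, tgt) collision the
    authentic example is the one that is kept.
    """
    seen = set()
    merged: List[Example] = []
    for ex in parallel + synthetic:
        key = (ex.src, ex.tgt)
        if key not in seen:
            seen.add(key)
            merged.append(ex)
    return merged


def toy_hi_to_en(sentence: str) -> str:
    """Hypothetical stand-in for a trained Hindi->English model (word lookup only)."""
    lexicon = {"नमस्ते": "hello", "दुनिया": "world"}
    return " ".join(lexicon.get(word, word) for word in sentence.split())


if __name__ == "__main__":
    parallel = [Example("hello", "नमस्ते", image_id="img_001")]
    mono_hi = [("नमस्ते दुनिया", "img_002"), ("नमस्ते", None)]

    train = augment(parallel, back_translate(mono_hi, toy_hi_to_en))
    for ex in train:
        print(ex.back_translated, ex.src, "->", ex.tgt, ex.image_id)
```

Running the script keeps two pairs: the authentic image-linked pair, followed by the back-translated "hello world" pair. The back-translated duplicate of "hello" is dropped. Keeping the `back_translated` flag on each example leaves room to compare systems trained with and without the synthetic portion, which is the kind of analysis the abstract describes.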