January 1, 2004 | Open Access

Statistical machine translation with word- and sentence-aligned parallel corpora

Abstract

The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
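
For readers unfamiliar with the metric, the following is a minimal sketch of how alignment error rate (AER) is commonly computed, assuming the widely used Och and Ney definition, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the hypothesized link set, S the "sure" reference links, and P the "possible" reference links (with S contained in P). The abstract does not state which definition was used, so this is an assumption; the function name and the toy links are purely illustrative and are not drawn from the Verbmobil data.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Return AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    Each argument is an iterable of (source_index, target_index) links.
    Assumes the Och and Ney style definition; sure links are treated
    as a subset of the possible links.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # every sure link also counts as possible
    if not a and not s:
        return 0.0  # nothing to align and nothing proposed
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy example (invented links, for illustration only).
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
hypothesis = {(0, 0), (1, 1), (2, 2)}

aer = alignment_error_rate(hypothesis, sure, possible)
print(f"AER = {aer:.2f}")
```

Lower values are better: a hypothesis that recovers every sure link and proposes nothing outside the possible set scores 0.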
