Optimizing Chinese word segmentation for machine translation performance

Abstract

Previous work has shown that Chinese word segmentation is useful for machine translation to English, yet the way different segmentation strategies affect MT is still poorly understood. In this paper, we demonstrate that optimizing segmentation for an existing segmentation standard does not always yield better MT performance. We find that other factors such as segmentation consistency and granularity of Chinese "words" can be more important for machine translation. Based on these findings, we implement methods inside a conditional random field segmenter that directly optimize segmentation granularity with respect to the MT task, providing an improvement of 0.73 BLEU. We also show that improving segmentation consistency using external lexicon and proper noun features yields a 0.32 BLEU increase.
