What does this research mean for the field?

An extensive meta-evaluation of machine translation evaluation methodologies reveals unexpected findings about annotator agreement and about how well automatic metrics correlate with human judgments. Novelty: novel finding. Consensus alignment: challenges consensus.

January 1, 2007 | Open Access

(Meta-) evaluation of machine translation

Abstract

This paper evaluates the translation quality of machine translation systems for 8 language pairs: translating French, German, Spanish, and Czech to English and back. We carried out an extensive human evaluation which allowed us not only to rank the different MT systems, but also to perform higher-level analysis of the evaluation process. We measured timing and intra- and inter-annotator agreement for three types of subjective evaluation. We measured the correlation of automatic evaluation metrics with human judgments. This meta-evaluation reveals surprising facts about the most commonly used methodologies.
