January 1, 2014 · Open Access

Testing for Significance of Increased Correlation with Human Judgment

Abstract

Automatic metrics are widely used in machine translation as a substitute for human assessment. With the introduction of any new metric comes the question of just how well that metric mimics human assessment of translation quality. This is often measured by correlation with human judgment. Significance tests are generally not used to establish whether improvements over existing methods such as BLEU are statistically significant or have occurred simply by chance, however. In this paper, we introduce a significance test for comparing correlations of two metrics, along with an open-source implementation of the test. When applied to a range of metrics across seven language pairs, tests show that for a high proportion of metrics, there is insufficient evidence to conclude significant improvement over BLEU.
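The abstract does not say which test is used, so the following is only an illustrative sketch of the general idea. It assumes the Williams test, a standard choice for comparing two correlations that share one variable (here, human judgment) and are measured on the same items. The function names `pearson` and `williams_t` are hypothetical and are not taken from the open-source implementation mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def williams_t(r12, r13, r23, n):
    """Williams t statistic (n - 3 degrees of freedom) for H0: r12 == r13.

    Assumed roles (illustrative only):
      r12: correlation of human judgment with the new metric
      r13: correlation of human judgment with the baseline metric (e.g. BLEU)
      r23: correlation between the two metrics themselves
      n:   number of items all three correlations were computed on
    """
    # Determinant of the 3x3 correlation matrix.
    k = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    denom = math.sqrt(
        2 * k * (n - 1) / (n - 3) + ((r12 + r13) ** 2 / 4) * (1 - r23) ** 3
    )
    return (r12 - r13) * math.sqrt((n - 1) * (1 + r23)) / denom
```

To use it, one would compute the three correlations with `pearson` from the human, new-metric, and baseline score lists, pass them to `williams_t`, and compare the result against a t distribution with `n - 3` degrees of freedom (one-sided, if the question is whether the new metric correlates more strongly). Because the two correlations share the human scores, they are not independent, which is why a test for dependent correlations is assumed here rather than a test that treats them as unrelated samples.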
