Key points are not available for this paper at this time.
Character n-gram F-score (CHRF) is shown to correlate very well with human rankings of different machine translation outputs, especially for morphologically rich target languages. However, only two versions have been explored so far, namely CHRF1 (standard F-score, = 1) and CHRF3 ( = 3), both with uniform n-gram weights. In this work, we investigated CHRF in more details, namely parameters in range from 1/6 to 6, and we found out that CHRF2 is the most promising version. Then we investigated different n-gram weights for CHRF2 and found out that the uniform weights are the best option. Apart from this, CHRF scores were systematically compared with WORDF scores, and a preliminary experiment carried out on small amount of data with direct human scores indicates that the main advantage of CHRF is that it does not penalise too hard acceptable variations in high quality translations.
Maja Popović (Fri,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: