Abstract

This article explores the benefits of using automatic quality scores designed for machine translation (MT) to obtain an indicative quality estimate for individual segments of both automatic speech translation (AST) and human simultaneous interpretation (HSI). In a first step, a set of assessment metrics for interpreting (AMI) is established, taking MQM (Multidimensional Quality Metrics) as a starting point and complementing and adapting it with quality criteria from interpreting studies and practice. A sample human simultaneous interpretation and a sample automatic speech translation are then assessed segment by segment using AMI, and the results are compared with the COMET scores calculated for the same segments. The comparative analysis explores potential correlations between the human quality assessment and the COMET scores, focusing on semantic deviations of the target text from the source text. The study finds more semantic deviations and grammar issues in segments with lower COMET scores, in both HSI and AST. This suggests that using automatic quality estimation scores as a pre-screening instrument, allowing human experts to single out critical segments of a speech when assessing AST or HSI, may be an avenue worth exploring.
Anja Rütten studied this question.