What question did this study set out to answer?

This research aims to explore the use of automatic quality scores to evaluate human simultaneous interpretation and automatic speech translation.

April 29, 2026

Can automatic quality estimation help to pre-assess the quality of human simultaneous interpretation and automatic speech translation? An exploratory case study based on MQM-inspired Assessment Metrics for Interpreting (AMI) and COMET

Key Points

This research aims to explore the use of automatic quality scores to evaluate human simultaneous interpretation and automatic speech translation.
The authors developed Assessment Metrics for Interpreting (AMI), based on MQM and on existing quality criteria from interpreting studies and practice.
A sample human simultaneous interpretation and a sample automatic speech translation were assessed segment by segment using AMI.
The AMI assessments were compared with the COMET scores for the same segments to explore potential correlations.
More semantic deviations and grammar issues were observed in segments with lower COMET scores, in both human simultaneous interpretation and automatic speech translation.
The findings suggest that automatic quality scores might help human experts single out critical segments for closer assessment.

Abstract

This article explores the benefits of using automatic quality scores designed for machine translation (MT) to obtain an indicative quality estimation for individual segments of both automatic speech translation (AST) and human simultaneous interpretation (HSI). In a first step, a set of assessment metrics for interpreting (AMI) is set up using MQM as a starting point and completing and adapting it based on quality criteria from interpreting studies and practice. A sample human simultaneous interpretation and automatic speech translation are then assessed segment by segment using AMI and compared to the COMET scores calculated for these segments. A comparative analysis of the results explores potential correlations between the human quality assessment and the COMET scores, the focus being semantic deviations of the target from the source text. The study shows higher amounts of semantic deviations and grammar issues for lower COMET scores in both HSI and AST, suggesting that using automatic quality estimation scores as a pre-screening instrument for human experts to single out critical segments of a speech when assessing AST or HSI might be an avenue worth exploring.
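
The pre-screening idea described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the threshold of 0.5, and the toy values are all assumptions made for illustration, and none of the numbers come from the study. The sketch ranks per-segment automatic scores against per-segment human error counts (a Spearman rank correlation, chosen here as one plausible way to explore such a relationship) and lists the segments whose score falls below the threshold.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def flag_segments(scores: Sequence[float], threshold: float) -> List[int]:
    """Return indices of segments whose automatic score is below threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Toy values, invented for illustration only (not data from the study).
auto_scores = [0.82, 0.41, 0.77, 0.35, 0.90]  # hypothetical per-segment scores
error_counts = [0, 3, 1, 4, 0]                # hypothetical human-annotated issues

rho = spearman(auto_scores, error_counts)
to_review = flag_segments(auto_scores, threshold=0.5)  # threshold is an assumption
print(f"Spearman rho: {rho:.2f}; segments to review: {to_review}")
```

In a setting like the one the abstract describes, the flagged indices would only point a human expert to segments worth a closer look; they would not replace the segment-by-segment assessment itself.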
