The emergence of large language models (LLMs) has reshaped machine translation (MT). Although neural machine translation (NMT) systems such as Google Translate (GT) remain dominant, systematic comparisons of LLMs and NMT systems across key quality dimensions are still limited, particularly in specialised domains such as technical translation. This study compares the translation quality and error subtypes of GT and ChatGPT-4 in Chinese-English translation of technical manuals. Eighty paragraph-level segments from Chinese product manuals were translated by both systems. Two trained annotators rated the outputs on a Likert scale across four dimensions derived from the Multidimensional Quality Metrics (MQM) framework: accuracy, fluency, terminology, and style. Inter-rater agreement was assessed, and qualitative data analysis was conducted in NVivo. ChatGPT-4 outperformed GT on all four dimensions. GT frequently exhibited redundancy, stilted phrasing, non-standard terminology, and mismatches in formality. ChatGPT-4, however, occasionally produced over-translation and semantic overgeneralisation, which compromised terminological precision. Thus, despite its superior overall performance, ChatGPT-4 still carries potential risks: its context-driven outputs may introduce inferential or stylistic deviations, especially in specialised terminology. For high-stakes technical content, expert revision is recommended to ensure semantic fidelity and terminological consistency.
Zhang et al. studied this question.