Abstract

While machine translation (MT) offers potential benefits for language learning, concerns about academic integrity persist when students incorporate MT outputs into assessed work without prior permission or, where required, explicit disclosure. However, teachers’ ability to detect such unauthorised MT use varies widely. This study explores whether providing raters with a consolidated analysis report from ProWritingAid (an AI-powered writing analytics tool) as a decision aid improves their detection accuracy across MT use conditions (non-MT-aided, Google Translate-aided, ChatGPT-aided). It also examines the linguistic indicators that support raters’ detection judgements. A two-stage experiment was conducted at a Chinese university. Stage 1 involved 39 intermediate- to upper-intermediate-proficiency students of English as a foreign language (EFL), who were randomly assigned in equal numbers to the three MT use conditions and completed two translation tasks accordingly. In Stage 2, 78 EFL instructors served as raters and were randomly assigned in equal numbers to the unassisted rater group (S2UR) and the AI-assisted rater group (S2AR). Before the detection tasks, the S2AR group was provided with a ProWritingAid report and a brief orientation on how to interpret it. For each task, raters classified the translation sample from Stage 1 as MT-aided or non-MT-aided and recorded the linguistic indicators underpinning their judgement. The results show that AI assistance increased mean detection accuracy from ~52% to ~75%, and that detection accuracy differed by MT use condition (highest for ChatGPT-aided samples, lowest for Google Translate-aided samples). Raters prioritised MT-strength-based cues (i.e., linguistic features indicative of proficiency beyond the cohort’s expected level) over MT-error-based cues in their judgements.
This exploratory study provides preliminary evidence that AI assistance can support rater-led detection of unauthorised MT use in translation assessment. Implications include combining AI flagging with human review, calibrating detection criteria for each MT system, and articulating clear policies on acceptable MT use in language education.
Zhou et al. studied this question.