May 16, 2026 | Open Access

Two audiences, two standards: evaluating human vs. LLM poetry translation through bilingual and monolingual lenses

Key Points

The study aims to compare evaluations of classical Chinese poetry translations by bilingual and monolingual audiences, focusing on human and GPT-4 outputs.
The study used a mixed-methods approach, combining ratings and qualitative comments from 38 readers (19 bilingual, 19 monolingual).
Readers assessed five translations (four human, one GPT-4) for each of six classical Chinese poems.
Qualitative analysis of reader feedback explored the distinct evaluation criteria of each audience.
Bilingual readers strongly preferred a human translation, valuing fidelity to semantic, structural, and cultural elements.
Monolingual readers favored GPT-4 translations for their fluency and emotional resonance.
AI-generated translations that excelled in fluency and clarity were well received by monolingual readers, who in some cases rated them more favourably than human translations.

Abstract

This study investigates how bilingual (Chinese-English) and monolingual (English) audiences evaluate classical Chinese poetry translations, comparing human and GPT-4 outputs. Using mixed methods (ratings, qualitative comments), 19 bilingual and 19 monolingual readers assessed five translations (four human, one GPT-4) for each of six classical Chinese poems. Findings revealed two distinct, audience-specific standards. Bilingual readers, applying a fidelity-forward standard, valued comprehensive faithfulness (a synthesis of semantic fidelity, poetic structure, and cultural nuance) and strongly preferred a human translation. Conversely, monolingual readers, applying a fluency-forward standard, prioritized linguistic fluency, clarity, and emotional resonance, and significantly preferred the GPT-4 translations. Qualitative analysis further illuminated these divergent criteria. These findings indicate that evaluation standards are significantly shaped by source language access. Notably, AI-generated translations can achieve a positive reception from target-language readers by excelling in prioritized dimensions like fluency and clarity, in some cases rated more favourably than human translations by monolingual readers. This research underscores the importance of audience-centric evaluation, highlighting that assessing both human and AI translations requires consideration of distinct criteria relevant to specific reader profiles. The study offers insights for tailoring translation strategies to more effectively bridge cultural gaps while preserving literary value for diverse audiences.
