What question did this study set out to answer?

This study aims to evaluate how well AI models can replicate the quality of human translations in literary contexts, particularly autobiography.

March 10, 2026 | Open Access

Exploring AI’s performance in literary autobiography translation: how closely do AI models match human translation

Key Points

Compared translations produced by Google Translate’s neural machine translation (NMT-GT) and two LLMs (ChatGPT-4o and OpenAI-o1) to human translations.
Analyzed variations across linguistic dimensions including lexical and syntactic diversity.
Assessed readability and the effectiveness of textbase and situation models.
ChatGPT-4o showed the closest alignment with human translations.
NMT-GT performed better than OpenAI-o1 but not as well as ChatGPT-4o.
OpenAI-o1 exhibited the least similarity to human translations, suggesting limitations in reasoning-based models.

Abstract

AI-based models are transforming the translation industry, with tools like Google Translate’s neural machine translation (NMT-GT) and large language models (LLMs) driving progress. Yet, applying these models to literary translation, a field that remains challenging even for experienced human translators, raises important questions: How well can AI replicate the depth and nuance of human translation, and which type of AI (NMT, general-purpose LLM, or reasoning-based LLM) best approximates human output? This corpus-based study investigates and compares translations by NMT-GT and two LLMs, ChatGPT-4o and OpenAI-o1, to human translations. Our analysis identifies substantial variations across multiple linguistic dimensions, including lexical and syntactic diversity, textbase and situation models, and readability. Results show that ChatGPT-4o aligns most closely with human translations in this literary autobiography case, followed by NMT-GT, while OpenAI-o1 demonstrates the least similarity. These findings suggest that NMT systems do not necessarily fall short of LLMs in approximating human translations. Reasoning-based OpenAI-o1 does not produce a more human-like translation profile than the general-purpose AI models, with ChatGPT-4o most effectively bridging the gap between human and AI-generated translations.
