AI-based models are transforming the translation industry, with tools like Google Translate’s neural machine translation (NMT-GT) and large language models (LLMs) driving progress. Yet applying these models to literary translation, a field that remains challenging even for experienced human translators, raises important questions: How well can AI replicate the depth and nuance of human translation, and which type of AI (NMT, a general-purpose LLM, or a reasoning-based LLM) better approximates human output? This corpus-based study compares translations produced by NMT-GT and two LLMs, ChatGPT-4o and OpenAI-o1, with human translations. Our analysis identifies substantial variations across multiple linguistic dimensions, including lexical and syntactic diversity, textbase and situation model, and readability. Results show that ChatGPT-4o aligns most closely with human translations in this literary autobiography case, followed by NMT-GT, while OpenAI-o1 demonstrates the least similarity. These findings suggest that NMT systems do not necessarily fall short of LLMs in approximating human translations. The reasoning-based OpenAI-o1 does not produce a more human-like translation profile than the general-purpose AI models; ChatGPT-4o most effectively bridges the gap between human and AI-generated translations.
Huang et al. (Sat,) studied this question.