What question did this study set out to answer?

This research assesses how well large language models translate Old English texts into Contemporary English.

March 29, 2026

Human evaluation of large language models in Old English translation: a qualitative analysis

Key Points

Evaluated translations generated by LLaMA2 and GPT-4 through their respective interfaces, Meta Llama 2 Chat and ChatGPT.
Focused on three Old English excerpts from prominent historical texts.
Conducted a qualitative analysis of morphology, syntax, and lexicon, comparing the output against human translations.
Significant variation in translation quality was observed across different models and texts.
GPT-4 showed higher morphological and lexical accuracy than LLaMA2.
Both models exhibited inconsistencies in translating complex syntactic structures.

Abstract

This study evaluates the performance of large language models (LLMs) in translating Old English (OE) texts into Contemporary English. We analyze translations generated by LLaMA2 and GPT-4 through their respective interfaces, Meta Llama 2 Chat and ChatGPT. Our methodology employs a human evaluation approach for three Old English excerpts: ‘The life of St. Æthelthryth’ from Ælfric’s Lives of Saints, ‘Cynewulf and Cyneheard’ from The Anglo-Saxon Chronicle, and ‘Ohthere’s voyage’ from the Old English translation of Orosius’ Historiarum adversum paganos libri septem. Through qualitative analysis focusing on morphology, syntax, and lexicon and comparison against a golden corpus of human translations, we examine the coherence, adequacy, and precision of LLM-generated translations. Our findings reveal significant variation in translation quality across different LLMs and source texts. While GPT-4 demonstrates remarkable competence in translating Old English, particularly in morphological and lexical accuracy, both models show inconsistencies with complex syntactic arrangements. The study highlights the potential of LLMs for historical language translation and emphasizes the necessity of human revision and the impact of source text complexity on translation quality. This research provides insights into the capabilities and limitations of AI in processing low-resource historical languages.
