Abstract

This study evaluates the performance of large language models (LLMs) in translating Old English (OE) texts into Contemporary English. We analyze translations generated by LLaMA2 and GPT-4 through their respective interfaces, Meta Llama 2 Chat and ChatGPT. Our methodology applies human evaluation to three Old English excerpts: ‘The Life of St. Æthelthryth’ from Ælfric’s Lives of Saints, ‘Cynewulf and Cyneheard’ from The Anglo-Saxon Chronicle, and ‘Ohthere’s Voyage’ from the Old English translation of Orosius’ Historiarum adversum paganos libri septem. Through a qualitative analysis of morphology, syntax, and lexicon, and through comparison against a gold-standard corpus of human translations, we examine the coherence, adequacy, and precision of the LLM-generated translations. Our findings reveal considerable variation in translation quality across models and source texts. While GPT-4 demonstrates remarkable competence in translating Old English, particularly in morphological and lexical accuracy, both models handle complex syntactic arrangements inconsistently. The study highlights the potential of LLMs for historical language translation, while emphasizing that human revision remains necessary and that source text complexity affects translation quality. This research provides insights into the capabilities and limitations of AI in processing low-resource historical languages.
Silvia Saporta studied this question.