Preserving ancient languages is essential for understanding the cultural and linguistic heritage of humanity. Old English, however, remains critically under-resourced, which limits its accessibility to modern natural language processing (NLP) techniques. We present a scalable framework that uses advanced large language models (LLMs) to generate high-quality Old English texts to address this gap. In this study, we specifically employ state-of-the-art models, including Llama-3.1-8B and Mistral-7B, as our foundation models, which are then adapted to the unique characteristics of Old English. Our approach combines parameter-efficient fine-tuning (Low-Rank Adaptation (LoRA)), data augmentation via back-translation, and a dual-agent pipeline that separates content generation (in English) and translation (into Old English). Evaluation with automated metrics (BLEU, METEOR, and CHRF) shows improvements over baseline models, with BLEU scores increasing from 26 to over 65 for English-to-Old English translation. Expert human assessment confirms high grammatical accuracy and stylistic fidelity in the generated texts, with average scores of 9.0/10 for inflection and word order, 9.1/10 for lexical authenticity, and 7.8 for semantic coherence. These results demonstrate that the framework can reliably expand limited historical corpora while maintaining linguistic integrity, with immediate practical applications in digital humanities research, computational philology, and the development of educational resources for Old English study. Beyond expanding the Old English corpus, our method offers a practical blueprint for revitalizing other endangered languages, thus linking AI innovation with the goals of cultural preservation.
Alva et al. (Wed,) studied this question.
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: