What question did this study set out to answer?

The study asks whether large language models can generate high-quality Old English text despite the scarcity of training data for the language.

May 9, 2026 | Open Access

AI-Driven Generation of Old English: A Framework for Low-Resource Languages

Key Points

The framework generates high-quality Old English text with large language models, targeting the language's scarcity of training data.
Llama-3.1-8B and Mistral-7B serve as the foundation models.
The models are adapted with parameter-efficient fine-tuning (Low-Rank Adaptation, LoRA), with back-translation used for data augmentation.
A dual-agent pipeline separates content generation in English from translation into Old English.
BLEU scores for English-to-Old English translation rose from 26 to over 65, an improvement over baseline models.
Expert human assessment averaged 9.0/10 for inflection and word order and 9.1/10 for lexical authenticity.
Semantic coherence averaged 7.8 in the same expert assessment.
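
The LoRA step named in the key points can be illustrated with a small numeric sketch. It follows the standard LoRA formulation, in which a frozen weight matrix W is augmented by a trainable low-rank product scaled by alpha / r. It is not the authors' training code: the toy dimensions, the rank `r`, the scaling factor `alpha`, and the helper names are all assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)                      # A has shape (r, d_in)
    scale = alpha / r
    delta = matmul(b, a)            # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Count adapter parameters: B is (d_out, r) and A is (r, d_in)."""
    return r * (d_out + d_in)

# Toy sizes for illustration only; none of these values come from the study.
d_out, d_in, r, alpha = 6, 4, 2, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, small random init
b = [[0.0] * r for _ in range(d_out)]                                # trainable, zero init

# With B initialised to zero, the adapted layer starts identical to the base layer.
w_eff = lora_effective_weight(w, a, b, alpha)
```

For a hypothetical 4096 by 4096 layer with rank 8, `trainable_params(4096, 4096, 8)` gives 65,536 adapter parameters against 16,777,216 in the full matrix, which is why this kind of fine-tuning is called parameter-efficient.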

Abstract

Preserving ancient languages is essential for understanding the cultural and linguistic heritage of humanity. Old English, however, remains critically under-resourced, which limits its accessibility to modern natural language processing (NLP) techniques. To address this gap, we present a scalable framework that uses large language models (LLMs) to generate high-quality Old English texts. We employ state-of-the-art models, including Llama-3.1-8B and Mistral-7B, as foundation models and adapt them to the unique characteristics of Old English. Our approach combines parameter-efficient fine-tuning via Low-Rank Adaptation (LoRA), data augmentation via back-translation, and a dual-agent pipeline that separates content generation (in English) from translation (into Old English). Evaluation with automated metrics (BLEU, METEOR, and chrF) shows improvements over baseline models, with BLEU scores increasing from 26 to over 65 for English-to-Old English translation. Expert human assessment confirms high grammatical accuracy and stylistic fidelity in the generated texts, with average scores of 9.0/10 for inflection and word order, 9.1/10 for lexical authenticity, and 7.8 for semantic coherence. These results demonstrate that the framework can reliably expand limited historical corpora while maintaining linguistic integrity, with immediate practical applications in digital humanities research, computational philology, and the development of educational resources for Old English study. Beyond expanding the Old English corpus, our method offers a practical blueprint for revitalizing other endangered languages, thus linking AI innovation with the goals of cultural preservation.
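
To make the automated evaluation more concrete, the following is a simplified sketch of a chrF-style score, which measures character n-gram overlap between a generated text and a reference. It is an illustrative approximation rather than the evaluation code used in the study: the function names are hypothetical, whitespace is ignored, only one reference is supported, and the defaults (n-grams up to length 6, beta = 2) follow the common chrF convention. An established tool such as sacreBLEU would be the appropriate choice for reported results.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams, ignoring whitespace (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

def chrf_sketch(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: F-beta over averaged character n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue                         # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

# Example with the opening words of Beowulf as the reference.
full_match = chrf_sketch("hwæt we gardena", "hwæt we gardena")
partial = chrf_sketch("hwæt we gardena", "hwæt we gardena in geardagum")
```

Character-level scores of this kind are often considered a good fit for highly inflected languages, because a near-miss on a case ending still earns partial credit where a word-level metric such as BLEU would count the whole word as wrong.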
