What type of study is this?

This is an original research study (method development with empirical evaluation).

August 17, 2025

Open Access

NLP-Based Restoration of Damaged Student Essay Archives for Educational Preservation and Fair Reassessment

Key Points

The proposed NLP-based framework restored damaged student essays more accurately than baseline models, supporting archival efforts.
On ROUGE-L, BLEU-4, and BERTScore, the framework showed marked improvements over baseline models such as BERT and GPT-2.
A synthetic dataset of 5000 paired samples (each a damaged text and its complete version) was used to fine-tune a T5-based encoder–decoder model for the restoration task.
The findings highlight a scalable solution for educational institutions to recover valuable student essay records.
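ROUGE-L, one of the metrics named above, scores a restored text by the longest common subsequence (LCS) of tokens it shares with the reference. The sketch below is a minimal illustration, assuming lowercased whitespace tokenization and the F1 form of ROUGE-L (β = 1). The paper's exact tokenizer and metric settings are not stated here, and the function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens seen so far in `a` and b[:j]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 using lowercased whitespace tokens (a simplifying assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of restored tokens that fit the reference order
    recall = lcs / len(ref)      # share of reference tokens recovered in order
    return 2 * precision * recall / (precision + recall)


reference = "the model restores missing segments of the essay"
restored = "the model restores the missing parts of the essay"
print(round(rouge_l_f1(restored, reference), 3))  # prints 0.824
```

Because LCS rewards tokens that appear in the right order without requiring them to be contiguous, ROUGE-L tolerates small insertions in a restoration while still penalizing missing or reordered content.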

Abstract

The degradation of physical student examination archives, particularly handwritten essay booklets, is a significant barrier to longitudinal academic research, institutional record preservation, and student performance analysis. This study introduces a novel natural language processing (NLP) framework for the automated reconstruction of damaged academic essay manuscripts using a span-infilling transformer architecture. A synthetic dataset of 5000 paired samples, each consisting of a damaged text and its complete counterpart, was curated from archived Data Science examination scripts collected at the Center for Applied Data Science, Sol Plaatje University, South Africa. The proposed method fine-tunes a T5-based encoder–decoder model, using span corruption and task-specific prompting to restore missing or illegible segments. Evaluation with ROUGE-L, BLEU-4, and BERTScore shows substantial improvements over baseline models, including BERT and GPT-2. Qualitative assessments by academic experts further support the fluency, coherence, and contextual relevance of the restored texts. Training curves show stable convergence without overfitting, and ablation studies confirm the contribution of each architectural component. Token-level error analyses and confidence-scored predictions provide additional interpretability. The framework offers a scalable solution for educational institutions seeking to digitize and recover lost historical student essay records, with potential extensions to other domains such as digital humanities and archival restoration.
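The span-infilling setup described in the abstract can be illustrated with T5-style span corruption. Selected spans of a complete essay are replaced in the input by sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, and so on), and the target lists each sentinel followed by the text it hides, ending with one extra closing sentinel. The sketch below is a minimal illustration, assuming whitespace tokens and a hand-picked list of damaged spans. The paper's actual corruption rate, span lengths, and prompt wording are not given here; the helper `make_infilling_pair` and the `restore: ` prefix are hypothetical stand-ins for its data pipeline and task-specific prompt.

```python
def make_infilling_pair(
    text: str, spans: list[tuple[int, int]], prefix: str = "restore: "
) -> tuple[str, str]:
    """Build a (damaged input, target) pair in T5 span-corruption format.

    `spans` are (start, end) token indices, end exclusive, sorted and non-overlapping.
    They stand in for illegible regions of a scanned essay (hypothetical choice).
    """
    tokens = text.split()
    source: list[str] = []
    target: list[str] = []
    cursor = 0
    for k, (start, end) in enumerate(spans):
        if not (cursor <= start < end <= len(tokens)):
            raise ValueError(f"invalid, unsorted, or overlapping span {(start, end)}")
        sentinel = f"<extra_id_{k}>"
        source.extend(tokens[cursor:start])  # keep the legible text
        source.append(sentinel)              # mark the damaged region
        target.append(sentinel)
        target.extend(tokens[start:end])     # text the model must restore
        cursor = end
    source.extend(tokens[cursor:])
    target.append(f"<extra_id_{len(spans)}>")  # closing sentinel, as in T5 pre-training
    return prefix + " ".join(source), " ".join(target)


essay = "Data science combines statistics computing and domain knowledge to extract insight"
src, tgt = make_infilling_pair(essay, [(2, 4), (7, 9)])
print(src)  # restore: Data science <extra_id_0> computing and domain <extra_id_1> extract insight
print(tgt)  # <extra_id_0> combines statistics <extra_id_1> knowledge to <extra_id_2>
```

Pairs in this form could then be tokenized and used to fine-tune a T5 checkpoint; that training step is omitted here. Using sentinels rather than a single mask token lets one input carry several independent gaps, which matches essays with more than one damaged region.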
