What question did this study set out to answer?

The research aims to develop an AI framework that generates examination questions from marking schemes using natural language processing techniques.

March 28, 2026 | Open Access

AI-Powered Natural Language Processing Framework for Reverse-Engineering Examination Questions from Marking Schemes

Key Points

Developed a reverse-engineering framework built on transformer-based generative modeling, semantic reconstruction, and pedagogical constraints.
Encoded marking schemes with MPNet embeddings as the input to question generation.
Decoded candidate questions with a T5-small model, with a reconstruction module checking semantic fidelity.
Evaluated the framework on 7021 marking schemes from Sol Plaatje University.
Achieved BLEU = 0.71, ROUGE-L = 0.68, and METEOR = 0.65 on the generated questions.
Reached a reconstruction fidelity of 0.84 and a Bloom-level accuracy of 0.79.
Outperformed both baselines: an unconstrained T5 (BLEU = 0.62, RF = 0.68, Bloom = 0.56) and rule-based methods (BLEU = 0.48, RF = 0.51, Bloom = 0.43).
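
To make the overlap scores above more concrete, the following is a minimal sketch of how a ROUGE-L score can be computed between a generated question and a reference question. It is not the study's evaluation code: the whitespace tokenization, the F1 variant of ROUGE-L, and the example sentences are all assumptions chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 on lowercased whitespace tokens (assumed tokenization, no stemming)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair, not taken from the study's dataset.
reference = "Explain the role of mitochondria in cellular respiration"
generated = "Describe the role of mitochondria in respiration"
print(round(rouge_l_f1(generated, reference), 2))  # 0.8
```

Here the longest common subsequence has 6 tokens, the generated question has 7, and the reference has 8, so precision is 6/7, recall is 6/8, and F1 is 0.8.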

Abstract

The generation of examination questions from examiner-provided marking schemes remains a critical yet underexplored challenge in automated assessment. This study proposes an AI-powered natural language processing (NLP) framework that reverse-engineers exam questions using transformer-based generative modeling, semantic reconstruction, and pedagogical constraints. Marking schemes are encoded with MPNet embeddings and decoded into candidate questions by a T5-small model, with a reconstruction module ensuring semantic fidelity and Bloom-level embeddings enforcing cognitive alignment. Evaluation on a dataset of 7021 marking schemes from Sol Plaatje University demonstrated strong performance, with BLEU = 0.71, ROUGE-L = 0.68, METEOR = 0.65, reconstruction fidelity = 0.84, and Bloom-level accuracy = 0.79. Comparative baselines, including an unconstrained T5 (BLEU = 0.62, RF = 0.68, Bloom = 0.56) and rule-based methods (BLEU = 0.48, RF = 0.51, Bloom = 0.43), confirmed the effectiveness of the proposed approach. The results indicate that the framework generates questions that are semantically accurate, structurally coherent, and pedagogically valid, offering a scalable solution for adaptive assessment, digital archiving, and automated exam construction.
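
The abstract does not define how the reconstruction module scores semantic fidelity. As one possible reading, the sketch below ranks candidate questions by cosine similarity to the marking scheme and keeps the closest one. Everything here is an assumption for illustration: the `embed` function is a toy bag-of-words stand-in for MPNet embeddings, `pick_most_faithful` is a hypothetical helper name, and the example texts are invented.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_most_faithful(marking_scheme, candidates):
    """Return (score, question) for the candidate closest to the marking scheme."""
    scheme_vec = embed(marking_scheme)
    scored = [(cosine(scheme_vec, embed(q)), q) for q in candidates]
    return max(scored)  # highest similarity wins


# Hypothetical inputs, not taken from the study's dataset.
scheme = "mitochondria produce ATP through cellular respiration"
candidates = [
    "What do mitochondria produce through cellular respiration",
    "Name the capital of France",
]
score, best = pick_most_faithful(scheme, candidates)
print(best)
```

In a real pipeline the toy `embed` would be replaced by a learned sentence encoder, and the candidates would come from the generative decoder rather than a hand-written list.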

Cite This Study

Olaniyan et al. studied this question.

synapsesocial.com/papers/69c771dd8bbfbc51511e1e1b https://doi.org/10.3390/computers15040204
