The growing adoption of Generative AI in education has created opportunities to automate complex pedagogical tasks, yet reliably and scalably assessing open-ended responses remains a challenge. This study proposes and evaluates an architectural solution for integrating a Large Language Model (LLM) into Moodle, combining Retrieval-Augmented Generation (RAG) and AI agent mechanisms to enable automated grading of open-ended student responses. A Moodle instance was deployed for experimental purposes, with 32 students across Bulgarian- and English-language sections, yielding data at the student (N = 32) and task (N = 160) levels, including AI-generated and instructor-assigned scores and system processing logs. The results demonstrate that the proposed system achieves substantial reductions in grading time while maintaining high agreement with expert assessments. Bias analysis revealed minimal systematic deviation across both language groups, indicating that the system preserves assessment objectivity without consistent over- or underestimation based on language. These findings suggest that a combined RAG and agentic LLM architecture can deliver efficient, accurate, and linguistically robust automated assessment within an LMS environment, offering practical design guidelines applicable to other educational platforms and similar systems.
Vangelova et al. studied this question.
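The bias analysis mentioned in the abstract compares AI-generated and instructor-assigned scores within each language group. A minimal sketch of one way such a check could be computed is shown below. It assumes the mean signed difference (AI minus instructor) as the bias measure and the mean absolute difference as a rough agreement measure. The abstract does not state which statistics the authors used, so both metrics, the function name `bias_by_group`, and the sample scores are hypothetical illustrations, not data or methods from the study.

```python
from statistics import mean


def bias_by_group(records):
    """Summarize AI-vs-instructor score differences per language group.

    records: iterable of (group, ai_score, instructor_score) tuples.
    Returns {group: (mean signed difference, mean absolute difference)}.
    A signed mean near zero suggests no consistent over- or underestimation.
    """
    diffs = {}
    for group, ai_score, instructor_score in records:
        diffs.setdefault(group, []).append(ai_score - instructor_score)
    return {
        group: (mean(d), mean([abs(x) for x in d]))
        for group, d in diffs.items()
    }


# Made-up illustrative scores (NOT data from the study).
records = [
    ("bg", 8.0, 7.5),
    ("bg", 6.0, 6.5),
    ("en", 9.0, 9.0),
    ("en", 5.0, 6.0),
]

for group, (signed, absolute) in bias_by_group(records).items():
    print(f"{group}: mean signed diff = {signed:+.2f}, mean abs diff = {absolute:.2f}")
```

Reporting the signed and absolute differences separately matters because the signed mean can sit near zero even when individual scores disagree, as positive and negative errors cancel each other out.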