April 1, 2026 | Open Access

From Evaluation to Emulation: LLMs as Agents of Iterative Pedagogical Design

Key Points

Explored the role of LLMs in educational assessment and their capacity for nuanced feedback.
Developed a framework using situated learning theory and cognitive models.
Evaluated student design artifacts using rubric-guided prompting across various roles.
Coded feedback outputs for tone and focus on evaluation categories.
Measured LLM-human agreement with Cronbach’s Alpha.
Rubric engineering improved LLM-human agreement in complex evaluation categories.
LLMs produced role-sensitive variations in feedback.
Final ratings from rubric-tuned LLMs showed high consistency with human ratings.

Abstract

Large language models (LLMs) have shown potential not only as content generators but as evaluators capable of providing nuanced feedback. However, much of the current application of LLMs in education treats them as static graders rather than dynamic participants in formative assessment processes. This study explores how rubric-guided prompting and role-aware feedback simulations can enable LLMs to approximate human evaluative reasoning across dimensions critical to design-based learning. Using situated learning theory, iterative design pedagogy, and cognitive models of scientific and engineering thinking, the research developed a framework wherein LLMs were trained to align with expert judgment. A stratified sample of student design artifacts was evaluated across different roles (instructor, peer reviewer, grant reviewer) using targeted prompting. Feedback outputs were coded for tone and evaluation focus. Rubric engineering was found to substantially improve LLM-human agreement in cognitively complex categories. LLMs demonstrated role-sensitive feedback variation, and final rubric-tuned LLM ratings achieved high consistency with human ratings (Cronbach’s Alpha > 0.75). Figures and tables illustrate how role-specific emphasis and tone were reliably modulated. When properly scaffolded, LLMs can serve as dynamic co-evaluators and rubric co-design partners. These findings advance the use of AI from automation to pedagogical emulation, offering scalable, reflective feedback ecosystems for design-rich learning environments.
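The abstract reports agreement as Cronbach's Alpha but does not say how it was computed. As a rough illustration only, the sketch below applies the standard formula, treating each rater (for example, a human expert and a rubric-tuned LLM) as one "item" and each student artifact as one observation. The function name `cronbach_alpha` and the sample scores are hypothetical and are not taken from the paper.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Compute Cronbach's alpha for a table of scores.

    ratings: list of rows, one row per artifact, one score per rater.
    Standard formula: alpha = k/(k-1) * (1 - sum(rater variances) / variance(row totals)),
    where k is the number of raters. Sample variance (n-1) is used throughout.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two rated artifacts")
    k = len(ratings[0])
    if k < 2:
        raise ValueError("need at least two raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every artifact needs one score per rater")

    # Variance of each rater's scores across artifacts.
    rater_columns = list(zip(*ratings))
    sum_rater_variances = sum(variance(col) for col in rater_columns)

    # Variance of the summed score per artifact.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum_rater_variances / total_variance)


# Hypothetical rubric scores (1-5): column 0 = human rater, column 1 = LLM rater.
sample_scores = [
    [4, 3],
    [2, 2],
    [5, 4],
    [3, 3],
    [1, 2],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(sample_scores):.3f}")
```

A value above the 0.75 threshold cited in the abstract would count as high consistency under this reading. With only two raters, alpha is sensitive to differences in score spread between them, so the paper's actual procedure may differ from this sketch.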
