Large language models (LLMs) have shown potential not only as content generators but also as evaluators capable of providing nuanced feedback. However, most current applications of LLMs in education treat them as static graders rather than dynamic participants in formative assessment. This study explores how rubric-guided prompting and role-aware feedback simulations can enable LLMs to approximate human evaluative reasoning across dimensions critical to design-based learning. Drawing on situated learning theory, iterative design pedagogy, and cognitive models of scientific and engineering thinking, the research developed a framework wherein LLMs were trained to align with expert judgment. A stratified sample of student design artifacts was evaluated from the perspective of different roles (instructor, peer reviewer, grant reviewer) using targeted prompting. Feedback outputs were coded for tone and evaluation focus. Rubric engineering was found to substantially improve LLM-human agreement in cognitively complex categories. LLMs demonstrated role-sensitive feedback variation, and final rubric-tuned LLM ratings achieved high consistency with human ratings (Cronbach’s alpha > 0.75). Figures and tables illustrate how role-specific emphasis and tone were reliably modulated. When properly scaffolded, LLMs can serve as dynamic co-evaluators and rubric co-design partners. These findings advance the use of AI from automation to pedagogical emulation, offering scalable, reflective feedback ecosystems for design-rich learning environments.
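The consistency statistic reported above, Cronbach's alpha, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `cronbach_alpha` and the rating values are hypothetical, and each rater (one human, one LLM) is treated as an "item" scoring the same set of artifacts. The sketch uses the standard formula alpha = k/(k-1) * (1 - sum of per-rater variances / variance of per-artifact totals).

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for k raters scoring the same n artifacts.

    ratings: list of k lists, one per rater, each holding n scores.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same >= 2 artifacts")

    # Sum of each rater's sample variance across artifacts.
    rater_variance_sum = sum(variance(r) for r in ratings)
    # Sample variance of the per-artifact total score across raters.
    totals = [sum(scores) for scores in zip(*ratings)]
    total_variance = variance(totals)

    return (k / (k - 1)) * (1 - rater_variance_sum / total_variance)


# Hypothetical rubric scores (1-5) for five design artifacts.
human_scores = [4, 3, 5, 2, 4]
llm_scores = [4, 3, 4, 2, 5]

alpha = cronbach_alpha([human_scores, llm_scores])
print(round(alpha, 3))  # 0.894
```

With these made-up scores, alpha comes out near 0.89, which would clear the 0.75 threshold mentioned in the abstract; real values depend entirely on the actual ratings.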
Osman Yaşar
Andriy Kashyrskyy
Charles Xie
International Journal of Artificial Intelligence in Education
Institute for the Future
Yaşar et al. studied this question.
www.synapsesocial.com/papers/69ccb6e416edfba7beb88a3c — DOI: https://doi.org/10.1016/j.ijaied.2026.100013