What type of study is this?

This is a Systematic Review (a qualitative systematic literature review following PRISMA guidelines).

What question did this study set out to answer?

This research examines the impact of LLM-generated formative feedback on learning outcomes in educational settings.

May 13, 2026 | Open Access

LLM-generated formative feedback in education: A qualitative systematic literature review

Key Points

Qualitative systematic literature review conducted according to PRISMA guidelines
Synthesis of 47 peer-reviewed empirical studies covering 121 distinct research questions
Focus on theoretical frameworks, methodology, and empirical findings
LLM-generated feedback consistently outperforms no-feedback conditions
Improvements seen in revision quality, motivation, and short-term learning outcomes
Recurring risks include hallucinations, over-positivity, and misclassification of student work; emerging evidence suggests teacher-in-the-loop oversight can mitigate them

Abstract

Formative feedback is a central component of learning and a key feature of digital learning systems. Since late 2022, instruction-tuned large language models (LLMs) have enabled scalable generation of elaborated, formative feedback for text-based student assignments across domains. Existing reviews do not adequately reflect these developments. To address this gap, we conducted a qualitative systematic literature review following PRISMA guidelines, synthesizing evidence from 47 peer-reviewed empirical studies including 121 distinct research questions, published between 2022 and 2026. The review was guided by three research questions examining (1) theoretical framings and dominant research questions, (2) methodological and contextual characteristics, and (3) technological trends and empirical findings related to feedback quality, feedback processing, and learning outcomes. The reviewed literature is primarily grounded in formative feedback theory, self-regulated learning, feedback literacy, and human–AI interaction frameworks. Methodologically, studies are dominated by short-term experiments, quasi-experimental classroom studies, and expert evaluations of feedback quality, mostly situated in higher education. Technologically, most studies employ proprietary, pre-trained GPT-3.5 or GPT-4 models with context-enriched, role-based zero-shot prompting or few-shot prompting, while fine-tuned, open-source, and multi-agent approaches are beginning to emerge. Empirical findings indicate that LLM-generated formative feedback consistently outperforms no-feedback conditions and often improves revision quality, motivation, and short-term learning outcomes, sometimes approaching teacher feedback under well-designed prompting. However, recurring risks include hallucinations, over-positivity, and misclassification of student work. Emerging evidence suggests that instruction fine-tuning, grounding prompts in student artifacts, and teacher-in-the-loop oversight can mitigate these issues.
