Formative feedback is a central component of learning and a key feature of digital learning systems. Since late 2022, instruction-tuned large language models (LLMs) have enabled scalable generation of elaborated, formative feedback for text-based student assignments across domains. Existing reviews do not adequately reflect these developments. To address this gap, we conducted a qualitative systematic literature review following PRISMA guidelines, synthesizing evidence from 47 peer-reviewed empirical studies published between 2022 and 2026, which together address 121 distinct research questions. The review was guided by three research questions examining (1) theoretical framings and dominant research questions, (2) methodological and contextual characteristics, and (3) technological trends and empirical findings related to feedback quality, feedback processing, and learning outcomes. The reviewed literature is primarily grounded in formative feedback theory, self-regulated learning, feedback literacy, and human–AI interaction frameworks. Methodologically, studies are dominated by short-term experiments, quasi-experimental classroom studies, and expert evaluations of feedback quality, mostly situated in higher education. Technologically, most studies employ proprietary, pre-trained GPT-3.5 or GPT-4 models with context-enriched, role-based zero-shot or few-shot prompting, while fine-tuned, open-source, and multi-agent approaches are beginning to emerge. Empirical findings indicate that LLM-generated formative feedback consistently outperforms no-feedback conditions and often improves revision quality, motivation, and short-term learning outcomes, sometimes approaching teacher feedback under well-designed prompting. However, recurring risks include hallucinations, over-positivity, and misclassification of student work. Emerging evidence suggests that instruction fine-tuning, grounding prompts in student artifacts, and teacher-in-the-loop oversight can mitigate these issues.
Maier et al. (Fri,) studied this question.