What question did this study set out to answer?

The central aim is to develop a feedback system that uses large language models to evaluate student programming submissions.

March 25, 2026 | Open Access

A lightweight intelligent feedback system for student code submissions using large language models

Key Points

Developed a feedback system based on large language models that assesses Python coding tasks.
Analyzed submissions by comparing them to a reference solution across logic, style, and performance dimensions.
Employed semantic similarity measures through sentence embeddings for feedback generation.
Tested on thirty-one student submissions, including both reference-level and imperfect solutions.
Achieved a mean similarity score of 0.56 with a standard deviation of 0.19 among submissions.
Found a moderate inverse correlation (r = -0.65) between the length of feedback provided and submission similarity.
Demonstrated adaptive feedback behavior based on submission quality through visual analysis and clustering.
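
The embedding-based similarity scoring mentioned in these points can be illustrated with cosine similarity between two vectors. This is a minimal sketch, not the study's implementation: this summary names neither the embedding model nor the similarity metric, so cosine similarity is an assumption, the function name `cosine_similarity` is hypothetical, and the vectors are made-up stand-ins for real sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of a reference solution and a
# student submission. These values are invented for illustration only.
reference_vec = [1.0, 0.0, 1.0]
submission_vec = [1.0, 1.0, 0.0]

score = cosine_similarity(reference_vec, submission_vec)
print(f"similarity: {score:.2f}")
```

In a real pipeline the two vectors would come from a sentence-embedding model applied to the reference solution and the submission; a score near 1 would indicate close alignment, and lower scores would indicate divergence.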

Abstract

This study introduces an intelligent, large language model (LLM)–driven feedback system designed to assess and enhance students’ programming tasks through semantic comparison and pedagogically contextualized feedback. Unlike traditional grading systems, our system analyzes Python submissions against a reference solution and generates feedback along three main dimensions: logic, style, and performance. The system employs sentence-embedding-based semantic similarity to determine alignment and adaptively adjusts the feedback based on submission quality. Thirty-one student solutions (both reference-level and imperfect submissions) were tested in this study. The results show a mean similarity score of 0.56 (SD = 0.19) and a moderate inverse correlation (r = −0.65) between feedback length and similarity, confirming adaptive behavior in the system. Visual examination, such as the category-based distribution of feedback, similarity patterns, and solution clustering, further demonstrates the validity and explainability of the system. This approach ensures reproducibility through the transparent definition of reference tasks, embedded similarity scoring, and qualitative pattern analysis. The system has implications for AI-facilitated formative feedback, mass code assessment, and adaptive tutoring in computer science education.
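
The summary statistics reported in the abstract (mean, standard deviation, and the correlation between feedback length and similarity) can be computed along the following lines. This is a hedged sketch under stated assumptions: the abstract does not say which correlation coefficient or which SD estimator was used, so Pearson's r and the sample standard deviation are assumed here, the function name `pearson_r` is hypothetical, and the data values are invented and do not come from the study.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Invented example data, NOT the study's measurements:
# one similarity score and one feedback length (in words) per submission.
similarities = [0.9, 0.7, 0.5, 0.3]
feedback_lengths = [40, 80, 90, 150]

mean_similarity = statistics.mean(similarities)
sd_similarity = statistics.stdev(similarities)  # sample SD (assumption)
r = pearson_r(similarities, feedback_lengths)

print(f"mean = {mean_similarity:.2f}, SD = {sd_similarity:.2f}, r = {r:.2f}")
```

A negative r, as in the reported r = −0.65, means that submissions less similar to the reference tended to receive longer feedback.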
