This research evaluated ChatGPT's potential as a tool for grading programming tasks, exploring its capability to understand and assess code quality. The study took place over a 15-week Python programming course with 67 students of the Cognitive Science program. Nine different assignments were assessed by both a teacher and ChatGPT, and the differences between the two sets of grades were recorded. The teacher's grades were higher than those generated by ChatGPT. Despite this, there was a strong positive correlation between the two sets of grades, suggesting broad agreement in grading. In addition, the repeatability of ChatGPT's evaluations was excellent: the differences observed between successive grading iterations were negligible. The study concludes that ChatGPT could be a beneficial tool for grading programming assignments, offering advantages such as time efficiency, quality assessment, unbiased grading, enforcement of coding standards, and the ability to generate feedback. However, the system has limitations such as cost, potential hallucinations, the lack of absolute agreement and of fully reproducible results, and the occasional need for teacher intervention. The study suggests that the artificial intelligence model could complement or even substitute for human grading, but that it requires careful use and potential verification by a human teacher.
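The three comparisons mentioned in the abstract (who grades higher, how strongly the grades correlate, and how much repeated ChatGPT evaluations differ) can be illustrated with a minimal Python sketch. This is not the paper's analysis: the abstract does not state which statistics were used, so the Pearson coefficient and the max-minus-min spread below are assumptions, the function names are hypothetical, and the scores are made-up values included only to make the example runnable.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_difference(teacher, model):
    """Average of (teacher grade - model grade); positive means the teacher grades higher."""
    return mean(t - m for t, m in zip(teacher, model))


def max_run_spread(runs):
    """Largest max-minus-min gap for any single submission across repeated grading runs."""
    return max(max(scores) - min(scores) for scores in zip(*runs))


# Hypothetical data (NOT from the study): five submissions graded once by a
# teacher and three times by the model.
teacher = [9.0, 7.5, 10.0, 6.0, 8.5]
model_runs = [
    [8.0, 6.5, 9.5, 5.0, 7.5],
    [8.0, 7.0, 9.5, 5.0, 7.5],
    [8.5, 6.5, 9.5, 5.0, 7.5],
]

# Use the per-submission average of the repeated runs as the model's grade.
model = [mean(scores) for scores in zip(*model_runs)]

print("mean difference (teacher - model):", mean_difference(teacher, model))
print("Pearson correlation:", pearson(teacher, model))
print("largest spread across runs:", max_run_spread(model_runs))
```

A high correlation together with a positive mean difference is the pattern the abstract describes: the two graders order submissions similarly even though one is systematically stricter, which is why correlation alone does not imply absolute agreement.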
Marcin Jukiewicz
Thinking Skills and Creativity
Adam Mickiewicz University in Poznań
Marcin Jukiewicz studied this question.
www.synapsesocial.com/papers/68e720ddb6db64358769aebb — DOI: https://doi.org/10.1016/j.tsc.2024.101522