This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines how prompt engineering with detailed criteria affects grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs achieve criterion-based grading when given a detailed understanding of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.
Da‐Wei Zhang
Melissa Boey
Yan Tan
npj Science of Learning
Monash University Malaysia
Zhang et al. studied this question.
www.synapsesocial.com/papers/6a10e59bba20d9a181ee6f26 — DOI: https://doi.org/10.1038/s41539-024-00291-1