December 30, 2024

Open Access

Evaluating large language models for criterion-based grading from agreement to consistency

Abstract

This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs achieve criterion-based grading with a detailed understanding of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.

Authors

Da‐Wei Zhang

Melissa Boey

Yan Tan

Journals

npj Science of Learning

Institutions

Monash University Malaysia
