Student-generated questions (SGQs) have proven to be a meaningful learning tool: they foster advanced thinking skills in students and help teachers understand student learning progress. However, grading the quality of SGQs demands significant effort from teachers. In this study, we explore the suitability of large language models (LLMs) for evaluating SGQs and identify which models can effectively replace expert evaluation in practical teaching problems. We devised a five-dimension scale and, using expert ratings as the gold standard, employed Kendall's W consistency analysis to systematically compare the evaluations of different LLMs against expert ratings on six aspects of the scale. The results confirm that LLMs are applicable to the evaluation of SGQs and that ChatGPT 4.0 performs exceptionally well and can assist experts in evaluating SGQs. This study aims to facilitate the implementation of artificial intelligence generated content (AIGC) in education and reinforces the belief that LLMs have substantial potential for future applications and research in the field of education.
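As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of Kendall's W (coefficient of concordance) with the standard correction for tied ranks. It is not the authors' code: the function names `average_ranks` and `kendalls_w`, the input layout (one list of scores per rater, all over the same questions), and the sample scores are assumptions made for this example.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over every item tied with the one at `pos`.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m raters scoring the same n items (tie-corrected).

    `ratings` is a list of m lists, each holding n scores (hypothetical layout).
    Returns a value in [0, 1]; 1 means the raters order the items identically.
    """
    m = len(ratings)
    n = len(ratings[0]) if m else 0
    if m < 2 or n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need at least 2 raters and 2 items, all the same length")

    ranked = [average_ranks(r) for r in ratings]
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)

    # Tie correction: sum of (t^3 - t) over each group of t tied scores per rater.
    ties = sum(sum(t ** 3 - t for t in Counter(r).values()) for r in ratings)
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("every rater gave constant scores; W is undefined")
    return 12 * s / denom


# Made-up scores for five questions on one dimension (illustrative only).
expert = [4, 2, 5, 3, 1]
model = [5, 2, 4, 3, 1]
print(kendalls_w([expert, model]))
```

Under these assumptions, one such value would be computed per aspect of the scale, with the expert ratings and one model's ratings as the two raters. Values near 1 indicate that the model orders the questions much as the experts do.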
Mi et al. (Fri,) studied this question.