March 22, 2024

A Comparative Analysis of Different Large Language Models in Evaluating Student-Generated Questions

Abstract

Student-generated questions (SGQs) have proven to be a meaningful learning tool, fostering advanced thinking skills in students and helping teachers understand students' learning progress. However, grading the quality of SGQs demands significant effort from teachers. In this study, we explore the suitability of large language models (LLMs) for evaluating SGQs and identify which models can effectively replace expert evaluation in practical teaching. We devised a five-dimension scale, used expert ratings as the gold standard, and employed Kendall's W consistency analysis to systematically compare the evaluations of different LLMs against expert ratings across six aspects of the scale. The research confirmed the applicability of LLMs to the evaluation of SGQs and the exceptional performance of ChatGPT 4.0, which can assist experts in evaluating SGQs. This study aims to facilitate the implementation of artificial intelligence generated content (AIGC) in education, and its findings reinforce the substantial potential of LLMs for future applications and research in education.
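
Kendall's W (the coefficient of concordance) measures how closely m raters agree when each ranks the same n items; it runs from 0 (no agreement) to 1 (identical rankings). The abstract does not say how ties were handled or how the raters were grouped, so the following is only a minimal sketch of the standard statistic under stated assumptions: `ratings[j][i]` is the score rater j (for example, the expert or one LLM) gives question i on a single dimension, tied scores receive average ranks, and the usual tie correction is applied. The function names and the sample scores are hypothetical and illustrative, not taken from the study.

```python
from typing import Sequence


def average_ranks(scores: Sequence[float]) -> list[float]:
    """Rank scores ascending (1-based); tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of equal scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tie_correction(scores: Sequence[float]) -> int:
    """Sum of (t^3 - t) over each group of t tied scores from one rater."""
    counts: dict[float, int] = {}
    for s in scores:
        counts[s] = counts.get(s, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kendall's W for m raters (rows) scoring the same n items (columns)."""
    m = len(ratings)
    n = len(ratings[0])
    if m < 2 or n < 2 or any(len(row) != n for row in ratings):
        raise ValueError("need at least 2 raters scoring the same >= 2 items")
    rank_rows = [average_ranks(row) for row in ratings]
    # R_i: total rank that item i receives across all raters.
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((r - mean_total) ** 2 for r in totals)
    ties = sum(tie_correction(row) for row in ratings)
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("W is undefined when every rater gives all items the same score")
    return 12 * s / denom


# Hypothetical scores: one expert and one LLM rating four questions on a 1-5 scale.
expert = [3, 4, 4, 5]
llm = [3, 4, 5, 5]
print(kendalls_w([expert, llm]))
```

With two raters and no ties, W reduces to a linear rescaling of Spearman's rank correlation (rho = 2W - 1), so comparing one LLM against the expert at a time is effectively a rank-correlation check; pooling several raters into one W instead summarizes overall concordance.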

Cite This Study

Mi et al. (2024).

synapsesocial.com/papers/68e72cd4b6db6435876a6258 https://doi.org/10.1109/iceit61397.2024.10540914