INTRODUCTION: Artificial intelligence (AI) tools are increasingly being explored for educational applications. This study evaluates the use of ChatGPT, an AI language model, to grade various types of assessment questions administered to pharmacy students, with the goal of improving efficiency while maintaining the quality and integrity of student assessment.

METHODS: This mixed-methods study evaluated the accuracy and consistency of ChatGPT-5 in grading a 21-item examination comprising multiple-choice (MC), listing, select-all-that-apply (SATA), fill-in-the-blank, short-answer, and essay questions, administered to 16 pharmacy students during their adult medicine advanced pharmacy practice experience (APPE) rotations. ChatGPT-5 graded responses under several conditions, including grading the full exam versus grading by question type, each with and without a rubric. Results were compared with human grading (gold standard) to assess agreement and reliability. All student data were de-identified, collected through REDCap, and analyzed using descriptive and comparative statistics.

RESULTS: Sixteen students completed a 21-item mixed-format examination graded by faculty (gold standard) and by AI, with and without a rubric. AI demonstrated near-perfect accuracy and concordance for objective items. Performance declined for listing, short-answer, and essay questions. Providing a rubric did not consistently improve accuracy or agreement in mixed-format grading. When responses were grouped by question type, rubric use improved accuracy and concordance for listing questions but not for short-answer or essay items, where rubric-free grading often showed higher agreement.

CONCLUSIONS: AI grading reliability is highly dependent on question type and grading context. AI performs well for objective assessments, while rubric effectiveness is context-specific and limited for subjective responses.
Falahat et al. (Sun,) studied this question.
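
The abstract reports accuracy and agreement between AI and faculty grading but does not name the statistics used. As a minimal sketch, assuming percent agreement and Cohen's kappa as the agreement measures (an assumption, not the study's confirmed method), the comparison for one question type might be computed as follows. The function names and the score lists are hypothetical and purely illustrative; they are not study data.

```python
from collections import Counter


def percent_agreement(human, ai):
    """Fraction of responses where the AI score equals the faculty score."""
    if len(human) != len(ai) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(h == a for h, a in zip(human, ai))
    return matches / len(human)


def cohen_kappa(human, ai):
    """Chance-corrected agreement between two graders (Cohen's kappa)."""
    n = len(human)
    p_observed = percent_agreement(human, ai)
    human_counts = Counter(human)
    ai_counts = Counter(ai)
    # Expected agreement if both graders scored independently,
    # each keeping their own marginal score distribution.
    categories = set(human) | set(ai)
    p_expected = sum(human_counts[c] * ai_counts[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both graders gave the same single score to every response.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores (1 = correct, 0 = incorrect) for eight responses.
human_scores = [1, 1, 0, 1, 0, 1, 1, 0]  # faculty grading, treated as gold standard
ai_scores = [1, 1, 0, 0, 0, 1, 1, 1]     # AI grading of the same responses

agreement = percent_agreement(human_scores, ai_scores)
kappa = cohen_kappa(human_scores, ai_scores)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Running the same calculation separately for each grading condition (with or without a rubric, full exam or grouped by question type) would give one agreement figure per condition, which is the kind of comparison the Results describe.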