What question did this study set out to answer?

This research aims to assess the effectiveness of ChatGPT in grading various pharmacy student assessments while preserving quality.

June 17, 2026 · Open Access

Bridging technology and education: The use of ChatGPT in grading pharmacy student exams

Key Points

Mixed-methods study evaluating ChatGPT-5 on 21 assessment questions
Comparison of AI grading with human grading as the gold standard
Data collected via REDCap and analyzed using descriptive and comparative statistics
AI showed near-perfect accuracy and high agreement with human grading for multiple-choice questions
Performance decreased for listing, short-answer, and essay questions
Rubric use improved accuracy for listing questions but not for short-answer or essay items

Abstract

INTRODUCTION: Artificial intelligence (AI) tools are increasingly being explored for educational applications. This study evaluates the use of ChatGPT, an AI language model, in grading various types of assessment questions administered to pharmacy students, with the goal of improving efficiency while maintaining the quality and integrity of student assessment.

METHODS: This mixed-methods study evaluated the accuracy and consistency of ChatGPT-5 in grading 21 questions spanning multiple-choice (MC), listing, select-all-that-apply (SATA), fill-in-the-blank, short-answer, and essay formats, administered to 16 pharmacy students during their adult medicine advanced pharmacy practice experience (APPE) rotations. ChatGPT-5 graded responses under specified conditions, including grading the full exam versus grading by question type, each with and without a rubric. Results were compared with human grading (gold standard) to assess agreement and reliability. All student data were de-identified, collected through REDCap, and analyzed using descriptive and comparative statistics.

RESULTS: Sixteen students completed a 21-item mixed-format examination graded by faculty (gold standard) and AI, with and without a rubric. AI demonstrated near-perfect accuracy and concordance for objective items. Performance declined for listing, short-answer, and essay questions. Providing a rubric did not consistently improve accuracy or agreement in mixed-format grading. When responses were grouped by question type, rubric use improved accuracy and concordance for listing questions but not for short-answer or essay items, where rubric-free grading often showed higher agreement.

CONCLUSIONS: AI grading reliability is highly dependent on question type and grading context. AI performs well for objective assessments, while rubric effectiveness is context-specific and limited for subjective responses.
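As a rough illustration of the kind of comparison the abstract describes, the sketch below computes how often AI scores match faculty scores within each question type. The abstract does not say which agreement statistic the authors used, so exact-match agreement is an assumption here, and the function name, record layout, and sample scores are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def agreement_by_type(records):
    """Return the share of responses where AI and human scores match, per question type.

    records: iterable of (question_type, human_score, ai_score) tuples.
    Exact-match agreement is an assumed metric, not one confirmed by the study.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for question_type, human_score, ai_score in records:
        totals[question_type] += 1
        if human_score == ai_score:
            matches[question_type] += 1
    return {qt: matches[qt] / totals[qt] for qt in totals}


# Made-up scores for illustration only; these are not data from the study.
sample = [
    ("multiple-choice", 1, 1),
    ("multiple-choice", 0, 0),
    ("essay", 4, 3),
    ("essay", 5, 5),
]
rates = agreement_by_type(sample)
```

Running the same function separately on rubric and rubric-free gradings would give the per-condition comparison the abstract reports, with human grading treated as the reference in each case.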
