What question did this study set out to answer?

The study aims to evaluate how completely randomized trials of AI tools in medical education are reported.

February 23, 2026 · Open Access

Reporting quality of randomized trials using artificial intelligence in medical education: A meta-research study applying CONSORT and CONSORT-AI

Ana Luiza Cabrera Martimbianco, Edgar Maquigussa, Mário Ferrari

Key Points

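The study will evaluate the reporting quality of randomized trials of AI-based educational interventions for medical students.
Reporting quality will be assessed against the CONSORT and CONSORT-AI guidelines.
Two trained reviewers will independently rate every item on a three-level scale (fully, partially, or not reported).
As a secondary objective, the study will examine whether a large language model can reliably assess reporting completeness.
Discrepancies between human and AI ratings will be identified.
Agreement will be quantified using weighted Cohen’s kappa, percent agreement, and item-level concordance analyses.
Discrepant ratings will be examined to determine whether misclassifications originate more often from human reviewers or from the AI model.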

Abstract

This meta-research study will evaluate the reporting quality of randomized trials assessing AI-based educational interventions for medical students, using the CONSORT and CONSORT-AI guidelines. Two trained reviewers will independently evaluate all CONSORT and CONSORT-AI items using a predefined three-level reporting scale (1 = fully reported; 2 = partially reported; 3 = not reported). Discrepancies will be resolved through consensus, resulting in a reference-standard assessment. As a secondary objective, we will examine whether a large language model (ChatGPT GPT-5.1) can reliably assess reporting completeness when analyzing RCTs directly from their original PDF files. Agreement between human reviewers and the AI model will be quantified using weighted Cohen’s kappa, percent agreement, and item-level concordance analyses. Additionally, discrepant ratings will be examined to determine whether misclassification patterns originate more frequently from human reviewers or from the AI model. This project contributes to the emerging field of AI-integrated research methodology by providing the first structured evaluation of reporting quality in AI-based medical education trials and by exploring the feasibility of using large language models to support reporting appraisal. All data, materials, and analytic scripts will be shared openly to promote transparency and reproducibility.
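The agreement statistics named in the abstract can be illustrated with a minimal sketch. It assumes linear disagreement weights on the three-level scale (the abstract does not say whether linear or quadratic weights will be used), and the function names and the example ratings are hypothetical, not taken from the study.

```python
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of items on which both raters gave the same rating."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(
    a: Sequence[int],
    b: Sequence[int],
    categories: Sequence[int] = (1, 2, 3),  # 1 = fully, 2 = partially, 3 = not reported
    weights: str = "linear",                # assumption: the abstract does not specify
) -> float:
    """Weighted Cohen's kappa for two raters on an ordinal scale."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    n = len(a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Contingency table of counts: rows = rater A, columns = rater B.
    table = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        table[index[x]][index[y]] += 1

    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weights == "linear" else 2

    def disagreement(i: int, j: int) -> float:
        # 0 on the diagonal, 1 for the most distant pair of categories.
        return (abs(i - j) / (k - 1)) ** power

    observed = sum(
        disagreement(i, j) * table[i][j] for i in range(k) for j in range(k)
    ) / n
    expected = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if expected == 0:
        # Both raters used one and the same category throughout.
        return 1.0
    return 1.0 - observed / expected


# Hypothetical ratings for eight checklist items (not study data).
human = [1, 1, 2, 3, 2, 1, 3, 2]
model = [1, 2, 2, 3, 3, 1, 3, 1]

print(percent_agreement(human, model))         # 0.625
print(round(weighted_kappa(human, model), 3))  # 0.586
```

Weighted kappa penalizes a "fully" versus "not reported" disagreement more heavily than a one-step disagreement, which is why it suits an ordinal scale better than unweighted kappa. Item-level concordance could be obtained by applying `percent_agreement` to the ratings of a single checklist item across all included trials.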
