This meta-research study will evaluate the reporting quality of randomized controlled trials (RCTs) of AI-based educational interventions for medical students against the CONSORT and CONSORT-AI guidelines. Two trained reviewers will independently rate every CONSORT and CONSORT-AI item on a predefined three-level scale (1 = fully reported; 2 = partially reported; 3 = not reported). Discrepancies will be resolved by consensus to produce a reference-standard assessment. As a secondary objective, we will examine whether a large language model (ChatGPT GPT-5.1) can reliably assess reporting completeness when given each RCT's original PDF file. Agreement between the human reviewers and the AI model will be quantified with weighted Cohen's kappa, percent agreement, and item-level concordance analyses. Discrepant ratings will also be examined to determine whether misclassifications arise more often from the human reviewers or from the AI model. This project contributes to the emerging field of AI-integrated research methodology by providing the first structured evaluation of reporting quality in AI-based medical education trials and by exploring the feasibility of using large language models to support reporting appraisal. All data, materials, and analytic scripts will be shared openly to promote transparency and reproducibility.
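The agreement statistics named in the protocol can be illustrated with a short sketch. This is not the study's analytic script: it assumes ratings are coded on the protocol's 1-3 scale and treated as ordinal, and it uses linear disagreement weights by default because the protocol does not say whether linear or quadratic weights will be applied. The function names, the item identifiers, and the example ratings are hypothetical.

```python
# Hypothetical sketch: agreement between two raters (e.g. human consensus vs. LLM)
# on the ordinal scale 1 = fully, 2 = partially, 3 = not reported.
CATEGORIES = (1, 2, 3)


def percent_agreement(a, b):
    """Share of ratings on which both raters chose exactly the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, categories=CATEGORIES, weighting="linear"):
    """Weighted Cohen's kappa: 1 - sum(w * observed) / sum(w * expected).

    Weights are disagreement weights |i - j| / (k - 1) (linear) or their
    square (quadratic); the choice of weighting is an assumption here.
    """
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty sequences of equal length")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions: rows = rater a, columns = rater b.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[idx[x]][idx[y]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d ** 2

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        # Kappa is undefined when chance-expected disagreement is zero
        # (e.g. both raters used a single identical category throughout).
        return float("nan")
    return 1 - obs_dis / exp_dis


def item_concordance(reference, model):
    """Per-item percent agreement; each dict maps item id -> ratings across trials."""
    return {item: percent_agreement(reference[item], model[item]) for item in reference}


if __name__ == "__main__":
    human = [1, 1, 2, 3, 3, 2]  # hypothetical consensus ratings
    llm = [1, 2, 2, 3, 2, 2]    # hypothetical model ratings
    print(percent_agreement(human, llm))  # 4/6, about 0.667
    print(weighted_kappa(human, llm))     # 4/7, about 0.571
```

Because the scale is ordinal, a weighted kappa penalizes a "fully" vs. "not reported" disagreement more than a "fully" vs. "partially" one, which plain percent agreement cannot do. As a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score(y1, y2, weights="linear")` should return the same kappa on the same data.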
Ana Luiza Cabrera Martimbianco
Edgar Maquigussa
Mário Ferrari
Martimbianco et al. studied this question.
www.synapsesocial.com/papers/699ba0a772792ae9fd870b01 — DOI: https://doi.org/10.17605/osf.io/76wvn