What question did this study set out to answer?

This research aimed to create a framework validated by diabetologists to assess the clinical effectiveness of AI systems in managing type 2 diabetes.

June 7, 2026

1993-P: A Diabetologist-Validated Framework for Assessing Real-World Clinical Effectiveness of AI Systems in Type 2 Diabetes Care

Key Points

Developed a Donabedian model-based framework to assess the clinical decision-making of AI systems in T2D care.
Conducted two Delphi rounds with 12 diabetologists, followed by a consensus review led by 3 senior diabetologists.
Rated the framework's comprehensiveness and clarity on a 4-point scale and gathered free-text feedback to guide revisions.
Generated 102 item-level revision comments spanning validity, clarity, coverage, feasibility, and traceability.
Streamlined the framework from 56 to 29 evaluation items by removing redundancy, improving clarity and comprehensiveness.
Raised the content validity index from 64.4% (comprehensiveness) and 51.1% (clarity) in the initial round to 100% for both in the final round.
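The content validity index (CVI) is conventionally computed from 4-point ratings by counting a rating of 3 or 4 as endorsement. The abstract does not state which CVI variant the authors used (item-level I-CVI, scale-level average S-CVI/Ave, or universal agreement S-CVI/UA), so the Python sketch below shows the standard definitions under that assumption. The function names and the demo ratings are hypothetical illustrations, not study data.

```python
from statistics import mean

# Hypothetical layout: one inner list per evaluation item, one rating per
# expert reviewer, each on a 4-point scale (1 = lowest, 4 = highest).
Ratings = list[list[int]]


def item_cvi(item_ratings: list[int], threshold: int = 3) -> float:
    """I-CVI: share of experts rating the item at or above `threshold`."""
    if not item_ratings:
        raise ValueError("item has no ratings")
    if any(r < 1 or r > 4 for r in item_ratings):
        raise ValueError("ratings must be on a 1-4 scale")
    # Booleans sum as 0/1, so this counts ratings of 3 or 4.
    return sum(r >= threshold for r in item_ratings) / len(item_ratings)


def scale_cvi_ave(ratings: Ratings) -> float:
    """S-CVI/Ave: mean of the item-level CVIs across all items."""
    return mean(item_cvi(item) for item in ratings)


def scale_cvi_ua(ratings: Ratings) -> float:
    """S-CVI/UA: share of items that every expert rated 3 or 4."""
    return sum(item_cvi(item) == 1.0 for item in ratings) / len(ratings)


if __name__ == "__main__":
    # Three hypothetical items rated by four hypothetical reviewers.
    demo = [
        [4, 3, 4, 4],  # 4 of 4 endorse -> I-CVI 1.0
        [3, 2, 4, 3],  # 3 of 4 endorse -> I-CVI 0.75
        [2, 2, 3, 1],  # 1 of 4 endorse -> I-CVI 0.25
    ]
    print([item_cvi(item) for item in demo])  # [1.0, 0.75, 0.25]
    print(round(scale_cvi_ave(demo), 3))      # 0.667
    print(round(scale_cvi_ua(demo), 3))       # 0.333
```

Under these definitions, a final-round value of 100% means every retained item was rated 3 or 4 by every reviewer, whichever scale-level variant is used.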

Abstract

Introduction and Objective: Current medical AI benchmarks rely on single-best-answer exam accuracy (e.g., USMLE), but real-world type 2 diabetes (T2D) care involves context-dependent clinical decisions with acceptable practice variability. To discriminate the real-world clinical effectiveness of AI systems, we aimed to develop a diabetologist-validated framework for T2D management.

Methods: We devised a Donabedian model-based framework to assess AI clinical decision capability by evaluating clinical reasoning for patient triage/problem list, medication recommendation, treatment strategy, dose adjustment, and monitoring/education. Meta-evaluation items embedded at the end of each phase assessed the framework's ability to discriminate the clinical effectiveness of AI systems. Reviewers rated comprehensiveness (coverage of required elements in T2D care) and clarity (unambiguous interpretation and application) on a 4-point scale and provided free-text feedback to inform between-round revisions. Twelve diabetologists completed two initial Delphi rounds; 3 senior diabetologists led the final consensus review.

Results: Delphi rounds 1 and 2 generated 102 item-level revision comments spanning validity, clarity, coverage, feasibility, and traceability. Iterative revisions streamlined the framework from 56 to 29 evaluation items by removing redundancy and sharpening workflow-aligned criteria, while the content validity index rose from 64.4%/51.1% (comprehensiveness/clarity) in the initial round to 100%/100% in the final round.

Conclusion: This diabetologist consensus-validated framework provides explicit standards for systematically assessing AI-generated T2D treatment recommendations across reasoning reliability, clinical utility, and real-world feasibility. The framework shows potential to serve as an evaluative benchmark for distinguishing AI systems that effectively support diabetologists' treatment decision-making.

Disclosure: S. Baek: None. J. Kim: None. S. Jin: None. G. Kim: None. Y. Lee: None. J. Kim: None. S. Cho: None. R. Oh: None. B. Kim: None. M. Jang: None. S. Ko: None. M. Moon: None. K. Kim: None. K. Hur: None.

Funding: Future Medicine 2030 Project of the Samsung Medical Center (#SMX1250111); Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2024-00357879)


Cite This Study

Baek et al. (2026) studied this question.

synapsesocial.com/papers/6a250bca7def13d035e1bc2a
https://doi.org/10.2337/db26-1993-p
