INTRODUCTION: Artificial intelligence (AI) is gaining significant traction, particularly in the orthopedic literature. To date, there has been no published literature on the use of military-specific AI tools such as NIPR-GPT, including for orthopedic patient education, and the accuracy, reliability, and clinical validity of these systems in answering common patient questions about orthopedic pathology remain unknown. The primary objective of this study was to evaluate the reliability of a military-specific AI chatbot (NIPR-GPT) as a source of patient education for anterior cruciate ligament (ACL) tears.

MATERIALS AND METHODS: A list of 12 frequently asked ACL-related questions was compiled from top orthopedic institutions in the United States. The questions were subcategorized into 6 themes (2 questions per theme): "understanding the diagnosis," "nonsurgical treatment options," "surgical indications," "surgical techniques," "potential complications," and "postoperative recovery and outcomes." Each question was entered into NIPR-GPT, and the exact responses were recorded, transcribed, and formatted for review. A panel of 5 subspecialty-trained orthopedic surgeons rated each response according to the amount of clarification needed to answer the question adequately: excellent (no clarification required), satisfactory with minimal clarification (1-2 clarifications), satisfactory with moderate clarification (3-4 clarifications), or unsatisfactory (>4 clarifications or inclusion of notable false or misleading information). Inter-rater reliability was assessed with the intraclass correlation coefficient (ICC).

RESULTS: Of the 60 total ratings (12 questions × 5 surgeons), the surgeons graded 27 (45%) responses as excellent, 28 (47%) as satisfactory with minimal clarification, 2 (3%) as satisfactory with moderate clarification, and 3 (5%) as unsatisfactory.
Among responses requiring clarification, 75% contained inaccurate or misleading content, 32% omitted critical information, 18% were overly general, and 7% were outdated (categories were not mutually exclusive). In the theme-level subanalysis, the highest average scores were for "surgical indications" (3.7/4) and "understanding the diagnosis" (3.6/4), while "surgical techniques" scored the lowest (2.7/4). The ICC for surgeon ratings was 0.52, indicating moderate inter-rater reliability.

CONCLUSIONS: NIPR-GPT provides good to excellent responses to frequently asked questions about ACL tears. It performed best on "surgical indications" and "understanding the diagnosis" and was weakest in explaining "surgical techniques." Agreement among surgeons was moderate, which may reflect the complexity and nuance of the questions as well as personal or training biases. NIPR-GPT offers a starting point for patient education on ACL tears.
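The abstract reports an ICC of 0.52 but does not state which ICC form was used. A minimal sketch of one common form follows: ICC(2,1), using two-way random effects, absolute agreement, and a single rater, as defined by Shrout and Fleiss. It is computed from a questions-by-raters score matrix. The sketch assumes the four rating categories map to scores 4 (excellent) through 1 (unsatisfactory). That mapping is inferred from the "/4" theme scores and is not stated in the source. The `SCORE` mapping, its keys, and the function name `icc_2_1` are hypothetical. This is not the authors' analysis, and the example ratings are illustrative rather than study data.

```python
from statistics import mean

# Hypothetical mapping of rating categories to a 1-4 score (assumed, not from the study).
SCORE = {"excellent": 4, "minimal": 3, "moderate": 2, "unsatisfactory": 1}


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score that rater j gave to question i.
    """
    n = len(ratings)  # number of rated targets (questions)
    k = len(ratings[0]) if ratings else 0  # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")

    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_error) / denom


# Illustrative ratings: two raters who differ by exactly one category on every question.
labels = [
    ["excellent", "minimal"],
    ["minimal", "moderate"],
    ["moderate", "unsatisfactory"],
]
matrix = [[SCORE[label] for label in row] for row in labels]
print(round(icc_2_1(matrix), 3))  # 0.667
```

Because this form measures absolute agreement, a constant one-point offset between raters lowers the ICC to about 0.67 even though the raters rank the questions identically. A consistency form such as ICC(3,1) would return 1.0 for the same matrix. Which form is appropriate depends on whether systematic differences between surgeons should count as disagreement.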
DOI: https://doi.org/10.1093/milmed/usag261
A. Rosso (Naval Medical Center San Diego)
Harrison Diaz (Naval Medical Center San Diego)
Peter Baglien (Massachusetts General Hospital)
Journal: Military Medicine