What question did this study set out to answer?

This study aims to evaluate the reliability of a military-specific AI chatbot, NIPR-GPT, in answering questions about ACL injuries.

June 4, 2026 · Open Access

Assessing Military Artificial Intelligence Responses to Anterior Cruciate Ligament Injury Frequently Asked Questions

Key Points

This study aims to evaluate the reliability of a military-specific AI chatbot, NIPR-GPT, in answering questions about ACL injuries.
Twelve frequently asked ACL-related questions were compiled from top orthopedic institutes in the U.S.
A panel of 5 orthopedic surgeons rated each response by the level of clarification needed to adequately answer the question.
The Intraclass Correlation Coefficient (ICC) was calculated to measure inter-rater reliability among the surgeons.
Of 60 total ratings, surgeons graded 27 (45%) as excellent and 28 (47%) as satisfactory with minimal clarification.
75% of responses needing clarification contained inaccurate or misleading content.
The ICC for surgeon responses was 0.52, indicating moderate inter-rater reliability.
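
For readers unfamiliar with the ICC, the following is a minimal sketch of one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater). The summary does not state which ICC form the authors used, and the per-surgeon ratings are not given here, so both the choice of variant and the `example_ratings` matrix are assumptions made purely for illustration.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per response and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    The study summary does not say which ICC variant was actually used.
    """
    n = len(ratings)        # number of rated responses (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into rows, columns, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-response mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical ratings on a 1-4 scale (4 = excellent); NOT the study's data.
example_ratings = [
    [4, 4, 3, 4, 3],
    [3, 3, 3, 4, 2],
    [4, 3, 4, 4, 4],
    [2, 3, 1, 3, 2],
]
example_icc = icc_2_1(example_ratings)
```

Values closer to 1 indicate that the raters assign similar scores to the same response; the study reports 0.52 for its own data.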

Abstract

INTRODUCTION: Artificial intelligence is gaining significant traction, particularly in the orthopedic literature. To date, there has been no published literature on the use of military-specific AI tools such as NIPR-GPT, and no prior studies have evaluated military-specific generative AI platforms for orthopedic patient education. The accuracy, reliability, and clinical validity of these systems in answering common patient questions about orthopedic pathology remain unknown. The primary objective of this study was to evaluate the reliability of a military-specific AI chatbot (i.e., NIPR-GPT) as a source of patient education for anterior cruciate ligament (ACL) tears.

MATERIALS AND METHODS: A list of 12 frequently asked ACL-related questions was compiled from the top orthopedic institutes in the United States. The questions were then subcategorized into 6 themes (2 questions per theme): "understanding the diagnosis," "nonsurgical treatment options," "surgical indications," "surgical techniques," "potential complications," and "postoperative recovery and outcomes." Each question was input into NIPR-GPT. The exact responses were recorded, then transcribed and formatted for review. A panel of 5 subspecialty-trained orthopedic surgeons evaluated the responses based on the level of clarification needed to adequately answer the question. Responses were rated excellent (no clarification required), satisfactory with minimal clarification (1-2 clarifications needed), satisfactory with moderate clarification (3-4 clarifications needed), or unsatisfactory (>4 clarifications required or inclusion of notable false/misleading information). The Intraclass Correlation Coefficient (ICC) was calculated for inter-rater reliability.

RESULTS: Of the 60 total ratings, the surgeons graded 27 (45%) responses as excellent, 28 (47%) as satisfactory with minimal clarification, 2 (3%) as satisfactory with moderate clarification, and 3 (5%) as unsatisfactory. Among answers requiring clarification, 75% contained inaccurate or misleading content, 32% omitted critical information, 18% were overly general, and 7% were outdated (non-exclusive categories). In the theme-level subanalysis, the highest average scores were seen in "surgical indications" (3.7/4) and "understanding the diagnosis" (3.6/4), while "surgical techniques" scored the lowest (2.7/4). The ICC for surgeon responses was 0.52, indicating moderate inter-rater reliability.

CONCLUSIONS: NIPR-GPT provides good to excellent responses to frequently asked questions about ACL tears. It was particularly strong in "surgical indications" and "understanding the diagnosis" of ACL tears. This AI chatbot was weakest in explaining "surgical techniques." There was moderate agreement among clinicians regarding these responses, which may reflect the complexity and nuance of the questions as well as personal or training biases. NIPR-GPT provides a starting point for patient education in ACL tears.
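
The rating rubric and the reported percentages can be checked with a short sketch. The function name `rate_response` and its parameters are hypothetical and are used only to restate the rubric from the abstract; the counts are the ones reported in the results, and the percentages are recomputed from them by rounding to the nearest whole number.

```python
def rate_response(clarifications, has_false_info=False):
    """Map a clarification count to the rubric described in the abstract.

    Hypothetical helper: the name and signature are illustrative, not taken
    from the study.
    """
    if has_false_info or clarifications > 4:
        return "unsatisfactory"
    if clarifications >= 3:
        return "satisfactory with moderate clarification"
    if clarifications >= 1:
        return "satisfactory with minimal clarification"
    return "excellent"


# Counts as reported in the abstract: 12 questions x 5 surgeons = 60 ratings.
reported_counts = {
    "excellent": 27,
    "satisfactory with minimal clarification": 28,
    "satisfactory with moderate clarification": 2,
    "unsatisfactory": 3,
}

total_ratings = sum(reported_counts.values())

# Share of each rating category, rounded to whole percentages.
percentages = {
    label: round(100 * count / total_ratings)
    for label, count in reported_counts.items()
}
```

Recomputing the shares this way reproduces the 45%, 47%, 3%, and 5% figures given in the abstract.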

Cite this study

Rosso et al. studied this question.

synapsesocial.com/papers/6a2117dfd499ed480b170b50 — DOI: https://doi.org/10.1093/milmed/usag261

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum · 2023 · 2,260 citations
Application of ChatGPT for Orthopedic Surgeries and Patient Care · 2024 · 37 citations
The rise of ChatGPT: Exploring its potential in medical education · 2023 · 571 citations
Can Artificial Intelligence Improve the Readability of Patient Education Materials? · 2023 · 180 citations
Overview of artificial intelligence in medicine · 2019 · 1,081 citations

Authors

A. Rosso

Naval Medical Center San Diego

Harrison Diaz

Naval Medical Center San Diego

Peter Baglien

Massachusetts General Hospital

Journals

Military Medicine

Institutions

Massachusetts General Hospital

Naval Medical Center San Diego
