Objective: To assess the clinical utility of four artificial intelligence (AI) models (ChatGPT-4o, DeepSeek-R1, Grok-3 and Claude-3.7) by measuring how closely their responses align with international guidelines for diabetic foot infection (DFI) management.

Background: AI systems have shown potential in many fields, but their specific effects in medicine and health care still require in-depth study. DFI is a common and serious complication of diabetes, so accurate communication of relevant information is important. It is therefore worth evaluating whether AI can serve as an effective clinical support tool.

Methods: Responses from ChatGPT-4o, DeepSeek-R1, Grok-3 and Claude-3.7 were evaluated against DFI guidelines on four clinical dimensions (Accuracy, Overconclusiveness, Supplementary Value, and Completeness), each rated on a 5-point Likert scale, and were assessed for readability using Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL). Statistical analyses included ANOVA and post hoc comparisons.

Results: No significant differences were found across models for Accuracy or Overconclusiveness (p > 0.05). Supplementary Value differed significantly (p < 0.001): Grok-3 outperformed ChatGPT-4o (p < 0.0001), DeepSeek-R1 (p = 0.003), and Claude-3.7 (p < 0.0001). Completeness also differed significantly (p = 0.005): Grok-3 outperformed ChatGPT-4o (p = 0.016) and Claude-3.7 (p = 0.010). Readability also varied: DeepSeek-R1 responses were more complex than those of ChatGPT-4o (p = 0.046).

Conclusion: All models performed comparably in accuracy and in avoiding overconclusive statements. Grok-3 outperformed the other models in Supplementary Value and Completeness. DeepSeek-R1 generated the most complex text. These findings support the feasibility of AI in the standardized management of DFI, but the models still need further verification through clinical trials to determine their value in real-world decision-making.
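For readers unfamiliar with the two readability metrics named in the Methods, the standard published formulas can be sketched as below. This is a minimal illustration only: the function names are hypothetical, the word, sentence, and syllable counts are assumed to be supplied by the caller, and nothing here reflects the tooling the authors actually used.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level formula; higher scores mean harder text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Illustrative counts only (not data from the study).
    print(flesch_reading_ease(words=100, sentences=5, syllables=150))
    print(flesch_kincaid_grade(words=100, sentences=5, syllables=150))
```

Both metrics depend only on average sentence length and average syllables per word, so a more complex response is one with a lower FRE and a higher FKGL.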
Wu et al. (Tue,) studied this question.