September 10, 2025

Performance of ChatGPT and DeepSeek in the Management of Postprostatectomy Urinary Incontinence.

Key points

ChatGPT achieved a higher accuracy of 95% compared to DeepSeek's 72.5% in answering questions about PPUI management.
On the 10 conceptual questions, ChatGPT scored 9.0 and DeepSeek 8.0 (out of 10), a difference that was not statistically significant (p = 0.50).
On the 10 case-based scenarios, ChatGPT scored 10.0 versus DeepSeek's 6.5 (out of 10).
Both models offer valuable insights but should be used with expert oversight to ensure safe clinical application.

Abstract

Artificial intelligence (AI) continues to evolve as a tool in clinical decision support. Large language models (LLMs), such as ChatGPT and DeepSeek, are increasingly used in medicine to provide fast, accessible information. This study aimed to compare the performance of ChatGPT and DeepSeek in generating recommendations for the management of postprostatectomy urinary incontinence (PPUI), based on the AUA/SUFU guideline. A total of 20 questions (10 conceptual and 10 case-based) were developed by three urologists with expertise in PPUI, following the AUA/SUFU guideline. Each question was submitted in English to ChatGPT-4o and DeepSeek R1 using zero-shot prompting. Responses were limited to 200 words and graded independently as correct (1 point), partially correct (0.5 points), or incorrect (0 points). Total and domain-specific scores were compared. ChatGPT achieved 19 out of 20 points (95.0%), while DeepSeek scored 14.5 out of 20 (72.5%; p = 0.031). In conceptual questions, scores were 9.0 (ChatGPT) and 8.0 (DeepSeek; p = 0.50). In case-based scenarios, ChatGPT scored 10.0 versus 6.5 for DeepSeek (p = 0.08). ChatGPT outperformed DeepSeek across all guideline domains. DeepSeek made critical errors in the treatment domain, such as recommending a male sling for patients with prior radiation therapy. ChatGPT demonstrated superior performance in providing guideline-based recommendations for PPUI. However, both models should be used under expert supervision, and future research is needed to optimize their safe integration into clinical workflows.
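
The scoring arithmetic in the abstract can be checked with a minimal sketch. It assumes only what the abstract states: 10 questions per question type, at most 1 point per question, and the reported scores for each question type. The names `reported_scores` and `summarize` are hypothetical and used purely for illustration; per-question grades and the statistical test behind the p-values are not given in the abstract, so they are not reproduced here.

```python
# Grading scale stated in the abstract:
# correct = 1 point, partially correct = 0.5, incorrect = 0.
QUESTIONS_PER_TYPE = 10  # 10 conceptual + 10 case-based questions

# Scores reported in the abstract (points out of 10 for each question type).
reported_scores = {
    "ChatGPT": {"conceptual": 9.0, "case_based": 10.0},
    "DeepSeek": {"conceptual": 8.0, "case_based": 6.5},
}


def summarize(scores_by_type):
    """Return (total points, percentage of the maximum) for one model."""
    total = sum(scores_by_type.values())
    max_points = QUESTIONS_PER_TYPE * len(scores_by_type)
    return total, 100.0 * total / max_points


summary = {model: summarize(scores) for model, scores in reported_scores.items()}

for model, (total, pct) in summary.items():
    print(f"{model}: {total:g} of 20 points ({pct:.1f}%)")
```

Summing the two question types gives 19 of 20 points (95.0%) for ChatGPT and 14.5 of 20 points (72.5%) for DeepSeek, which matches the totals reported in the abstract.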
