Microsoft Copilot identified pulmonary embolism in the top 10 differential diagnoses in 94.3% of cases and achieved a higher AUC for risk stratification than the Wells score (0.713 vs 0.583).
Observational (n=140)
Does Microsoft Copilot improve the diagnostic identification and risk assessment of pulmonary embolism compared to the Wells score in clinical vignettes?
140 clinical vignettes of adult patients (≥18 years) with suspected pulmonary embolism who underwent CTPA (70 with confirmed PE, 70 without PE), derived from published case reports within the last 10 years. Mean age 54, 54.3% female.
Microsoft Copilot (GPT-4 integration, 'precise' mode) analyzing clinical vignettes to generate a top 10 differential diagnosis list and predict the risk of pulmonary embolism.
Wells score calculated independently by two investigators based on the review of the same clinical vignettes.
Ability of Microsoft Copilot to accurately identify pulmonary embolism based on clinical data by listing it within the top 10 differential diagnosis list.
Microsoft Copilot demonstrated high accuracy in including pulmonary embolism in differential diagnoses and outperformed the Wells score in risk stratification using clinical vignettes.
Effect estimate: OR 3.41 (95% CI 1.04-11.17)
Absolute Event Rate: 94.3% vs 82.9%
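The odds ratio and its confidence interval above can be checked against the two event rates. The sketch below is a minimal illustration, not the authors' analysis. It assumes both rates have a denominator of 70 (so 94.3% is 66/70 and 82.9% is 58/70) and that the interval is a standard Wald interval on the log odds ratio. The source states neither the counts nor the interval method, and the function name `odds_ratio_wald` is hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a, b: events and non-events in the first group
    c, d: events and non-events in the second group
    z:    normal quantile (1.96 for a 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ASSUMPTION: counts inferred from the reported percentages with n=70 per group;
# they are not given in the source.
or_, lo, hi = odds_ratio_wald(a=66, b=4, c=58, d=12)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumed counts the calculation reproduces the reported OR of 3.41 (95% CI 1.04-11.17), which suggests, but does not confirm, that the estimate was derived this way.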
INTRODUCTION: Patients with pulmonary embolism (PE) often present with non-specific signs and symptoms mimicking other conditions and complicating diagnosis. In this study we aimed to evaluate the performance of an artificial-intelligence tool, Microsoft Copilot, in the diagnostic process of PE, using clinical data including demographics, complaints, and vital signs.
METHODS: We conducted this study using 140 clinical vignettes, including 70 patients with and 70 patients without PE. The vignettes were derived from published case reports within the last 10 years. We used Copilot for its free GPT-4 integration to analyze clinical data and answer two questions after each vignette. We compared Copilot's ability to identify PE within the top 10 differential diagnoses, and its ability to predict the risk of PE when compared to the use of the Wells score by two independent investigators.
RESULTS: Copilot correctly included PE in the differential diagnosis in 94.3% of cases by listing it within the top 10 conditions. Risk assessment by Copilot yielded significantly higher levels in patients with PE (P.05). Copilot demonstrated better discriminatory power than the Wells score in risk assessment of PE (area under the curve 0.713 vs 0.583), with statistical significance (P<0.001 vs P=.091). Sensitivity, specificity, positive predictive value, and negative predictive value for discriminating between the combination of low- and intermediate- vs high-risk categories were 34%, 97.1%, 92.3%, and 59.6%, respectively.
CONCLUSION: This study explores the potential of Copilot as a tool in clinical decision-making, demonstrating a high rate of correctly identifying PE and improved performance over the Wells score. However, further validation in larger populations and real-world settings is crucial to fully realize its potential.
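The four accuracy figures in the results follow from a single 2x2 table. The sketch below shows the arithmetic under stated assumptions. It treats "high risk" as a positive test. The cell counts (24 true positives, 46 false negatives, 68 true negatives, 2 false positives) are inferred from the reported percentages and the 70/70 split of vignettes, and are not given in the source. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # PE cases flagged high risk
        "specificity": tn / (tn + fp),  # non-PE cases not flagged high risk
        "ppv": tp / (tp + fp),          # flagged cases that truly had PE
        "npv": tn / (tn + fn),          # unflagged cases that truly had no PE
    }


# ASSUMPTION: counts reconstructed from the reported percentages
# (70 vignettes with PE, 70 without); the source does not list them.
metrics = diagnostic_metrics(tp=24, fn=46, tn=68, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these assumed counts the results agree with the reported 34%, 97.1%, 92.3%, and 59.6% after rounding. They also show why the negative predictive value is modest: 46 of the 70 PE vignettes fall outside the high-risk category.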
Banu Arslan
Sağlık Bilimleri Üniversitesi
Mehmet Necmeddin Sutaşır
Ministry of Health
Ertuğrul Altınbilek
University of Health Science
Western Journal of Emergency Medicine
Ministry of Health
Şişli Etfal Eğitim ve Araştırma Hastanesi
Arslan et al. conducted an observational study in suspected pulmonary embolism (n=140). Microsoft Copilot was compared with the Wells score on inclusion of pulmonary embolism in the top 10 differential diagnoses (OR 3.41, 95% CI 1.04-11.17). Microsoft Copilot identified pulmonary embolism in the top 10 differential diagnoses in 94.3% of cases and achieved a higher AUC for risk stratification than the Wells score (0.713 vs 0.583).
synapsesocial.com/papers/6a1304c083732aa7db9ebdaf — DOI: https://doi.org/10.5811/westjem.24995