September 10, 2025 (Open Access)

Assessing DeepSeek-R1 for Clinical Decision Support in Multidisciplinary Laboratory Medicine

Key Points

DeepSeek-R1 achieved an overall accuracy of 72.9% (95% CI 69.9%, 75.7%) on 100 clinical cases, indicating potential for clinical applications.
Accuracy varied by question type, from 85.7% for diagnostic hypotheses, the model's strongest area, down to 55.0% for differential diagnoses.
The evaluation used 100 cases from Clinical Laboratory Medicine Case Studies; each case was queried three times with three questions, and senior clinical laboratory physicians graded the outputs.
The authors suggest expanding training data, integrating clinical ontologies, and incorporating physician feedback to improve real-world applicability.

Abstract

Recent advancements in artificial intelligence (AI), particularly large language models (LLMs), are transforming healthcare by enhancing diagnostic decision-making and clinical workflows. The application of LLMs such as DeepSeek-R1 in clinical laboratory medicine shows potential for improving diagnostic accuracy, supporting decision-making, and optimizing patient care. This study evaluates the performance of DeepSeek-R1 in analyzing clinical laboratory cases and assisting with medical decision-making. The focus is on the accuracy and completeness of the diagnostic hypotheses, differential diagnoses, and diagnostic workups it generates across diverse clinical cases.

We analyzed 100 clinical cases from Clinical Laboratory Medicine Case Studies, which includes comprehensive case histories and laboratory findings. DeepSeek-R1 was queried three times independently for each case, each time with three specific questions regarding diagnosis, differential diagnoses, and diagnostic tests. Senior clinical laboratory physicians assessed the outputs for accuracy and completeness.

DeepSeek-R1 achieved an overall accuracy of 72.9% (95% CI 69.9%, 75.7%) and completeness of 73.4% (95% CI 70.5%, 76.2%). Performance varied by question type: accuracy was highest for diagnostic hypotheses (85.7%, 95% CI 81.2%, 89.2%) and lowest for differential diagnoses (55.0%, 95% CI 49.3%, 60.5%). Performance also varied notably across disease categories, with the best results in genetic and obstetric diagnostics (accuracy 93.1%, 95% CI 84.0%, 97.3%; completeness 86.1%, 95% CI 76.4%, 92.3%).

DeepSeek-R1 shows potential as a decision-support tool in clinical laboratory medicine, particularly for generating diagnostic hypotheses and recommending diagnostic workups. However, its performance in differential diagnosis and in handling specific clinical nuances remains limited. Future work should focus on expanding training data, integrating clinical ontologies, and incorporating physician feedback to improve real-world applicability. DeepSeek-R1 and the newer versions under development may be promising tools both for non-medical professionals and for professionals working in medical laboratory diagnosis.
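The abstract reports each proportion with a 95% confidence interval but does not say how the intervals were computed. As a rough illustration only, the sketch below computes a Wilson score interval, which is one common choice for binomial proportions and is an assumption here, not the authors' stated method. The denominator of 900 assumes every one of the 100 cases × 3 runs × 3 questions was graded once, and the count of 656 accurate responses is a hypothetical figure chosen to be consistent with the reported 72.9%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Return the Wilson score interval (low, high) for a binomial proportion."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Assumption: 100 cases x 3 independent runs x 3 questions = 900 graded responses.
TOTAL_RESPONSES = 100 * 3 * 3
# Hypothetical count, chosen only so that the proportion rounds to 72.9%.
ACCURATE_RESPONSES = 656

low, high = wilson_interval(ACCURATE_RESPONSES, TOTAL_RESPONSES)
print(f"{ACCURATE_RESPONSES / TOTAL_RESPONSES:.1%} (95% CI {low:.1%}, {high:.1%})")
```

Under these assumptions the interval comes out at roughly 69.9% to 75.7%, close to the overall accuracy interval in the abstract, though that agreement does not confirm which method or denominators the study actually used.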
