What is the clinical evidence from this study?

Study design: Observational. Population: 50 cardiovascular trivia questions and 20 clinical case vignettes on common cardiac symptoms and conditions (n=70). Intervention: ChatGPT. Comparator: Expert opinion and clinical course. Primary outcome: Accuracy of ChatGPT's answers and recommendations compared to the reference standard.

March 26, 2023

Open Access

Performance of ChatGPT as an AI-assisted decision support tool in medicine: a proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2)

Key Result

ChatGPT correctly answered 74% of cardiovascular trivia questions and matched actual clinical advice in 90% of straightforward patient cases, but was only 50% accurate in complex physician consultations.

Study Design

Type

Observational (n=70)

Multicenter

Structured PICO

Does ChatGPT provide accurate medical recommendations for common cardiac symptoms and conditions compared to expert opinion?

Population

50 cardiovascular trivia questions and 20 clinical case vignettes (10 patient-physician consultations and 10 general practitioner-cardiologist/expert consultations) based on primary care consultations involving symptoms of possible cardiac origin or common cardiovascular conditions from a community health center in Amsterdam, The Netherlands.

Intervention

ChatGPT (Free Research Preview version of January 30, 2023) web-based platform

Comparator

Medical expert opinion, clinical course, and guideline recommendations

Outcome

Accuracy of ChatGPT's recommendations and answers compared to the reference standard (expert opinion and clinical course)

ChatGPT shows potential as a decision support tool for straightforward cardiac questions but currently lacks the accuracy required for complex, expert-level cardiology consultations.

Limitations

Relatively small sample size
Lack of a head-to-head comparison between an AI-assisted triage tool and usual care
ChatGPT is a probabilistic language model that may generate different outcomes for identical inputs
Potential for implicit or explicit biases in the training data
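
To give a sense of what the small sample size means for the reported percentages, the sketch below computes 95% Wilson score intervals. This analysis is not part of the study. The counts (37 of 50, 9 of 10, 5 of 10) are assumptions derived from the reported percentages and group sizes, and the function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: counts reconstructed from the reported percentages,
# not taken from the paper's raw data.
reported = {
    "trivia questions": (37, 50),        # 74% of 50
    "patient-physician cases": (9, 10),  # 90% of 10
    "GP-cardiologist cases": (5, 10),    # 50% of 10
}

for label, (correct, total) in reported.items():
    low, high = wilson_interval(correct, total)
    print(f"{label}: {correct}/{total}, 95% CI {low:.0%} to {high:.0%}")
```

With only 10 cases per vignette group, the intervals are wide: under these assumptions, the 50% figure for complex consultations is compatible with a true accuracy anywhere from roughly 24% to 76%.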

Abstract

Background: It is thought that ChatGPT, an advanced language model developed by OpenAI, may in the future serve as an AI-assisted decision support tool in medicine.

Objective: To evaluate the accuracy of ChatGPT’s recommendations on medical questions related to common cardiac symptoms or conditions.

Methods: We tested ChatGPT’s ability to address medical questions in two ways. First, we assessed its accuracy in correctly answering cardiovascular trivia questions (n=50), based on quizzes for medical professionals. Second, we entered 20 clinical case vignettes on the ChatGPT platform and evaluated its accuracy compared to expert opinion and clinical course.

Results: We found that ChatGPT correctly answered 74% of the trivia questions, with slight variation in accuracy in the domains coronary artery disease (80%), pulmonary and venous thrombotic embolism (80%), atrial fibrillation (70%), heart failure (80%) and cardiovascular risk management (60%). In the case vignettes, ChatGPT’s response matched in 90% of the cases with the actual advice given. In more complex cases, where physicians (general practitioners) asked other physicians (cardiologists) for assistance or decision support, ChatGPT was correct in 50% of cases, and often provided incomplete or inappropriate recommendations when compared with expert consultation.

Conclusions: Our study suggests that ChatGPT has potential as an AI-assisted decision support tool in medicine, particularly for straightforward, low-complex medical questions, but further research is needed to fully evaluate its potential.
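
The overall 74% trivia accuracy can be cross-checked against the per-domain figures reported in the abstract. The sketch below assumes the 50 questions were split evenly across the five domains (10 each); the abstract does not state this split, so treat it as an assumption. The helper name `overall_accuracy` is illustrative.

```python
# Per-domain accuracy as reported in the abstract.
domain_accuracy = {
    "coronary artery disease": 0.80,
    "pulmonary and venous thrombotic embolism": 0.80,
    "atrial fibrillation": 0.70,
    "heart failure": 0.80,
    "cardiovascular risk management": 0.60,
}

# ASSUMPTION: 50 questions divided evenly over 5 domains.
QUESTIONS_PER_DOMAIN = 10


def overall_accuracy(accuracy_by_domain, n_per_domain):
    """Convert per-domain accuracies to counts and pool them."""
    correct = sum(round(acc * n_per_domain) for acc in accuracy_by_domain.values())
    total = n_per_domain * len(accuracy_by_domain)
    return correct, total, correct / total


correct, total, accuracy = overall_accuracy(domain_accuracy, QUESTIONS_PER_DOMAIN)
print(f"{correct}/{total} correct = {accuracy:.0%}")  # prints "37/50 correct = 74%"
```

Under the even-split assumption, the domain figures pool to 37 of 50 correct answers, which matches the reported 74%.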
