ChatGPT generated fully acceptable echocardiography reports in 85.7% of cases, with a mean total score of 6.86 and only 5.3% of parameters misinterpreted compared to expert cardiologists.
Cross-Sectional (n=21)
Does an LLM (ChatGPT) accurately generate echocardiography reports and clinical recommendations compared to standard clinical assessments?
n=21 echocardiographic cases (13 fictional, 8 clinical)
Large language model (ChatGPT) for automated generation of echocardiography reports and clinical recommendations
Standard clinical assessments conducted by experienced cardiologists
Accuracy in report generation, diagnostic precision, and appropriateness of recommendations, evaluated using a dedicated scoring system (surrogate)
ChatGPT demonstrates high accuracy in generating echocardiography reports and clinical recommendations, suggesting potential utility in streamlining clinical workflows.
Accurate interpretation of echocardiography measurements is essential for diagnosing cardiovascular diseases and guiding clinical management. The emergence of large language models (LLMs) such as ChatGPT presents a novel opportunity to automate the generation of echocardiography reports and provide clinical recommendations. This study aimed to evaluate the ability of an LLM (ChatGPT) to 1) generate comprehensive echocardiography reports based solely on the provided echocardiographic measurements and 2) when enriched with clinical information, formulate accurate diagnoses along with appropriate recommendations for further tests, treatment, and follow-up. Echocardiographic data from n = 13 fictional cases (Group 1) and n = 8 clinical cases (Group 2) were input into the LLM. The model's outputs were compared against standard clinical assessments conducted by experienced cardiologists. Using a dedicated scoring system, the LLM's performance was evaluated and stratified by its accuracy in report generation, diagnostic precision, and the appropriateness of its recommendations. The patterns, frequency, and examples of misinterpretations by the LLM were analysed. Across all cases, the mean total score was 6.86 (SD = 1.12). Group 1 had a mean total score of 6.54 (SD = 1.13) and a mean accuracy score of 3.92 (SD = 0.86), while Group 2 scored 7.38 (SD = 0.92) and 4.38 (SD = 0.92), respectively. Mean recommendation scores were 2.62 (SD = 0.51) for Group 1 and 3.00 (SD = 0.00) for Group 2, with no significant differences (p = 0.096). Of all reports, 85.7% were fully acceptable, 14.3% were borderline acceptable, and none were not acceptable. Of 299 parameters, 5.3% were misinterpreted. The LLM demonstrated a high level of accuracy in generating detailed echocardiography reports, mostly identifying normal and abnormal findings correctly and making accurate diagnoses across a range of cardiovascular conditions.
ChatGPT, as an LLM, shows significant potential in automating the interpretation of echocardiographic data, offering accurate diagnostic insights and clinical recommendations. These findings suggest that LLMs could serve as valuable tools in clinical practice, assisting and streamlining clinical workflow.
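The figures reported in the abstract can be cross-checked with a little arithmetic. The sketch below is only a reader-side consistency check, not part of the study's method: it pools the two group means by group size, checks whether the accuracy and recommendation scores add up to the group totals, and back-calculates the case and parameter counts that the quoted percentages would imply. The helper names (`pooled_mean`, `implied_count`) are hypothetical, and the back-calculated counts are inferences from rounded percentages, not numbers stated in the source.

```python
# Values quoted in the abstract (means per group; n = 13 and n = 8).
groups = {
    "group1": {"n": 13, "total": 6.54, "accuracy": 3.92, "recommendations": 2.62},
    "group2": {"n": 8, "total": 7.38, "accuracy": 4.38, "recommendations": 3.00},
}


def pooled_mean(groups, key):
    """Size-weighted mean of a per-group mean across all groups."""
    n_all = sum(g["n"] for g in groups.values())
    return sum(g["n"] * g[key] for g in groups.values()) / n_all


def implied_count(percent, denominator):
    """Nearest whole count implied by a rounded percentage (an inference)."""
    return round(percent / 100 * denominator)


# Pooled total score across all 21 cases; the abstract reports 6.86.
overall_total = pooled_mean(groups, "total")

# Does accuracy + recommendations reproduce each group's total score?
components_match = all(
    abs(g["accuracy"] + g["recommendations"] - g["total"]) < 0.005
    for g in groups.values()
)

# Counts implied by the reported percentages (not stated in the source).
fully_acceptable = implied_count(85.7, 21)
borderline = implied_count(14.3, 21)
misinterpreted = implied_count(5.3, 299)

print(f"pooled total score: {overall_total:.2f}")
print(f"components sum to totals: {components_match}")
print(f"implied report counts: {fully_acceptable} fully, {borderline} borderline")
print(f"implied misinterpreted parameters: {misinterpreted} of 299")
```

Under these assumptions the pooled mean agrees with the reported overall score, and in both groups the total score equals the sum of the accuracy and recommendation scores, which suggests (but does not confirm) that the total is composed of those two components.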
Finn Syryca
Christian Gräßer
Deutsches Herzzentrum München
Teresa Trenkwalder
Structural Heart Disease
The International Journal of Cardiovascular Imaging
Deutsches Herzzentrum München
Klinik und Poliklinik für Nuklearmedizin
Syryca et al. conducted a cross-sectional study in cardiovascular diseases (n=21). ChatGPT (a large language model) was compared with standard clinical assessments by experienced cardiologists, using the mean total score for report generation, diagnostic precision, and recommendations. ChatGPT generated fully acceptable echocardiography reports in 85.7% of cases, with a mean total score of 6.86 and only 5.3% of parameters misinterpreted compared to expert cardiologists.
synapsesocial.com/papers/6a0baa534f6759c6fca2575a — DOI: https://doi.org/10.1007/s10554-025-03382-1