What type of study is this?

This is a quantitative study.

September 17, 2025

Prompting LLMs for Breast Cancer TNM Staging Using Radiology Reports: Comparing Different Prompts and Models

Key Points

ChatGPT 4.0 performed best, with an AUC of 0.89 for TNM staging with few-shot learning.
Few-shot learning significantly improved performance across all models, most of all Google Bard, whose AUC rose by 14.8 percentage points.
Intra- and inter-LLM agreement, accuracy, and AUC were assessed using 745 DCE-MRI reports.
This study suggests LLMs could enhance diagnostic efficiency and accuracy in radiology.

Abstract

Motivation: The potential of large language models (LLMs) to automate complex medical tasks, such as TNM staging from breast cancer DCE-MRI reports, remains unexplored.

Goal(s): To evaluate and compare the effectiveness of ChatGPT 4.0, ChatGPT 3.5, and Google Bard in automating TNM staging using zero-shot and few-shot learning approaches.

Approach: We analyzed 745 DCE-MRI reports using different LLMs and learning strategies, assessing intra- and inter-LLM agreement, accuracy, and AUC.

Results: ChatGPT 4.0 outperformed the other models (AUC: 0.89 with few-shot learning). Few-shot learning significantly improved the performance of every model, with Bard showing the largest gain (a 14.8 percentage point increase in AUC).

Impact: This study demonstrates the potential of LLMs, especially ChatGPT 4.0, in automating breast cancer TNM staging from DCE-MRI reports. The effectiveness of few-shot learning suggests a pathway for rapid adaptation of AI in radiology, potentially enhancing diagnostic efficiency and accuracy.
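The zero-shot versus few-shot comparison and the agreement and accuracy measures described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the abstract does not give the study's prompts, so `INSTRUCTION` and `build_prompt` are hypothetical, and it does not name the agreement statistic, so Cohen's kappa is used only as one common choice. No model is called; the functions only build prompt text and score lists of stage labels.

```python
from collections import Counter

# Hypothetical instruction; the study's actual prompt wording is not given.
INSTRUCTION = "Read the breast DCE-MRI report and give the TNM stage."


def build_prompt(report, examples=()):
    """Return a zero-shot prompt if `examples` is empty, else a few-shot one.

    `examples` is a sequence of (report_text, stage_label) pairs that are
    shown to the model before the report to be staged.
    """
    parts = [INSTRUCTION]
    for ex_report, ex_stage in examples:
        parts.append(f"Report: {ex_report}\nTNM stage: {ex_stage}")
    parts.append(f"Report: {report}\nTNM stage:")
    return "\n\n".join(parts)


def accuracy(predicted, reference):
    """Fraction of predicted stage labels that match the reference labels."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label lists (assumed metric).

    The same function covers intra-LLM agreement (two runs of one model)
    and inter-LLM agreement (one run each of two models).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Both lists contain one identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

Calling `build_prompt(report)` gives the zero-shot form, while passing a few labeled `(report, stage)` pairs gives the few-shot form. Scoring each model's outputs against reference stages with `accuracy`, and pairs of output lists with `cohen_kappa`, mirrors the kind of comparison the abstract reports, though not necessarily the authors' exact procedure.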
