What question did this study set out to answer?

How well do ChatGPT-4o and a TFDA-approved bone scintigraphy platform perform in interpreting bone scans for metastasis detection?

February 12, 2026 · Open Access

AI-assisted interpretation of bone scans: Performance comparison between ChatGPT-4o and a TFDA-approved bone scintigraphy platform in AI-driven nuclear imaging interpretation

Key Points

The study evaluated ChatGPT-4o and a TFDA-approved bone scintigraphy platform in interpreting bone scans for metastasis detection.
It analyzed 52 bone scintigraphy images using three methods: board-certified physicians, ChatGPT-4o, and a TFDA-approved platform.
Each method performed binary classification and localized lesions across nine predefined anatomical regions.
Outcomes were compared to a gold standard established by board-certified nuclear medicine physicians.
In binary classification, ChatGPT-4o achieved an accuracy of 84.6%, similar to 82.7% for the TFDA-approved platform.
ChatGPT-4o's regional precision was 32.5% and its sensitivity was 13.3%; the platform scored 80.3% precision and 64.9% sensitivity.
The results indicate that ChatGPT-4o could assist in report drafting but lacks consistency in lesion localization.

Abstract

Background: With the emergence of artificial intelligence in medical imaging, large language models such as Chat Generative Pre-trained Transformer (ChatGPT)-4o have drawn attention for their potential in diagnostic support. However, their performance in nuclear medicine applications remains underexplored. In this study, we aimed to evaluate the capability of a Taiwan Food and Drug Administration (TFDA)-approved bone scintigraphy (BS) platform and of ChatGPT-4o to interpret BS images for the detection and localization of bone metastases.

Methods: A total of 52 BS images were analyzed with three interpretation methods: board-certified physicians, ChatGPT-4o multimodal image analysis, and the BS platform. Performance was evaluated for both binary classification and lesion localization across nine predefined anatomical regions. The results were compared with the report of board-certified nuclear medicine physicians, which served as the gold standard in this study.

Results: In binary classification, ChatGPT-4o achieved an accuracy of 84.6%, similar to the BS platform's 82.7%. However, ChatGPT-4o performed worse in lesion localization: its regional precision was 32.5% and its sensitivity was 13.3%, compared with the BS platform's precision of 80.3% and sensitivity of 64.9%.

Conclusion: ChatGPT-4o showed preliminary potential for detecting bone metastases and assisting in structured report drafting, but its limited lesion-localization performance restricts clinical applicability. The BS platform, developed specifically for bone scintigraphy, demonstrated more consistent regional accuracy in this dataset. These results represent an early proof-of-concept comparison, suggesting feasibility for reporting support rather than clinical deployment. Larger, multi-center studies and domain-specific training will be needed to clarify large language models’ future role in nuclear medicine.
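
The gap between scan-level accuracy and region-level precision and sensitivity can be illustrated with a minimal sketch. It rests on several assumptions that are not confirmed by the summary above: the toy labels and region names are hypothetical placeholders (not study data), region-level counts are pooled across scans, and a scan counts as positive when at least one region is flagged. The function names `binary_accuracy` and `regional_metrics` are illustrative only.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy labels, NOT data from the study. Each scan maps to the set
# of anatomical regions flagged as metastatic; region names are placeholders.
reference: Dict[str, Set[str]] = {
    "scan_a": {"spine", "pelvis"},
    "scan_b": set(),
    "scan_c": {"ribs"},
}
predicted: Dict[str, Set[str]] = {
    "scan_a": {"spine", "skull"},
    "scan_b": set(),
    "scan_c": {"pelvis"},
}


def binary_accuracy(ref: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Fraction of scans where the positive/negative call matches the reference.

    Assumption: a scan is positive if any region is flagged.
    """
    correct = sum(
        (len(ref[scan]) > 0) == (len(pred.get(scan, set())) > 0) for scan in ref
    )
    return correct / len(ref)


def regional_metrics(
    ref: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Return (precision, sensitivity) from region-level counts pooled over scans."""
    tp = fp = fn = 0
    for scan, ref_regions in ref.items():
        pred_regions = pred.get(scan, set())
        tp += len(ref_regions & pred_regions)   # flagged and truly involved
        fp += len(pred_regions - ref_regions)   # flagged but not involved
        fn += len(ref_regions - pred_regions)   # involved but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, sensitivity


accuracy = binary_accuracy(reference, predicted)
precision, sensitivity = regional_metrics(reference, predicted)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} sensitivity={sensitivity:.3f}")
```

In this toy case every scan receives the correct positive or negative call, yet only one of three flagged regions is right and only one of three involved regions is found. That is the same pattern as the reported results: scan-level accuracy can stay high while localization is poor. As a back-calculation not stated in the source, accuracies of 84.6% and 82.7% on 52 scans are consistent with 44 and 43 correctly classified scans, respectively.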
