What question did this study set out to answer?

This study aims to assess the effectiveness of artificial intelligence in enhancing diagnostic performance and reducing inter-reader variability during breast ultrasound interpretation.

May 29, 2026

Impact of artificial intelligence on inter-reader variability in breast ultrasound interpretation: An international multireader multicase study.

Key Points

Collected 148 breast ultrasound cases from Korea, Hong Kong, and Kazakhstan between Sep 2024 and Mar 2025.
Readers evaluated the cases in two phases, pre-AI and AI-assisted, separated by a one-month washout period.
Employed Randolph’s free-marginal Fleiss’ kappa for inter-reader variability assessment.
Mean AUC in the pre-AI phase was 0.816, with 94.2% sensitivity and 27.7% specificity; the post-AI phase showed an AUC of 0.811, with 98.8% sensitivity and 16.5% specificity.
Inter-reader variability in probability of malignancy (POM) decreased markedly in the post-AI phase across all sub-phases.
Agreement for the descriptor 'margin' improved from a kappa of 0.2897 in the pre-AI phase to 0.6871 in the final sub-phase (post-AI 3), indicating substantial agreement.
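The key points report a nearly unchanged AUC (0.816 vs 0.811) alongside higher sensitivity and lower specificity. AUC summarizes how well scores rank malignant above benign cases across all thresholds, whereas sensitivity and specificity are read at a single operating point, so a more liberal positive threshold can trade specificity for sensitivity without moving the AUC. The Python sketch below illustrates this general property with made-up POM scores. The function names, the data, and the thresholds are hypothetical; the AUC shown is the simple empirical (Mann-Whitney) estimate, not necessarily the multireader method the study used, and the sketch makes no claim about why the study's operating point shifted.

```python
from typing import Sequence, Tuple


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: probability that a malignant case (label 1)
    scores higher than a benign case (label 0); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Hypothetical helper: call a case positive when its score >= threshold."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        positive = s >= threshold
        if y == 1:
            if positive:
                tp += 1
            else:
                fn += 1
        else:
            if positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical POM scores: first four cases malignant, last four benign.
    pom = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    print(empirical_auc(pom, truth))                  # prints 0.8125
    print(sensitivity_specificity(pom, truth, 0.5))   # prints (0.75, 0.75)
    print(sensitivity_specificity(pom, truth, 0.25))  # prints (1.0, 0.5)
```

Lowering the hypothetical threshold from 0.5 to 0.25 raises sensitivity from 0.75 to 1.0 and lowers specificity from 0.75 to 0.5, while the AUC, which depends only on how the scores are ranked, stays at 0.8125.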

Abstract

e12552

Background: Artificial intelligence (AI) has been developed as a promising assistive tool for improving diagnostic accuracy and reducing inter-reader variability in breast ultrasound interpretation. However, evidence on the practical impact of AI assistance on diagnostic consistency remains limited. This study aimed to evaluate the impact of AI-assisted breast ultrasound interpretation on diagnostic performance and inter-reader agreement in a multicenter, multinational setting.

Methods: Between September 2024 and March 2025, 148 breast ultrasound cases were collected from Korea, Hong Kong, and Kazakhstan. The images were analyzed using an AI ultrasound system (CadAI-B), and all data were uploaded to a web-based scoring platform. Ground truth was established by three board-certified radiologists, and seven international junior physicians served as readers. Each reader evaluated the cases in two phases: a pre-AI phase and, after a one-month washout period, an AI-assisted post-AI phase. The post-AI phase consisted of three sequential steps providing increasing levels of AI support: measurements and BI-RADS lexicon descriptors (post-AI 1), additional AI maps and malignancy scores (post-AI 2), and final BI-RADS categories (post-AI 3). Diagnostic performance and mean probability of malignancy (POM) were compared between phases, and inter-reader variability was assessed using Randolph’s free-marginal Fleiss’ kappa.

Results: Mean AUC, sensitivity, and specificity were 0.816, 94.2%, and 27.7% in the pre-AI phase and 0.811, 98.8%, and 16.5% in the post-AI phase, respectively. Inter-reader variability in POM was high in the pre-AI phase but was markedly reduced in all post-AI sub-phases. This trend was consistent in both benign (n = 84) and malignant (n = 64) cases. Among the BI-RADS descriptors, shape, orientation, and posterior features demonstrated substantial inter-reader agreement in the post-AI sub-phases (kappa > 0.81). In particular, margin, which showed the lowest agreement in the pre-AI phase, improved from a kappa of 0.2897 to 0.6871 (post-AI 3), reaching substantial agreement.

Conclusions: AI-assisted breast ultrasound interpretation could reduce inter-reader variability, thereby improving diagnostic consistency and reliability. These findings provide strong evidence supporting the clinical impact of AI-based decision support systems, particularly in establishing a more standardized reading environment for reader groups with heterogeneous levels of experience.

Strength of agreement and inter-reader variability of breast ultrasound descriptors in pre-AI and post-AI phases (values are Randolph’s free-marginal kappa).

Descriptor           Pre-AI   Post-AI 1   Post-AI 2   Post-AI 3
Shape                0.4826   0.8345      0.8345      0.8345
Orientation          0.4659   0.8867      0.8867      0.8867
Margin               0.2897   0.6919      0.6887      0.6871
Echo pattern         0.5232   0.7161      0.7161      0.7161
Posterior features   0.5281   0.8275      0.8202      0.8134
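Randolph’s free-marginal kappa, which the abstract uses to quantify inter-reader agreement, computes observed multirater agreement P_o the same way as Fleiss’ kappa (for each case, the share of agreeing reader pairs, sum_j n_ij(n_ij - 1) / (n(n - 1)), averaged over cases), but fixes chance agreement at 1/k, where k is the number of available categories, instead of estimating it from the readers’ marginal distributions: kappa_free = (P_o - 1/k) / (1 - 1/k). The Python sketch below implements that formula. The function name, the input layout, and the two-category margin coding in the example are hypothetical illustrations; the abstract does not state how many categories each descriptor used, and this is not the study’s analysis code.

```python
from collections import Counter
from typing import Hashable, Sequence


def randolph_free_marginal_kappa(
    ratings: Sequence[Sequence[Hashable]], num_categories: int
) -> float:
    """Hypothetical helper for Randolph's free-marginal multirater kappa.

    ratings[i] holds the labels every reader assigned to case i;
    num_categories is k, the number of categories readers could choose from.
    """
    if num_categories < 2:
        raise ValueError("kappa needs at least two categories")
    if not ratings:
        raise ValueError("no cases supplied")

    per_case_agreement = []
    for labels in ratings:
        n = len(labels)
        if n < 2:
            raise ValueError("each case needs at least two ratings")
        # Agreeing ordered reader pairs: sum over categories of n_ij * (n_ij - 1).
        counts = Counter(labels)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing_pairs / (n * (n - 1)))

    # Observed agreement P_o: mean per-case agreement, as in Fleiss' kappa.
    p_observed = sum(per_case_agreement) / len(per_case_agreement)
    # Free-marginal chance agreement is fixed at 1/k.
    p_chance = 1.0 / num_categories
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Hypothetical data: 7 readers, 3 cases, margin coded as two categories.
    C, N = "circumscribed", "not circumscribed"
    cases = [
        [C] * 7,            # unanimous
        [C] * 4 + [N] * 3,  # 4-3 split
        [N] * 7,            # unanimous
    ]
    print(round(randolph_free_marginal_kappa(cases, num_categories=2), 4))  # prints 0.619
```

In this toy example P_o = 17/21 and chance agreement is 1/2, giving kappa = 13/21. Fixing chance at 1/k is the design choice that distinguishes the free-marginal variant: it suits settings where readers are not constrained to assign a set number of cases to each category.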
