e12552 Background: Artificial intelligence (AI) has been developed as a promising assistive tool for improving diagnostic accuracy and reducing inter-reader variability in breast ultrasound interpretation. However, evidence regarding the practical impact of AI assistance on diagnostic consistency remains limited. This study aimed to evaluate the impact of AI-assisted breast ultrasound interpretation on diagnostic performance and inter-reader agreement in a multicenter, multinational setting. Methods: Between September 2024 and March 2025, 148 breast ultrasound cases were collected from Korea, Hong Kong, and Kazakhstan. The images were analyzed using an AI ultrasound system (CadAI-B), and all data were uploaded to a web-based scoring platform. Ground truth was established by three board-certified radiologists, while seven international junior physicians served as readers. Each reader evaluated the cases in two phases: a pre-AI phase and, after a one-month washout period, an AI-assisted post-AI phase. The post-AI phase consisted of three sequential steps providing increasing levels of AI support: measurements and BI-RADS lexicons (post-AI 1), additional AI maps and malignancy scores (post-AI 2), and final BI-RADS categories (post-AI 3). Diagnostic performance and mean probability of malignancy (POM) were compared between phases, and inter-reader variability was assessed using Randolph’s free-marginal Fleiss’ kappa. Results: Mean AUC, sensitivity, and specificity were 0.816, 94.2%, and 27.7% in the pre-AI phase and 0.811, 98.8%, and 16.5% in the post-AI phase, respectively. Inter-reader variability in POM was high in the pre-AI phase but was markedly reduced in the post-AI phase across all sub-phases. This trend was consistently observed in both benign (n = 84) and malignant (n = 64) cases. Among the BI-RADS descriptors, shape, orientation, and posterior features demonstrated substantial inter-reader agreement (kappa values > 0.81).
Notably, margin, which showed the lowest agreement in the pre-AI phase, improved from a kappa value of 0.2897 to 0.6871, reaching the level of substantial agreement. Conclusions: AI-assisted breast ultrasound interpretation could reduce inter-reader variability, thereby improving diagnostic consistency and reliability. These findings provide strong evidence supporting the clinical impact of AI-based decision support systems, particularly in establishing a more standardized reading environment for reader groups with heterogeneous levels of experience.

Strength of agreement and inter-reader variability of breast ultrasound descriptors in pre-AI and post-AI phases.

Descriptor           Pre-AI   Post-AI 1   Post-AI 2   Post-AI 3
Shape                0.4826   0.8345      0.8345      0.8345
Orientation          0.4659   0.8867      0.8867      0.8867
Margin               0.2897   0.6919      0.6887      0.6871
Echo Pattern         0.5232   0.7161      0.7161      0.7161
Posterior Features   0.5281   0.8275      0.8202      0.8134
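For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of how a free-marginal multirater kappa can be computed. It is not the authors' code: the function name, the input layout (one list of reader labels per case), and the toy ratings are assumptions made for illustration. The statistic compares the observed proportion of agreeing reader pairs with a chance level fixed at 1/k, where k is the number of categories available for the descriptor.

```python
from collections import Counter


def free_marginal_kappa(ratings, num_categories):
    """Free-marginal multirater kappa (illustrative sketch).

    ratings: list of cases; each case is a list with one category label
             per reader (hypothetical layout, not the study's data format).
    num_categories: number of categories k the readers could choose from.
    """
    if num_categories < 2:
        raise ValueError("need at least two categories")
    if not ratings:
        raise ValueError("need at least one case")

    per_case_agreement = []
    for case in ratings:
        n = len(case)
        if n < 2:
            raise ValueError("each case needs at least two readers")
        counts = Counter(case)
        # Ordered reader pairs that chose the same category,
        # divided by all ordered reader pairs for this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing_pairs / (n * (n - 1)))

    p_observed = sum(per_case_agreement) / len(per_case_agreement)
    p_chance = 1.0 / num_categories  # free-marginal assumption
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy example: three readers, four cases, three shape categories.
toy_shape_ratings = [
    ["oval", "oval", "oval"],
    ["irregular", "irregular", "oval"],
    ["round", "round", "round"],
    ["irregular", "irregular", "irregular"],
]
kappa = free_marginal_kappa(toy_shape_ratings, num_categories=3)
print(f"free-marginal kappa = {kappa:.4f}")
```

Because the chance term depends only on the number of categories and not on how often each category was used, the value is unaffected by skewed category prevalence, which is the usual motivation for preferring it over the fixed-marginal Fleiss' kappa.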
Lee et al. (Thu,) studied this question.