
April 26, 2026 · Open Access

Multiparameter concept-based interpretable model for early breast cancer diagnosis and structured reporting: a multi-center, multi-reader, radiologist-in-the-loop study

Key Points

The aim is to develop an interpretable model for early breast cancer diagnosis that improves diagnostic accuracy and structured reporting.
Retrospective collection of breast MR images and radiology reports from five institutions (January 2016–July 2025).
Integration of radiologist knowledge into a concept bottleneck model (CBM) for lesion classification.
Evaluation of model performance on 1,695 pathology-confirmed lesions and of clinical utility in a multi-reader study.
The CBM achieved an AUC of 0.92 (95% CI 0.90–0.93) on the test set, comparable to a black-box model (AUC 0.93).
With CBM assistance, diagnostic accuracy improved for three radiologists, from 0.71–0.72 to 0.82–0.91 (all P < 0.05).
Inter-reader agreement increased significantly for both concept recognition and BI-RADS category (Gwet’s AC1: 0.27–1.00 to 0.46–1.00).
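Gwet's AC1, the agreement statistic quoted above, corrects the observed agreement p_a for chance agreement: AC1 = (p_a − p_e) / (1 − p_e), where p_e = Σ_k π_k(1 − π_k) / (q − 1), π_k is the average of the two raters' marginal proportions for category k and q is the number of categories. Below is a minimal sketch of the standard two-rater, nominal-scale form. The function name, the example BI-RADS readings and the category list are hypothetical illustrations, not data from the study, and the study may have used a multi-rater variant.

```python
from collections import Counter
from typing import Hashable, Sequence


def gwet_ac1(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable],
             categories: Sequence[Hashable]) -> float:
    """Gwet's AC1 for two raters who labelled the same items on a nominal scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    if not set(rater_a) | set(rater_b) <= set(categories):
        raise ValueError("every observed label must appear in `categories`")
    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters chose the same label.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # pi_k: average of the two raters' marginal proportions for category k.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pi = [(count_a[k] + count_b[k]) / (2 * n) for k in categories]

    # Chance agreement under Gwet's model; p_e <= 1/q, so 1 - p_e is never zero.
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Hypothetical BI-RADS categories assigned by two readers to six lesions.
reader_1 = ["4", "4", "3", "5", "4", "3"]
reader_2 = ["4", "3", "3", "5", "4", "4"]
ac1 = gwet_ac1(reader_1, reader_2, categories=["2", "3", "4", "5"])
print(f"AC1 = {ac1:.3f}")  # 25/43 ≈ 0.581
```

AC1 is often preferred to Cohen's kappa when one category dominates. Under skewed prevalence its chance-agreement term stays small, whereas kappa can drop sharply even when raw agreement is high.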

Abstract

Accurately differentiating early-stage breast cancer from benign lesions on MRI is essential to reduce unnecessary biopsies. However, the limited interpretability of current deep learning models hinders their clinical trustworthiness and adoption. This study aimed to develop a clinically interpretable concept bottleneck model (CBM) that integrates radiologist-specific knowledge and automatically generates structured reports, thereby improving diagnostic accuracy and consistency in breast MRI interpretation. Preoperative breast MR images and radiological reports were retrospectively collected from five institutions (January 2016–July 2025) and allocated to internal, external and multi-reader cohorts. Lesion-related descriptors in the free-text MRI reports were standardized into BI-RADS-compliant concepts. These concepts, together with multiparametric MR sequences, were input into the CBM, which classified the radiologist-annotated lesions (marked with bounding boxes) and generated structured reports. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and compared with that of a black-box deep learning model. The accuracy of the CBM-generated concepts was also evaluated. A two-phase multi-reader study was further conducted to assess clinical utility. A total of 1,695 pathology-confirmed breast lesions (857 malignant and 838 benign) from 1,634 patients (median age 46 years, IQR 39–53) were included. The CBM achieved an AUC of 0.92 (95% CI 0.90–0.93) on the test set, comparable to the black-box model (AUC 0.93, 95% CI 0.92–0.94). Concept accuracy ranged from 0.64 to 1.00. In the multi-reader study, the CBM matched the diagnostic accuracy of one radiologist and exceeded that of seven others (all P < 0.05). With CBM assistance, radiologists correctly downgraded 22.1% of lesions to benign. Diagnostic accuracy improved for three radiologists (from 0.71–0.72 to 0.82–0.91, all P < 0.05), and inter-reader agreement increased for both concept recognition and BI-RADS category (Gwet’s AC1: 0.27–1.00 to 0.46–1.00). The CBM provides a versatile framework for distinguishing early breast cancer from benign lesions. Its image-concept alignment strategy enhances intrinsic interpretability and offers radiologists clinically relevant, intelligible decision support that serves both diagnostic and educational needs. Moreover, this retrospective study demonstrates the model's potential to reduce unnecessary biopsies of benign breast lesions and to improve reporting consistency in breast MRI.
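A concept bottleneck model predicts the final label only through an intermediate layer of human-interpretable concepts, so each prediction can be read, and corrected, concept by concept. Below is a minimal, generic sketch of that two-stage structure in plain Python. It assumes a precomputed image feature vector. The concept names, weights, feature values, the `ConceptBottleneck` class and the `structured_report` helper are all hypothetical, and none of them reproduces the study's architecture, concept set or reporting template.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical concepts loosely modelled on BI-RADS MRI descriptors;
# not the concept set used in the study.
CONCEPTS = ["irregular shape", "non-circumscribed margin",
            "heterogeneous enhancement", "washout kinetics"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class ConceptBottleneck:
    concept_weights: List[List[float]]  # one row per concept, one column per feature
    concept_bias: List[float]
    label_weights: List[float]          # one weight per concept
    label_bias: float

    def predict_concepts(self, features: Sequence[float]) -> List[float]:
        """Stage 1: image features -> probability that each concept is present."""
        return [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
                for row, b in zip(self.concept_weights, self.concept_bias)]

    def predict_label(self, concepts: Sequence[float]) -> float:
        """Stage 2: concepts only (no image features) -> probability of malignancy."""
        z = sum(w * c for w, c in zip(self.label_weights, concepts)) + self.label_bias
        return sigmoid(z)

    def forward(self, features: Sequence[float],
                overrides: Optional[Dict[str, float]] = None) -> Tuple[List[float], float]:
        concepts = self.predict_concepts(features)
        # Test-time intervention: a reader may overwrite any predicted concept,
        # and the label head is then evaluated on the corrected concepts.
        for name, value in (overrides or {}).items():
            concepts[CONCEPTS.index(name)] = value
        return concepts, self.predict_label(concepts)


def structured_report(concepts: Sequence[float], p_malignant: float,
                      threshold: float = 0.5) -> str:
    """Render concept predictions as report lines (hypothetical template)."""
    lines = [f"- {name}: {'present' if p >= threshold else 'absent'} ({p:.2f})"
             for name, p in zip(CONCEPTS, concepts)]
    lines.append(f"Predicted probability of malignancy: {p_malignant:.2f}")
    return "\n".join(lines)


# Hand-picked illustrative weights, not trained values.
model = ConceptBottleneck(
    concept_weights=[[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [1.0, 1.0]],
    concept_bias=[-1.0, -1.0, -1.0, -1.0],
    label_weights=[2.0, 2.0, 1.0, 3.0],
    label_bias=-4.0,
)
features = [1.0, 0.2]  # hypothetical image embedding from some backbone
concepts, p = model.forward(features)
print(structured_report(concepts, p))

# A reader judges that washout is absent, so the label is recomputed from corrected concepts.
_, p_corrected = model.forward(features, overrides={"washout kinetics": 0.0})
print(f"After intervention: {p_corrected:.2f}")
```

The label head sees nothing but the concept probabilities. Overwriting one concept therefore changes the malignancy estimate in a traceable way. This kind of test-time intervention is one way CBMs can support a reader-in-the-loop workflow.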
