Timely and accurate classification of breast lesions on mammograms can improve clinical decisions and reduce unnecessary biopsies. This study proposes a Multi-Scale Hybrid ResNet–Transformer with Distance-Aware Learning for interpretable BI-RADS mammographic classification. The model combines the spatial representation strength of ResNet-50 with the contextual modeling capability of lightweight multi-head self-attention layers in a unified hybrid architecture. A Distance-Aware Learning loss accounts for the ordinal nature of BI-RADS categories by penalizing predictions according to their distance from the true class. Preprocessing applies CLAHE to enhance mammographic contrast, followed by balanced oversampling and controlled augmentations to address class imbalance. Evaluation indicates strong generalization across the validation and test sets: the hybrid model achieved a test accuracy of 0.921 and a mean AUC of 0.987 on the test set. It also shows strong per-class discriminability, with F1-scores above 0.92 for the clinically critical BI-RADS 4–5 categories. Moreover, feature-space visualization and Grad-CAM-based visual explanations indicate that the model focuses on clinically relevant lesion regions, providing interpretable outputs aligned with radiologists' reasoning. The proposed framework offers a clinically meaningful and efficient approach to automated BI-RADS classification and may support future computer-aided diagnostic workflows.
Singh et al. (Fri,) studied this question.
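The abstract does not give the form of the Distance-Aware Learning loss, so the following is only a minimal sketch of one common way to make a classification loss ordinal-aware: standard cross-entropy plus the expected absolute distance between the predicted and true class indices. The function name `distance_aware_loss`, the weight `alpha`, and the five-class setup are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def distance_aware_loss(logits, targets, alpha=1.0):
    """Illustrative ordinal-aware loss (an assumption, not the study's formula).

    Combines cross-entropy with the expected absolute distance between
    class indices, so confident predictions far from the true class cost more.
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=int)

    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    n, k = probs.shape
    # Standard cross-entropy on the true class.
    ce = -np.log(probs[np.arange(n), targets] + 1e-12)

    # |class index - true index| for every class, per sample.
    dist = np.abs(np.arange(k)[None, :] - targets[:, None])
    # Expected distance under the predicted distribution.
    penalty = (probs * dist).sum(axis=1)

    return float((ce + alpha * penalty).mean())

# Hypothetical 5-class example with true class index 1.
near_miss = [[0.0, 0.0, 4.0, 0.0, 0.0]]  # peak one class away
far_miss = [[0.0, 0.0, 0.0, 0.0, 4.0]]   # peak three classes away
loss_near = distance_aware_loss(near_miss, [1])
loss_far = distance_aware_loss(far_miss, [1])
```

In this sketch both predictions assign the same probability to the true class, so plain cross-entropy cannot tell them apart; only the distance term makes the far miss more costly than the near miss.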