The growing need for accurate and prompt diagnosis of blood cancers, especially leukemia, has accelerated the development of deep learning-based classification systems. Conventional microscopic examination, although accurate, is time-consuming and requires expert interpretation, so automated methods are being explored. In this study, we propose a novel model, the Cross-Attentive Dual-Branch CNN (CADBCNN), for the classification of leukemia blood cells. The model combines two feature branches, built on the ResNet18 and ResNet34 architectures, through a cross-attention fusion mechanism. Using stratified splitting, augmentation, and weighted optimization, CADBCNN achieved 98.36% accuracy, 98.41% precision, 98.36% recall, and a 98.38% F1-score. It outperformed existing architectures, including ResNet34, ResNet50, EfficientNet-B0, and a dual-branch model. Grad-CAM visualizations show that the model focuses on important regions such as the nuclei and cytoplasm, and t-SNE plots show clear class separation. These results indicate that the proposed model is highly accurate and has strong potential as a tool for automated leukemia blood cell classification.

• The proposed CADBCNN (Cross-Attentive Dual-Branch CNN) achieved a maximum validation accuracy of 98.90% and a test accuracy of 98.36% across four leukemia stages.
• The dual-branch architecture (ResNet18 + ResNet34) significantly improves feature representation and classification performance compared with single-branch networks.
• The cross-attention mechanism enhances feature synergy between the branches, leading to better discrimination of morphologically similar subtypes.
• Data preprocessing and augmentation improve generalization, reduce overfitting, and enable robust performance across all classes.
• Explainable AI (Grad-CAM) and t-SNE visualizations confirm that the model focuses on clinically relevant regions (WBC nuclei and cytoplasm) and learns highly discriminative latent representations.
• Ablation studies indicate that removing any key component (dual-branch, cross-attention, or augmentation) reduces accuracy and feature separability, validating the necessity of each design choice.
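
The accuracy, precision, recall, and F1-score quoted above are aggregate figures over four classes, and the text does not say how the per-class values were averaged. The sketch below assumes support-weighted averaging, which is one common choice and is consistent with the reported recall (98.36%) matching the reported accuracy (98.36%): under this scheme, weighted recall always equals overall accuracy. The function name `weighted_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def weighted_metrics(confusion):
    """Return accuracy and support-weighted precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])                    # true samples of class k
        predicted = sum(row[k] for row in confusion)   # samples predicted as class k
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total                       # class share of the test set
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical 4-class confusion matrix (illustrative counts, not the paper's results).
confusion = [
    [98, 2, 0, 0],
    [1, 195, 4, 0],
    [0, 3, 146, 1],
    [0, 0, 1, 149],
]

metrics = weighted_metrics(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With macro averaging (equal weight per class) the recall would generally differ from accuracy on an imbalanced test set, so the averaging scheme matters when comparing these figures against other work.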
Jha et al. studied this question.