What question did this study set out to answer?

The research aims to improve lung cancer detection and classification using a hybrid CNN-Transformer model with dynamic attention mechanisms.

April 28, 2026 | Open Access

LungDxFormer: a transformer-CNN hybrid model with dynamic spatial attention for accurate lung cancer detection and classification

Key Points

Developed LungDxFormer, a hybrid CNN-Transformer model with dynamic spatial attention.
Utilized patient-wise cross-validation on the public LIDC-IDRI dataset.
Classified lung nodules into three categories: benign, indeterminate, and malignant.
Achieved 97.35% overall accuracy with high precision, recall, and AUC across all classes.
Effectively classified the challenging indeterminate nodules, improving diagnostic reliability.
Used Grad-CAM visualizations to present interpretable model decision-making consistent with clinical expectations.

Abstract

Lung cancer screening (LCS) from computed tomography (CT) is notoriously difficult because many nodules show borderline visual patterns that overlap across diagnostic categories. Most existing CAD systems rely solely on CNNs or standalone transformers, with limited global–local feature synergy, interpretability, and multi-class stratification within a distributed framework. To address these limitations, we present LungDxFormer, a hybrid CNN–Transformer model with a Dynamic Spatial Attention (DSA) mechanism that focuses on clinically relevant regions and presents its decisions in an interpretable form. The framework directly classifies lung nodules into three classes (benign, indeterminate, and malignant). Using patient-wise cross-validation on the public LIDC-IDRI dataset, our method achieves 97.35% overall accuracy with high precision, recall, and AUC across all three classes, including the clinically challenging indeterminate class. Grad-CAM visualisations further explain the model by identifying diagnostically relevant regions, consistent with clinicians’ expectations. These results support LungDxFormer as a novel, interpretable, and robust approach to CT-based lung nodule classification.
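
The patient-wise cross-validation mentioned above means that all nodules from one patient fall on the same side of every train/test split, so the model is never evaluated on a patient it has already seen. The abstract does not describe how the authors built their folds; the following is only a minimal sketch of the general idea. The function name `patient_wise_folds`, the fold count, and the toy patient IDs are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def patient_wise_folds(patient_ids, n_folds=5):
    """Split sample indices into folds so that no patient spans train and test.

    patient_ids: one (hypothetical) patient identifier per nodule sample.
    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    # Group sample indices by the patient they belong to.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    if n_folds < 2 or n_folds > len(by_patient):
        raise ValueError("n_folds must be between 2 and the number of patients")

    # Greedy balancing: place patients with the most samples first,
    # each into whichever fold currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for pid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])

    # Each fold serves once as the test set; the rest form the training set.
    splits = []
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        splits.append((train_idx, test_idx))
    return splits


# Toy example: ten nodules from six made-up patients.
patient_ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P5", "P6"]
splits = patient_wise_folds(patient_ids, n_folds=3)

for train_idx, test_idx in splits:
    train_patients = {patient_ids[i] for i in train_idx}
    test_patients = {patient_ids[i] for i in test_idx}
    # No patient appears on both sides of a split.
    assert train_patients.isdisjoint(test_patients)
```

Splitting at the nodule level instead would allow different nodules from the same patient to land in both training and test sets, which tends to inflate reported accuracy; grouping by patient avoids that leakage.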
