What question did this study set out to answer?

This study set out to improve the accuracy of medical image segmentation with a novel architecture, CA2PNet.

June 9, 2026 | Open Access

CA2PNet: a context-aware multi-scale architecture with adaptive attention and progressive dilated convolutions for biomedical image segmentation

Key Points

Developed a Context Aware Adaptive Progressive Network (CA2PNet) inspired by DeepLabV3+ and FusionNet.
Incorporated spatial attention, Global Max Pooling, enhanced Spatial Pyramid Pooling, and progressive dilated convolutions.
Evaluated performance on the Kvasir-SEG and BUSI datasets.
Achieved a mean intersection over union (mIoU) of 85.15% on Kvasir-SEG and 82.78% on BUSI, surpassing state-of-the-art models.
Demonstrated robustness through statistical testing of the model.
Showed enhanced boundary adherence and scale-invariant segmentation.
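
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a mean intersection over union can be computed for binary masks. It assumes the score is the foreground IoU averaged over images; the paper may average differently (for example, over classes). The function names `binary_iou` and `mean_iou` and the toy masks are illustrative only and do not come from the paper.

```python
def binary_iou(pred, target):
    """IoU between two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def mean_iou(pairs):
    """Average the per-image IoU over (prediction, ground_truth) pairs."""
    scores = [binary_iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Toy masks (hypothetical data, not taken from Kvasir-SEG or BUSI).
pred_a = [[1, 1, 0],
          [0, 1, 0]]
true_a = [[1, 0, 0],
          [0, 1, 1]]
pred_b = [[0, 0],
          [1, 1]]
true_b = [[0, 0],
          [1, 1]]

print(binary_iou(pred_a, true_a))                          # 2 shared pixels / 4 in the union = 0.5
print(mean_iou([(pred_a, true_a), (pred_b, true_b)]))      # (0.5 + 1.0) / 2 = 0.75
```

A reported value such as 85.15% would correspond to this average expressed as a percentage over the full test set.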

Abstract

Background and objectives: Accurate medical image segmentation remains a challenging task in computer-aided diagnosis because of the intricacy and variability of biomedical data, including anatomical complexity, inter-patient diversity, class imbalance, and irregular morphological patterns.

Methods: In the present work, a Context Aware Adaptive Progressive Network (CA2PNet) is proposed. Its foundational architecture is inspired by DeepLabV3+ and FusionNet and introduces four key modifications: (a) a spatial attention module (SAM) to emphasize discriminative spatial regions, (b) Global Max Pooling to strengthen contextual representation and suppress background noise, (c) an enhanced Spatial Pyramid Pooling for robust multi-scale feature extraction, and (d) progressive dilated convolutions to expand the receptive field while preserving fine structural details. Together, these modules refine local feature extraction while preserving global context.

Results: CA2PNet achieved a mean intersection over union (mIoU) of 85.15% and 82.78% on the Kvasir-SEG and BUSI datasets, respectively, surpassing state-of-the-art models. Statistical tests also validate the robustness of the proposed model.

Conclusions: The proposed work demonstrates that embedding multi-scale features at each encoder stage, along with decoupled decoding, helps overcome the resolution loss of classical segmentation architectures, resulting in superior boundary adherence and scale-invariant segmentation.
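
To illustrate why progressive dilated convolutions expand the receptive field without extra downsampling, here is a small arithmetic sketch. The 3x3 kernel size and the dilation schedule (1, 2, 4) are assumptions chosen for illustration; the abstract does not state the rates used in CA2PNet. The helper names `receptive_field` and `same_padding` are likewise hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (pixels per axis) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        # A k-tap kernel with dilation d spans d * (k - 1) + 1 input positions,
        # so each stride-1 layer widens the field by d * (k - 1).
        rf += d * (kernel_size - 1)
    return rf


def same_padding(kernel_size, dilation):
    """Per-side padding that keeps the spatial size unchanged (stride 1, odd kernel)."""
    return dilation * (kernel_size - 1) // 2


plain = receptive_field(3, [1, 1, 1])         # three ordinary 3x3 convolutions
progressive = receptive_field(3, [1, 2, 4])   # assumed progressive schedule

print(plain)                # 7
print(progressive)          # 15
print(same_padding(3, 4))   # 4
```

Under these assumptions, both stacks have the same number of weights, but the progressive schedule sees a 15-pixel window instead of 7 while every layer keeps the full input resolution, which is the property the abstract credits for preserving fine structural details.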
