What question did this study set out to answer?

This research aims to enhance medical image segmentation by merging convolutional neural networks and transformers.

May 28, 2026 | Open Access

Convolutional branch aggregate transformer for medical image segmentation

Key Points

Developed CAFormer, a parallel dual-branch network with a Transformer branch for global features and a CNN branch with dynamic convolutional and channel attention modules.
Conducted extensive experiments on four public datasets: Kvasir-SEG, CVC-ClinicDB, GlaS, and ISIC 2017.
Introduced a Prediction Head for Branch aggregation (BPH) module to fuse the complementary features from both branches.
Achieved Dice scores of 0.9394, 0.9481, 0.9381, and 0.9310 on the respective datasets.
Reported significantly better segmentation results than existing state-of-the-art methods on these datasets.

Abstract

Convolutional Neural Networks (CNNs) and Transformers have become the two dominant architectures in the field of medical image segmentation. However, CNNs are limited in modeling long-range dependencies due to the locality of convolution operations, while Transformers may overlook fine-grained local details. To combine the advantages of both while compensating for their weaknesses, this article proposes a parallel dual-branch network named CAFormer, designed to simultaneously capture local details and global contextual information. In this architecture, the Transformer branch (BTB) is responsible for extracting global semantic features, whereas the CNN branch (BCB) incorporates a Full Dynamic Convolutional Kernel (DCK) module and a Full-Scale Channel Attention (FSC) module to enhance adaptability and representation flexibility for diverse features. Furthermore, a Prediction Head for Branch aggregation (BPH) module is introduced to effectively fuse the complementary features from both branches. Extensive experiments conducted on four public datasets—Kvasir-SEG, CVC-ClinicDB, GlaS, and ISIC 2017—demonstrate that CAFormer achieves Dice scores of 0.9394, 0.9481, 0.9381, and 0.9310, respectively. These results significantly outperform existing state-of-the-art methods, validating the superior segmentation capability of the proposed model.
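
For readers unfamiliar with the metric reported above, the Dice score measures the overlap between a predicted mask and a ground-truth mask as twice the size of their intersection divided by the sum of their sizes. The sketch below is a minimal illustration of that standard definition for binary masks, not the authors' evaluation code; the function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Flatten both 2D masks into 1D lists of pixels.
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked 1 in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted pixels, 3 true pixels, 2 of them shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so the reported values above 0.93 indicate close agreement with the reference annotations.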
