Early diagnosis of colorectal cancer, a malignant tumor with a high incidence worldwide, relies on precise segmentation of polyps during colonoscopy. However, clinical colonoscopy images often suffer from low contrast, blurred boundaries, large variations in polyp shape and scale, and interference from intestinal wall folds, all of which limit the accuracy of traditional segmentation methods. To address these problems, this paper proposes PCA-TransUNet, a model based on a parallel cross-attention mechanism. It takes TransUNet as the baseline framework and introduces a parallel cross-attention module into its skip connections. The module consists of two branches: channel cross-attention and spatial cross-attention. The channel branch makes semantic features more discriminative through cross-scale channel interaction, while the spatial branch improves boundary localization by exploiting long-range dependencies. The outputs of the two branches are adaptively combined by a dynamic weighted fusion mechanism to form multi-scale enhanced features, which improves segmentation robustness in complex scenes. Experiments on the CVC-ClinicDB and Kvasir-SEG datasets show that the proposed model outperforms the compared models on multiple metrics. PCA-TransUNet achieves an mIoU of 92.89% and a Dice of 95.79% on CVC-ClinicDB, and 90.81% and 95.25%, respectively, on Kvasir-SEG, providing reliable technical support for clinically assisted diagnosis.
Chen et al. studied this question.