Colorectal cancer (CRC) is the second most common malignancy worldwide and carries high mortality, so early detection of polyps is critical to halting its progression. Polyp image segmentation, an essential tool for this task, remains challenging because of blurred edges, small polyp sizes, and artifacts from intestinal folds, bubbles, and mucus. To address these challenges, we propose a novel segmentation model built on multi-scale feature extraction. Its encoder uses a Multiscale Attention-based Pyramid Vision Transformer v2 (PVTv2) to extract hierarchical features, with lower-stage modules that expand the receptive field, while its decoder adopts a Parallel Multi-level Aggregation structure together with multi-branch and improved reverse attention modules. Ablation experiments validate the contribution of the key modules. Compared with nine state-of-the-art networks on five benchmarks, the model achieves the best mDice and mIoU on polyp datasets, a 0.2% higher mDice than MEGANet on Kvasir-SEG, and better results than UHA-Net and CSCA-U-Net on CVC-ClinicDB.
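For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mDice and mIoU are commonly computed for binary polyp masks. It assumes per-image scores averaged over a dataset and masks that have already been binarized; the function names `dice_and_iou` and `mean_dice_iou` and the smoothing term `eps` are illustrative choices, not taken from the paper, and the exact evaluation protocol used by the authors may differ.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-8):
    """Return (Dice, IoU) for one pair of binary masks.

    `pred` and `target` are array-likes of 0/1 (or bool) with equal shape.
    `eps` is an assumed smoothing term so that two empty masks score 1.0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    inter = np.logical_and(pred, target).sum()   # |P ∩ G|
    union = np.logical_or(pred, target).sum()    # |P ∪ G|
    total = pred.sum() + target.sum()            # |P| + |G|

    dice = (2.0 * inter + eps) / (total + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


def mean_dice_iou(preds, targets):
    """Average per-image Dice and IoU over a dataset (assumed protocol)."""
    scores = [dice_and_iou(p, t) for p, t in zip(preds, targets)]
    dices, ious = zip(*scores)
    return float(np.mean(dices)), float(np.mean(ious))


if __name__ == "__main__":
    # Toy 2x2 masks: the prediction covers two pixels, the ground truth one.
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [0, 0]]
    print(dice_and_iou(pred, gt))
```

Because Dice counts the overlap twice in the numerator, it is never lower than IoU for the same pair of masks, which is why mDice values in comparisons such as the one above are typically higher than the corresponding mIoU values.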
Yan et al. studied this question.