November 30, 2025 · Open Access

A Multi-Scale Structure with Improved Reverse Attention for Polyp Segmentation

Key Points

The model achieves the best mDice and mIoU among the compared models on polyp segmentation datasets.
On Kvasir-SEG, its mDice is 0.2% higher than MEGANet's; the full comparison covers nine state-of-the-art networks across five benchmarks.
A multi-scale structure and improved reverse attention modules strengthen the architecture's feature extraction.
The results highlight the need for advanced segmentation methods in the early detection of colorectal cancer.
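
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Dice and IoU are commonly computed for binary masks and then averaged over a set of images. It is not taken from the paper: the function names (`dice`, `iou`, `mean_metric`), the nested-list mask representation, and the per-image averaging are assumptions made for illustration, and evaluation toolboxes differ in details such as thresholding and averaging order.

```python
from typing import Callable, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values (illustrative type)


def _counts(pred: Mask, target: Mask):
    """Return (intersection, predicted area, target area) in pixels."""
    inter = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p & t
            pred_area += p
            target_area += t
    return inter, pred_area, target_area


def dice(pred: Mask, target: Mask, eps: float = 1e-8) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter, pred_area, target_area = _counts(pred, target)
    return (2 * inter + eps) / (pred_area + target_area + eps)


def iou(pred: Mask, target: Mask, eps: float = 1e-8) -> float:
    """Intersection over union: |A ∩ B| / |A ∪ B|."""
    inter, pred_area, target_area = _counts(pred, target)
    return (inter + eps) / (pred_area + target_area - inter + eps)


def mean_metric(metric: Callable[[Mask, Mask], float],
                preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average a per-image metric over a dataset (assumed per-image mean)."""
    scores = [metric(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

Calling `mean_metric(dice, preds, targets)` over a test set gives an mDice-style score under these assumptions, and `mean_metric(iou, preds, targets)` gives the corresponding mIoU. The small `eps` keeps the ratio defined when both masks are empty.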

Abstract

Colorectal cancer (CRC) is the second most common malignancy worldwide and carries high mortality, so detecting polyps early is critical to halting its progression. Polyp image segmentation, an essential tool for this, faces several challenges: blurred edges, small polyp sizes, and artifacts from intestinal folds, bubbles, and mucus. To address these, we propose a novel segmentation model built on multi-scale feature extraction. Its encoder uses a Multiscale Attention-based Pyramid Vision Transformer v2 (PVTv2) to extract hierarchical features, with lower-stage modules that expand the receptive field. Its decoder adopts a Parallel Multi-level Aggregation structure together with multi-branch and improved reverse attention modules. Ablation experiments validate the contribution of the key modules. Compared with nine state-of-the-art networks across five benchmarks, the model achieves the best mDice and mIoU on polyp datasets, a 0.2% higher mDice than MEGANet on Kvasir-SEG, and better performance than UHA-Net and CSCA-U-Net on CVC-ClinicDB.
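
The abstract does not describe how the improved reverse attention module works. As background only, the sketch below shows the basic reverse attention idea found in earlier polyp segmentation work: a coarse prediction is turned into a weight map of 1 minus its sigmoid, so that features in regions the model is not yet confident about (often boundaries) are emphasized. This is an assumed baseline form, not the authors' improved module; the names `reverse_attention` and `_sigmoid` and the nested-list tensor layout are hypothetical.

```python
import math
from typing import List

Map2D = List[List[float]]  # H x W map (illustrative type)


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reverse_attention(features: List[Map2D], coarse_logits: Map2D) -> List[Map2D]:
    """Weight each feature channel by (1 - sigmoid(coarse prediction)).

    features:      C channels, each an H x W map
    coarse_logits: H x W logits of a coarse segmentation, assumed to be
                   already resized to the feature resolution
    """
    # Pixels confidently predicted as foreground get weights near 0;
    # uncertain or background pixels keep most of their feature response.
    weights = [[1.0 - _sigmoid(v) for v in row] for row in coarse_logits]
    return [
        [[f * w for f, w in zip(f_row, w_row)]
         for f_row, w_row in zip(channel, weights)]
        for channel in features
    ]
```

In a real network this operation would run on framework tensors, and the reweighted features would feed further layers that refine the coarse prediction; plain Python lists are used here only to keep the sketch self-contained.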
