What question did this study set out to answer?

The aim is to improve colorectal polyp segmentation in colonoscopy images for better diagnosis and prevention of colorectal cancer.

March 5, 2026 | Open Access

DEGF-Net: Dual-Encoder Global–Local Joint Feature Aggregation Network for colorectal polyp segmentation

Key Points

The aim is to improve colorectal polyp segmentation in colonoscopy images for better diagnosis and prevention of colorectal cancer.
Developed DEGF-Net with a dual-encoder architecture for feature extraction.
Implemented a Global Joint Feature Fusion Module for aligning features.
Used an Upper-Lower Level Feature Fusion Module for better detail refinement.
Employed multi-output hybrid loss to enhance accuracy and convergence.
Achieved mean Dice scores of 0.933 on Kvasir-SEG and 0.958 on CVC-ClinicDB.
Surpassed recent CNN and Transformer-based segmentation methods.
Demonstrated effective cross-dataset generalization in various imaging domains.
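
For readers unfamiliar with the metric behind the reported scores, the Dice coefficient measures the overlap between a predicted mask and the ground-truth mask as 2|A∩B| / (|A| + |B|). The sketch below is a minimal illustration only: the function names, the flat 0/1 list representation, the smoothing constant `eps`, and the per-image averaging are assumptions made for this example, not the evaluation code used in the study.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks given as flat 0/1 sequences.

    eps is a small smoothing term (an assumption of this sketch) that avoids
    division by zero when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


def mean_dice(pairs):
    """Average the per-image Dice score over (pred, target) pairs.

    Per-image averaging is assumed here; the source does not state how the
    mean was computed.
    """
    scores = [dice_score(p, t) for p, t in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(round(dice_score(pred, target), 4))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and a score of 0 means they do not overlap at all.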

Abstract

Accurate segmentation of colorectal polyps in colonoscopy images is crucial for early prevention and computer-aided diagnosis of colorectal cancer, yet large variations in polyp appearance, low polyp-mucosa contrast, and device-related imaging discrepancies still hinder robust performance, especially for small and flat lesions and cross-dataset generalization. To address these challenges, we propose a Dual-Encoder Global–Local Joint Feature Aggregation Network (DEGF-Net) that enhances feature fusion and improves generalization. DEGF-Net adopts a dual-encoder architecture that separately models long-range global context and fine-grained local textures. A Global Joint Feature Fusion Module (GFFM) employs global attention to align and aggregate high-level features from both branches into a unified representation, while an Upper-Lower Level Feature Fusion Module (UL-FM) performs residual multi-scale cross-layer fusion in the decoder to narrow the semantic gap between high-level semantics and low-level details and refine polyp boundaries. In addition, a multi-output hybrid loss is applied to the final and intermediate predictions to leverage deep supervision, accelerate convergence, and improve robustness. Experiments on two benchmark colonoscopy datasets, Kvasir-SEG and CVC-ClinicDB, show that under a unified setting, DEGF-Net achieves mean Dice scores of 0.933 and 0.958, respectively, surpassing recent CNN-based, Transformer-based, and hybrid architectures and exhibiting strong cross-dataset generalization. These results indicate that DEGF-Net can substantially improve automatic polyp segmentation and provide a promising technical basis for computer-aided colorectal cancer screening.

• A novel CNN-Transformer dual-encoder framework is proposed for colorectal polyp segmentation.
• A global joint feature fusion module explicitly aligns high-level CNN and Transformer semantics.
• A residual cross-scale fusion strategy bridges the semantic gap between global context and fine details.
• The proposed method achieves Dice scores of 0.933 and 0.958 on Kvasir-SEG and CVC-ClinicDB.
• Strong cross-dataset and cross-domain generalization is demonstrated on retinal and cell datasets.
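
The abstract mentions a multi-output hybrid loss applied to the final and intermediate predictions but does not give its formula. The sketch below illustrates one common way such deep supervision is set up; it assumes the hybrid term is binary cross-entropy plus soft Dice loss, that every output has already been resized to the target resolution, and that the outputs are combined by a weighted sum. All function names and the default weights are hypothetical and should not be read as the authors' implementation.

```python
import math


def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy over flat sequences of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice_loss(probs, target, eps=1e-7):
    """1 - soft Dice, computed on probabilities rather than thresholded masks."""
    intersection = sum(p * t for p, t in zip(probs, target))
    denom = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)


def hybrid_loss(probs, target):
    """Assumed hybrid term: BCE + soft Dice (the source does not specify the mix)."""
    return bce_loss(probs, target) + soft_dice_loss(probs, target)


def multi_output_loss(outputs, target, weights=None):
    """Weighted sum of the hybrid loss over the final and intermediate predictions.

    outputs: list of probability maps (flat sequences), all at target resolution.
    weights: one weight per output; equal weights are assumed by default.
    """
    if weights is None:
        weights = [1.0] * len(outputs)
    if len(weights) != len(outputs):
        raise ValueError("need exactly one weight per output")
    return sum(w * hybrid_loss(o, target) for w, o in zip(weights, outputs))


if __name__ == "__main__":
    target = [1, 0, 1, 0]
    final_pred = [0.9, 0.1, 0.8, 0.2]
    intermediate_pred = [0.7, 0.3, 0.6, 0.4]
    print(multi_output_loss([final_pred, intermediate_pred], target, [1.0, 0.5]))
```

Supervising intermediate predictions in this way gives the earlier decoder stages their own gradient signal, which is the usual motivation for deep supervision.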
