Gastrointestinal polyps are potential precursors of colorectal cancer, so their accurate detection during endoscopy is a key component of effective screening and surveillance. Automated detection remains challenging, however, because of wide variation in polyp morphology, imbalance in polyp scale, and imaging noise, which together lead to high miss rates in existing approaches. To address these issues, we propose an optimized real-time detection framework built on the RT-DETR-r18 architecture. The method integrates three enhancements to feature representation and attention. First, the DWRC3-DRB module uses residual and dense connections to robustly fuse shallow and deep features. Second, to suppress background interference, we design the EAAAIFI module, which models global context with linear complexity and concentrates on informative regions. Third, the ELA-HSFPN module combines efficient local attention with a hierarchical feature pyramid to enable adaptive multi-scale feature fusion. Systematic experiments on a hybrid endoscopic dataset show that our approach outperforms both the baseline and state-of-the-art detectors. Compared with RT-DETR-r18, it raises precision from 90.7% to 94.8%, recall from 84.0% to 89.9%, and mAP@0.5 from 92.2% to 94.2%. It achieves this accuracy while running at 188.6 FPS, which satisfies the low-latency requirements of real-time video processing.
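As a quick consistency check on the reported figures, the short Python sketch below recomputes the absolute gains over RT-DETR-r18, derives an F1 score from the stated precision and recall (F1 is not reported in the abstract), and converts the 188.6 FPS throughput into an average per-frame latency. The 30 fps video rate used as a comparison budget is an illustrative assumption, not a figure from the study, and the helper names are hypothetical.

```python
# Metrics as reported in the abstract (fractions, not percentages).
BASELINE = {"precision": 0.907, "recall": 0.840, "map50": 0.922}  # RT-DETR-r18
PROPOSED = {"precision": 0.948, "recall": 0.899, "map50": 0.942}  # proposed model
FPS = 188.6  # reported throughput of the proposed model


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (derived here, not reported)."""
    return 2 * precision * recall / (precision + recall)


def per_frame_latency_ms(fps: float) -> float:
    """Average time per frame implied by a throughput figure, in milliseconds."""
    return 1000.0 / fps


def gains_in_points(before: dict, after: dict) -> dict:
    """Absolute change of each metric, in percentage points (rounded to 0.1)."""
    return {k: round((after[k] - before[k]) * 100, 1) for k in before}


if __name__ == "__main__":
    print(gains_in_points(BASELINE, PROPOSED))
    print(f"F1: {f1(BASELINE['precision'], BASELINE['recall']):.3f} -> "
          f"{f1(PROPOSED['precision'], PROPOSED['recall']):.3f}")
    latency = per_frame_latency_ms(FPS)
    budget = per_frame_latency_ms(30)  # assumed 30 fps endoscopy video stream
    print(f"{latency:.2f} ms per frame vs. {budget:.2f} ms budget at 30 fps")
```

Under these assumptions, the model needs roughly 5.3 ms per frame, well inside a 33.3 ms per-frame budget. The derived F1 rises from about 0.872 to about 0.923. Note that FPS figures depend on hardware, batch size, and input resolution, none of which this check accounts for.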
Du et al. studied this question.