To address the limited ability of YOLO-series models to represent structural information for foreign objects of diverse scales and morphologies, an improved algorithm named SSFE-YOLO is proposed. First, Space-to-Depth Convolution (SPDConv) is incorporated into the backbone network to preserve edge and texture details in shallow features during downsampling, thereby maintaining the integrity of critical target structures at the feature generation stage. Second, an adaptive receptive field enhancement module (ARFE) is designed, which introduces parallel feature branches with different receptive fields and fuses them adaptively to strengthen the network's structural perception of polymorphic foreign objects. Third, a distribution-feature stable compensation module (DFSC) is designed to suppress feature distribution shifts caused by illumination variations and noise interference through structural consistency enhancement and stable distribution constraints, which improves the stability of feature representation in complex environments. Finally, a dual-dimension optimized loss function (D2-OL) is constructed, which modulates the supervisory weights of the feature layers and filters effective training samples, providing differentiated supervision for samples of varying quality and balanced optimization for multi-scale target detection. Experimental results on a self-built mine conveyor belt dataset show that the proposed method achieves an mAP@0.5 of 90.5% and an mAP@0.5:0.95 of 59.1%, outperforming mainstream models such as YOLOv8, YOLOv11, and YOLOv13. These results indicate that the proposed approach improves the accuracy and robustness of foreign object detection in mining environments and has potential for engineering applications.
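The reason a space-to-depth step can preserve edge and texture detail during downsampling is that it moves pixels into the channel dimension rather than discarding them. The sketch below illustrates only that general rearrangement in plain Python; it is not the authors' implementation. The function name `space_to_depth`, the block size of 2, the channel ordering, and the toy input are assumptions made for illustration, and the non-strided convolution that normally follows the rearrangement is omitted.

```python
def space_to_depth(feature_map, block=2):
    """Rearrange a C x H x W nested list into (C*block*block) x (H/block) x (W/block).

    Every input value is kept; spatial resolution is traded for channels.
    The channel ordering used here is an arbitrary choice for illustration.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % block or width % block:
        raise ValueError("height and width must be divisible by block")

    out = []
    for dy in range(block):          # row offset inside each block
        for dx in range(block):      # column offset inside each block
            for c in range(channels):
                # Take every `block`-th pixel starting at (dy, dx).
                out.append([
                    [feature_map[c][y][x] for x in range(dx, width, block)]
                    for y in range(dy, height, block)
                ])
    return out


# Hypothetical toy input: one channel, 4 x 4, values 0..15.
patch = [[[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]]

out = space_to_depth(patch)
print(len(out), len(out[0]), len(out[0][0]))  # channels, height, width
```

In this toy case the 1 x 4 x 4 input becomes 4 x 2 x 2, and all sixteen values survive the halving of resolution, whereas a stride-2 convolution or pooling layer would sample or aggregate them.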
Tian et al. studied this question.