To address the low detection accuracy and poor robustness caused by uneven illumination, coal dust interference, and diverse coal-gangue morphologies in the complex environment of underground coal mines, a real-time coal-gangue image detection model, EMAM-YOLO, is proposed. The model uses YOLOv12n as the baseline framework and integrates four optimization modules to improve overall performance. First, the lightweight EfficientNetV1 network is adopted to reconstruct the backbone feature extraction structure, reducing the number of model parameters while preserving high-precision feature representation. Second, a Multiscale Attention Feature Pyramid Network (MAFPN) is introduced to strengthen the cross-scale interaction between shallow detail information and deep semantic information, improving the detection capability for coal-gangue targets of various sizes. Third, an Adaptive Spatial Feature Fusion detection head (DetectASFF) is designed to optimize multiscale feature fusion by learning dynamic weights, enhancing the model’s localization accuracy for occluded and deformed targets. Finally, a multiscale channel attention (MCA) mechanism is incorporated to guide the network to focus on key feature channels and suppress redundant information, further improving feature discrimination. The primary technical contribution of this work is twofold: (1) DetectASFF, a novel adaptive spatial feature fusion detection head that learns dynamic spatial weights, representing a structural innovation within the head; (2) the specific combination and synergistic interaction of EfficientNetV1, MAFPN, DetectASFF, and MCA, each addressing a distinct drawback of YOLOv12n for coal-gangue detection. Experimental results on a self-built coal-gangue dataset show that EMAM-YOLO achieves a mean average precision (mAP50–95) of 81.17%, which is 5.04 percentage points higher than the baseline YOLOv12n, with only 2.59 M parameters and a detection speed of 69.89 FPS, achieving a good balance between detection accuracy and real-time performance. Compared with mainstream algorithms such as Faster R-CNN, SSD, YOLOv8n, and YOLOv10n, the proposed model exhibits stronger robustness and better detection performance under simulated complex working conditions, and can provide effective technical support for intelligent coal washing in mines.
Jia et al. (Thu,) studied this question.
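The abstract does not give the internals of DetectASFF, so the following is only a minimal sketch of the general idea it names: fusing feature maps from several scales with dynamic, per-position weights. It assumes the maps have already been resized to a common resolution, treats each as a single channel, and takes the weight logits as given inputs, whereas in a trained network they would be produced by learned layers. The names `softmax`, `asff_fuse`, `features`, and `weight_logits` are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse(features, weight_logits):
    """Fuse same-sized single-channel feature maps with per-position weights.

    features:      list of maps (one per scale), each a list of rows of floats
    weight_logits: list of maps with the same shape; assumed here to be given,
                   whereas a real head would predict them from the features
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Weights at each position are non-negative and sum to 1 across scales.
            weights = softmax([logit_map[i][j] for logit_map in weight_logits])
            fused[i][j] = sum(w * f[i][j] for w, f in zip(weights, features))
    return fused


# Toy inputs: three 2x2 maps standing in for shallow, middle, and deep features.
shallow = [[1.0, 2.0], [3.0, 4.0]]
middle = [[0.0, 0.0], [0.0, 0.0]]
deep = [[-1.0, -2.0], [-3.0, -4.0]]

# Equal logits give every scale the same weight at every position.
equal_logits = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
fused = asff_fuse([shallow, middle, deep], equal_logits)
```

With equal logits each scale contributes one third, so the toy shallow and deep maps cancel. Raising the logits of one scale at a given position shifts the fused value toward that scale there, which is the sense in which the fusion weights are spatially adaptive.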