Accurate and real-time metal surface defect detection under complex backgrounds and large appearance variations remains a critical challenge in intelligent manufacturing. Existing lightweight detectors often suffer from suboptimal performance due to uniformly applied feature refinement strategies across different network depths, which limits their ability to balance fine-grained representation and computational efficiency. To address this issue, we propose a hierarchical depth-aware refinement framework, termed HDR-YOLO, which explicitly aligns feature enhancement mechanisms with the distinct roles of shallow and deep representations. Specifically, a Query-Focused Convolution (QFC) block is introduced in shallow layers to enhance high-resolution texture and edge information, while a Query-Based Fusion (QBF) block is employed in deeper layers to improve global semantic modeling through adaptive feature interaction. The proposed design enables more effective detection of small-scale defects and irregular fine-grained patterns. Extensive experiments on the NEU-DET and GC10-DET datasets demonstrate that HDR-YOLO improves mAP@0.5 by 3.92% and 7.67%, respectively, over the baseline, while maintaining competitive inference efficiency. These results validate that depth-aware refinement is an effective strategy for enhancing lightweight defect detection under real-time industrial constraints.
Qin et al. studied this question.
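The reported gains are in mAP@0.5. Under that metric a predicted box counts as a correct detection only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion, not the authors' evaluation code. The function names `iou` and `is_true_positive` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed layout, chosen for this illustration).
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: the IoU >= 0.5 test behind mAP@0.5."""
    return iou(pred_box, gt_box) >= threshold


# A 4x4 defect whose prediction is shifted by one pixel in x and y:
# intersection 9, union 23, so the IoU falls below 0.5.
small_gt = (0, 0, 4, 4)
small_pred = (1, 1, 5, 5)
print(is_true_positive(small_pred, small_gt))

# The same one-pixel shift on a 40x40 defect still passes the threshold.
large_gt = (0, 0, 40, 40)
large_pred = (1, 1, 41, 41)
print(is_true_positive(large_pred, large_gt))
```

The two cases show why small-scale defects are sensitive to localization error under this metric: a fixed pixel offset removes a much larger fraction of the overlap for a small box than for a large one.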