We present a scene-specific MobileNetV3-DETR for chip-leg defect inspection, where targets are tiny, dense, and arranged on a regular lattice. Instead of stacking generic tricks, we formalize three complementary mechanisms: (A1) a task-aligned FPN pruning criterion that selects the minimal set of pyramid levels (P3–P5) matching the empirical defect-size distribution; (A2) a lattice-aware relative positional encoding that biases attention toward physically plausible row/column offsets; and (A3) a defect-prior adaptive Top-K sparse attention that allocates the decoder's attention budget according to local response quantiles, subject to a device-aware cap Kmax. During training, a defect-aware assignment (D-AA) re-weights the classification term of the Hungarian matching cost by smoothed class priors, improving recall of rare, safety-critical defects without changing the inference graph. Under a unified embedded protocol on Jetson Nano (TensorRT FP16, 512 × 512 input, batch size 1), the model has ~7.2 M parameters and ~29.9 GFLOPs, reaches 89.5% mAP@50 and an average of 55.25% mAP@0.5:0.95, and runs at ≈60 ms per image. Ablations show that A1–A3 each contribute independent gains, and confusion-matrix analysis confirms fewer errors on the hardest class pairs (e.g., bent_leg vs. damaged_leg; dirty vs. scratches), indicating an improved accuracy–latency balance for inline AOI deployment.
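The abstract does not say exactly how local response quantiles are turned into a per-query Top-K budget, so the following is only a minimal sketch under stated assumptions. It assumes each decoder query carries one scalar "local response" (for example, a defect-prior score), that the budget grows linearly with that query's empirical quantile rank among all queries, and that the result is clamped to [k_min, Kmax]. The function names `allocate_budgets`, `topk_softmax`, `adaptive_topk_attention` and the parameter `k_min` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def allocate_budgets(responses: Sequence[float], k_min: int, k_max: int) -> List[int]:
    """Map each query's local response to a Top-K budget in [k_min, k_max].

    Assumption (not from the paper): the budget grows linearly with the
    query's empirical quantile rank among all queries; ties share a rank.
    """
    n = len(responses)
    if n == 0:
        return []
    if n == 1:
        return [k_max]
    budgets = []
    for r in responses:
        below = sum(1 for other in responses if other < r)  # empirical rank
        # Rank in [0, 1] is below / (n - 1); round half up to an integer budget.
        k = k_min + math.floor(below * (k_max - k_min) / (n - 1) + 0.5)
        budgets.append(k)
    return budgets


def topk_softmax(logits: Sequence[float], k: int) -> List[float]:
    """Softmax restricted to the k largest logits; all other weights are 0."""
    k = max(1, min(k, len(logits)))  # never exceed the number of keys
    keep = sorted(range(len(logits)), key=lambda j: logits[j], reverse=True)[:k]
    m = max(logits[j] for j in keep)  # subtract max for numerical stability
    exps = {j: math.exp(logits[j] - m) for j in keep}
    z = sum(exps.values())
    return [exps[j] / z if j in exps else 0.0 for j in range(len(logits))]


def adaptive_topk_attention(
    logits: Sequence[Sequence[float]],
    responses: Sequence[float],
    k_min: int,
    k_max: int,
) -> Tuple[List[int], List[List[float]]]:
    """Per-query sparse attention weights with response-driven Top-K budgets."""
    budgets = allocate_budgets(responses, k_min, k_max)
    weights = [topk_softmax(row, k) for row, k in zip(logits, budgets)]
    return budgets, weights


# Toy example: 4 queries attending over 4 keys.
logits = [
    [2.0, 1.0, 0.5, -1.0],
    [0.3, 0.2, 0.1, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 3.0, 1.0, 2.0],
]
responses = [0.1, 0.5, 0.9, 0.3]
budgets, weights = adaptive_topk_attention(logits, responses, k_min=1, k_max=4)
print(budgets)  # [1, 3, 4, 2]
```

In this sketch, queries in high-response regions (likely defects) receive up to Kmax keys, while low-response queries fall back to k_min. This keeps the total attention cost bounded by the device-aware cap, which matches the intent described for A3 but is not necessarily the authors' exact allocation rule.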
Zhang et al. studied this question.