Deep learning-based sheep face recognition significantly advances the automation of individual sheep identification and provides critical technical support for smart livestock farming and precision agriculture. In real farming environments, however, complex backgrounds, illumination variations, and the high visual similarity of sheep faces severely limit both the accuracy and the real-time performance of recognition systems. To address these challenges, we propose a cascaded framework comprising the WRT-DETR model for detection and LG-MobileViT for identification. WRT-DETR integrates multi-scale wavelet residual modeling and adaptive feature interaction into the RT-DETR architecture to handle complex backgrounds. LG-MobileViT then uses local–global collaborative modeling to distinguish fine-grained features while keeping a lightweight footprint suitable for edge devices. Experiments on a dataset of 20,000 images covering 400 individuals show that WRT-DETR achieves 92.5% mAP50 on the detection task, and that LG-MobileViT attains 98.97% recognition accuracy with a parameter size of only 4.57 MB. On edge computing platforms, the integrated system reaches an inference speed approaching 100 FPS. These results indicate that the proposed framework offers an efficient and reliable solution for non-contact, precise sheep identification in practical precision agriculture scenarios.
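The cascaded detect-then-identify flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `cascade`, `crop`, `toy_detector`, `toy_identifier`, and the `min_det_score` threshold are hypothetical, and the two stub functions merely stand in for WRT-DETR and LG-MobileViT so that the control flow can be run without any trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x1, y1, x2, y2) in pixels, end-exclusive
Image = Sequence[Sequence[int]]      # toy grayscale image: rows of pixel values


@dataclass
class Identification:
    box: Box          # where the face was detected
    sheep_id: int     # identity predicted for the cropped face
    score: float      # confidence reported by the identifier


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the detected face region out of the full frame."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def cascade(
    image: Image,
    detect: Callable[[Image], List[Tuple[Box, float]]],
    identify: Callable[[List[List[int]]], Tuple[int, float]],
    min_det_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Identification]:
    """Stage 1 proposes face boxes; stage 2 classifies each confident crop."""
    results = []
    for box, det_score in detect(image):
        if det_score < min_det_score:
            continue  # skip low-confidence detections before the second stage
        sheep_id, id_score = identify(crop(image, box))
        results.append(Identification(box, sheep_id, id_score))
    return results


# Hypothetical stand-ins for the two models, used only to exercise the flow.
def toy_detector(image: Image) -> List[Tuple[Box, float]]:
    return [((0, 0, 2, 2), 0.9), ((1, 1, 3, 3), 0.3)]


def toy_identifier(face: List[List[int]]) -> Tuple[int, float]:
    total = sum(sum(row) for row in face)
    return total % 400, 1.0  # 400 matches the number of individuals in the dataset


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
results = cascade(image, toy_detector, toy_identifier)
```

In this sketch only detections above the threshold reach the identifier, so the second stage works on face crops rather than full frames with complex backgrounds. Note also that a throughput approaching 100 FPS corresponds to roughly 10 ms per frame for both stages combined.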
Zhang et al. (Sun,) studied this question.