What question did this study set out to answer?

The study set out to improve body measurement keypoint detection for the Binglangjiang buffalo, where low-texture imaging and limited training samples make detection difficult.

June 3, 2026 | Open Access

DINOv2-Driven Monocular Body Measurement Keypoint Detection for Low-Texture Endangered Binglangjiang Buffalo

Key Points

The study constructed a benchmark dataset of 10,834 lateral-view images covering 424 individual buffalo, annotated with 10 body measurement keypoints.
DINOv2 was adapted with a top-down heatmap regression head for keypoint detection under a single-view imaging setup.
Body measurements were validated on 20 buffalo.
DINOv2-Base achieved 96.51% mAP, surpassing YOLOv8m by 5.6 percentage points.
On the scapular tip (P8), a particularly low-texture region, DINOv2 showed only 0.28% mAP fluctuation under model scaling, versus 0.82% for a standard ViT.
Body measurement validation yielded MAPE values of 1.76–5.69% across five measurements.
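
The MAPE values above compare image-derived body measurements with manual ones. The following is a minimal sketch of that comparison, assuming the standard definition of mean absolute percentage error, since the source does not spell out its formula. The function name `mape` and the sample values (in centimetres) are hypothetical and chosen only for illustration; they are not data from the study.

```python
def mape(estimated, measured):
    """Mean absolute percentage error, in percent.

    estimated: image-derived measurements (hypothetical values here).
    measured:  manual reference measurements; must be non-zero.
    """
    if len(estimated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - m) / abs(m) for e, m in zip(estimated, measured))
    return 100.0 * total / len(measured)


# Hypothetical example: one body measurement for four animals, in cm.
manual = [130.0, 128.0, 135.0, 140.0]
from_image = [132.6, 126.72, 135.0, 137.2]

print(f"MAPE: {mape(from_image, manual):.2f}%")
```

A lower MAPE means the non-contact estimates track the manual measurements more closely. The metric is computed separately for each measurement type.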

Abstract

The Binglangjiang buffalo, the only indigenous river-type buffalo in China, poses significant challenges for automated keypoint detection due to its uniformly black, low-texture coat, poor foreground–background contrast, and the scarcity of annotated training samples. To address these challenges, this study constructs a benchmark dataset of 10,834 lateral-view images covering 424 individuals, annotated with 10 body measurement keypoints following standardized buffalo measurement protocols. A keypoint detection pipeline is developed by adapting DINOv2 with a top-down heatmap regression head under a single-view imaging setup, which reduces hardware complexity for practical farm deployment. Benchmarking against the YOLOv8 series and a standard ViT baseline shows that DINOv2-Base achieves 96.51% mAP, surpassing YOLOv8m by 5.6 percentage points. Compared with the standard ViT, DINOv2 demonstrates more stable localization across keypoints under model scaling. Specifically, on the scapular tip (P8), a particularly low-texture region, DINOv2 exhibits only 0.28% mAP fluctuation versus 0.82% for the standard ViT, indicating greater robustness to limited training data and low-contrast imaging. Body measurement validation on 20 individuals yields MAPE values of 1.76–5.69% across five measurements, confirming reliable non-contact measurement performance. The dataset and pipeline provide practical support for precision livestock management of endangered breeds.
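
To illustrate what a top-down heatmap regression head produces, the sketch below decodes one heatmap per keypoint into image coordinates. It assumes simple argmax decoding and a rectangular crop around the detected animal. The abstract does not describe the decoding step, so the function `decode_heatmaps`, its parameters, and the toy heatmap are all hypothetical and may differ from the authors' pipeline.

```python
def decode_heatmaps(heatmaps, crop_box):
    """Convert per-keypoint heatmaps to (x, y, score) in image pixels.

    heatmaps: list of K heatmaps, each a list of H rows with W scores.
    crop_box: (x0, y0, width, height) of the animal crop in the image.
    Assumption: argmax decoding, with each cell mapped to its centre.
    """
    x0, y0, box_w, box_h = crop_box
    keypoints = []
    for hm in heatmaps:
        h, w = len(hm), len(hm[0])
        # Highest-scoring cell, found by scanning every (row, col).
        score, row, col = max(
            (hm[r][c], r, c) for r in range(h) for c in range(w)
        )
        # Scale the cell centre from heatmap space back to image space.
        x = x0 + (col + 0.5) * box_w / w
        y = y0 + (row + 0.5) * box_h / h
        keypoints.append((x, y, score))
    return keypoints


# Toy 4x4 heatmap with a single peak at row 1, column 2.
toy = [
    [0.0, 0.1, 0.2, 0.0],
    [0.0, 0.3, 0.9, 0.1],
    [0.0, 0.1, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(decode_heatmaps([toy], crop_box=(100, 50, 200, 100)))
```

In a low-texture region the heatmap peak tends to be flatter, so the argmax location is less stable. That is one plausible way to read the P8 comparison above, although the source does not state the mechanism.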
