Accurate anatomical landmark localization in clinical images requires millimeter-level spatial precision, yet it remains unclear whether increasing model scale improves such precision in structured medical imaging tasks. Five YOLO26 pose-estimation variants (N, S, M, L, and X) were evaluated on 3679 RGB distal-arm images from 262 participants acquired under a standardized overhead imaging protocol, with five anatomical landmarks annotated across the proximal forearm, mid-forearm, and hand. All models were initialized from COCO-pretrained weights, fine-tuned under identical conditions, and assessed using both COCO-style detection metrics and physically grounded localization error, quantified in millimeters through ArUco-marker-based pixel-to-millimeter calibration. Detection performance saturated across all scales (mAP@0.5 = 99.5%), whereas localization performance differed substantially: YOLO26N achieved the lowest mean error (2.76 ± 0.96 mm) and the highest proportion of predictions within 4 mm (88.0%), while YOLO26X produced the highest mean error (4.08 ± 2.59 mm) despite a 26.9× higher computational cost. Landmark-wise analysis revealed a consistent proximal-to-distal error gradient, with the largest degradation at anatomically ambiguous proximal landmarks in larger models. These findings suggest that increasing model capacity does not improve clinically meaningful localization precision in structured distal-arm imaging, and that lightweight models may offer the most favorable accuracy-efficiency trade-off in resource-constrained clinical settings.
Padmanabha et al. (Sun,) studied this question.
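The millimeter-level error reported in the abstract rests on converting pixel distances to physical distances with a marker of known size. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes the four marker corners have already been detected (for example with an ArUco detector) and that the marker lies roughly in the same plane as the arm. The function names, the 50 mm marker side length, and all coordinates are hypothetical, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used. Only the 4 mm threshold is taken from the abstract.

```python
import math
from statistics import mean, stdev


def mm_per_pixel(corners, marker_side_mm):
    """Scale factor from four marker corners (x, y), listed in order around the square."""
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    # Averaging the four sides dampens small detection noise and mild perspective skew.
    return marker_side_mm / mean(sides)


def localization_errors_mm(predicted, annotated, scale):
    """Euclidean error per landmark, converted from pixels to millimeters."""
    return [math.dist(p, a) * scale for p, a in zip(predicted, annotated)]


def summarize(errors_mm, threshold_mm=4.0):
    """Mean error, sample standard deviation, and fraction of errors within the threshold."""
    within = sum(e <= threshold_mm for e in errors_mm) / len(errors_mm)
    return mean(errors_mm), stdev(errors_mm), within


# Hypothetical data: a 50 mm marker that spans 200 px per side in the image.
corners = [(100, 100), (300, 100), (300, 300), (100, 300)]
scale = mm_per_pixel(corners, marker_side_mm=50.0)

# Hypothetical predicted and annotated landmark positions, in pixels.
predicted = [(10, 10), (50, 50), (120, 80)]
annotated = [(13, 14), (50, 62), (120, 100)]

errors = localization_errors_mm(predicted, annotated, scale)
mean_err, sd_err, frac_within = summarize(errors)
print(f"scale = {scale:.3f} mm/px")
print(f"error = {mean_err:.2f} ± {sd_err:.2f} mm, within 4 mm: {frac_within:.1%}")
```

Because the scale is estimated per image, the same pixel error can correspond to different physical errors across images, which is why a physically grounded metric can separate models that COCO-style detection metrics score identically.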