The advancement of Precision Livestock Farming (PLF) depends on high-quality data, yet understanding of the open data landscape remains fragmented. This review adopts a bidirectional perspective, evaluating both open datasets and the researchers who cited them, with a focus on research objectives and practical applications. Through bidirectional searches of Scopus/Web of Science and digital repositories, 315 open datasets released between 2010 and 2025 were identified, spanning dairy cows, beef cattle, pigs, poultry, and other species. The majority were released within the last five years, signalling rapid growth in data availability. Peer-reviewed literature remains the primary dissemination channel, accounting for 63% of datasets (n=199), while standalone repositories contribute 37% (n=116), reflecting a shift toward data-first scientific contributions. Species distribution is skewed toward cattle (n=131), followed by swine (n=59) and poultry (n=43). Non-AI computer vision relies on deterministic algorithms that exploit the physical and spectral properties of animal images. Among AI approaches, object detection dominates livestock monitoring, with YOLO architectures and Region-based Convolutional Neural Networks leading the field. Generative AI, particularly Foundation Models (FMs) and Generative Adversarial Networks (GANs), has mitigated the scarcity of labelled data, superseding manual annotation through automated frameworks such as the Accelerated Data Engine. These resources are evolving into the backbone of Large Language Models (LLMs) and visual-language frameworks, enabling herd reasoning and predictive diagnostics and marking the transition from reactive to proactive, generative monitoring. Integrating datasets across biometric identification, health, and behavioural clusters advances food security and animal welfare. Nevertheless, gaps in dataset diversity and standardisation hinder reproducibility, demanding an ethical shift toward data sustainability and computational efficiency.
Ruchay et al. (Fri,) studied this question.