December 5, 2025

Open Access

A Comparative Analysis of Machine Learning Classifiers for Modeling the Number of Liveborn Piglets

Key Points

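Stochastic Gradient Descent (SGD) achieved the highest weighted F1-Score on both datasets, reaching 0.45 on the Hypor Dataset, when predicting the class of liveborn piglets in the next parity.
Data came from two distinct farm settings, one research farm and 12 commercial farms, covering just over 12,000 sows and about 28,000 parity records.
Nine classifiers (six traditional and three ensemble models) were compared on weighted F1-Score for predicting whether a sow's next litter falls in the Low, Medium or High class.
Findings suggest that larger training datasets and the inclusion of key traits, such as body weight and backfat thickness at weaning, could improve classification performance.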

Abstract

The profitability of pig farms depends largely on the productivity of their sow herds, often measured as the number of piglets weaned per sow per year, which is closely linked to the number of liveborn piglets (NLB) per sow per litter. To improve farming efficiency, underproducing sows are often culled and replaced to reduce and compensate for their maintenance costs. However, wrongly culling highly productive sows incurs replacement costs and missed opportunities. The ability to distinguish between sows with high and low productivity is therefore very valuable. This study evaluated the predictive performance of six traditional and three ensemble machine learning classification models in predicting whether NLB in the subsequent parity is “Low” (NLB < 13), “Medium” (13 ≤ NLB ≤ 16) or “High” (NLB > 16). The evaluation used data collected during the current parity in two distinct farm settings: the CDPQ Dataset (415 sows, 468 parity records, 1 research farm) and the Hypor Dataset (11,633 sows, 27,547 records, 12 commercial farms). Seven input production measurements were common to both datasets: parity, gestation length, lactation length, current parity NLB, and the numbers of stillborn, mummified and weaned piglets. The CDPQ Dataset included six additional input production measurements: body weight (BW) and backfat thickness (BFT), each measured at breeding, farrowing and weaning. Classifiers used these input variables to generate predictions, and their performance was assessed using the weighted F1-Score. The best-performing classifier for both the CDPQ and the Hypor Datasets was Stochastic Gradient Descent (SGD), which achieved the highest weighted F1-Scores of 0.37 and 0.45, respectively.
In addition, after the six BW and BFT variables were removed from the CDPQ Dataset (reduced CDPQ Dataset), the SGD classifier attained a higher weighted F1-Score of 0.43, an improvement likely caused by information redundancy among the removed variables and the limited dataset size. Despite the improved performance without them, variable importance analysis for the CDPQ Dataset identified BW and BFT at weaning as the key predictors, suggesting that including these traits could similarly enhance models trained on the Hypor Dataset. Overall, the study demonstrated that machine learning classifiers show promise for forecasting sow productivity, though further research with more extensive and higher-quality datasets is required before broad industry adoption.
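As an illustration of the class definitions and the evaluation metric described in the abstract, the following minimal Python sketch maps NLB values to the three classes and computes a support-weighted F1-Score by hand. The function names `nlb_class` and `weighted_f1` and the sample values are hypothetical and chosen only for illustration; they are not taken from the study, its code, or either dataset.

```python
from collections import Counter


def nlb_class(nlb: int) -> str:
    """Map a litter's number of liveborn piglets (NLB) to Low / Medium / High."""
    if nlb < 13:
        return "Low"
    if nlb <= 16:
        return "Medium"
    return "High"


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Average the per-class F1, weighting each class by its share of true labels."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)


# Hypothetical next-parity NLB values and predictions (illustrative only).
observed = [nlb_class(n) for n in [10, 12, 14, 15, 16, 18]]
predicted = ["Low", "Medium", "Medium", "Medium", "High", "High"]
score = weighted_f1(observed, predicted)
print(f"weighted F1 = {score:.2f}")
```

Weighting each class by its support means that a frequent class contributes more to the score than a rare one, which matters when the Low, Medium and High classes are not equally represented.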

Authors

Ji Yang

University of Science and Technology of China

Mohsen Jafarikia

Dalhousie University

Patrick Gagnon

Centre de Développement du Porc du Québec

Journals

Journal of Animal Science

Institutions

Dalhousie University

University of Guelph

Centre de Développement du Porc du Québec
