May 9, 2026

Data-driven Forecasting of Nursery Piglet Mortality in Commercial Swine Systems

Key Points

This research aims to create a predictive model for nursery piglet mortality using historical production data.
Analyzed 306 variables from multiple sow farms and nursery sites, covering reproductive, health, growth, and management metrics.
Trained two machine learning algorithms, Extreme Gradient Boosting (XGB) and Extra Trees (ET), and blended their predictions.
Evaluated model performance on a 30% hold-out test set using correlation, accuracy, PPV, recall, F1 score, and NPV at a fixed decision threshold of 0.20.
The ensemble model achieved a Pearson correlation of 0.672 and accuracy of 0.823 on the independent test set.
The ensemble's recall was 0.750 and its negative predictive value reached 0.884, indicating effective risk identification.
Isotonic regression calibration improved prediction reliability without compromising correlation strength.
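
Isotonic regression fits a non-decreasing step function to outcomes ordered by raw score, so calibration can tie groups together but never reorders them by predicted risk. The sketch below shows the pool-adjacent-violators step in plain Python. The function name `pava` and the toy outcomes are illustrative assumptions, not the study's code or data.

```python
def pava(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y.

    y must already be ordered by the raw model score (lowest first).
    """
    blocks = []  # each block is [sum_of_values, count]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every member of a block gets the block mean
    return fitted


# Hypothetical outcomes (1 = high-mortality group), sorted by raw model score.
outcomes_by_score = [0, 0, 1, 0, 1, 1]
calibrated = pava(outcomes_by_score)
print(calibrated)  # [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
```

The out-of-order pair (1 followed by 0) is pooled into a shared value of 0.5, which is how the fit stays monotone in the raw score.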

Abstract

Introduction

Nursery piglet mortality remains a persistent challenge in commercial swine production, with major implications for animal welfare and economic performance. Early identification of at-risk groups enables proactive action to improve herd health and reduce losses. Forecasting mortality risk before weaning allows targeted interventions such as reallocating labor, reinforcing biosecurity, or adjusting feed and treatment programs. This study aimed to develop and evaluate a data-driven predictive framework to estimate 60-day nursery mortality risk using routinely collected production data from U.S. commercial swine farms.

Materials and Methods

The dataset included 306 variables from multiple sow farms and nursery sites, covering reproductive, health, growth, and management metrics. After splitting the data into training and testing subsets, numeric variables were transformed using the Yeo–Johnson method to reduce the effect of extreme values and standardized to a common scale. Categorical variables, such as farm ID and health status, were converted into binary indicators through One-Hot Encoding. Two machine learning algorithms, Extreme Gradient Boosting (XGB) and Extra Trees (ET), were tuned via cross-validated grid searches, optimizing for Pearson correlation (r) to maximize agreement with observed mortality rates. After training the individual models, their predictions were combined through model blending, using a validation split to determine the optimal weighting between XGB and ET without accessing the test data. Isotonic Regression calibration was applied to improve probability reliability. Model performance was evaluated on an independent test set (30% hold-out portion of the dataset reserved for final evaluation), using correlation, accuracy, positive predictive value (PPV), recall, F1 score, and negative predictive value (NPV), with a fixed decision threshold of 0.20.
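
The blending step can be pictured with a small sketch. The abstract does not state the form of the blend, so this assumes a convex combination `w * XGB + (1 - w) * ET`, with `w` chosen by a grid search that maximizes Pearson correlation on the validation split. All names and numbers below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant input
    return cov / math.sqrt(var_x * var_y)


def best_blend_weight(pred_a, pred_b, observed, steps=20):
    """Grid-search w in [0, 1] for the blend w * pred_a + (1 - w) * pred_b."""
    best_w, best_r = 0.0, -2.0
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        r = pearson_r(blended, observed)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r


# Hypothetical validation-split values (mortality as a proportion); no test data used.
observed = [0.05, 0.10, 0.30, 0.25, 0.15, 0.40]
xgb_val = [0.08, 0.12, 0.22, 0.28, 0.10, 0.35]
et_val = [0.10, 0.09, 0.30, 0.20, 0.18, 0.30]

weight, r_val = best_blend_weight(xgb_val, et_val, observed)
print(f"weight on XGB: {weight:.2f}, validation r: {r_val:.3f}")
```

Because the grid includes `w = 0` and `w = 1`, the selected blend can never score below either individual model on the validation split. Whether that advantage carries over to the test set is an empirical question.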
Results and Discussion

Individually, the XGB and ET models achieved Pearson correlations of 0.646 and 0.552, respectively, and both reached an F1 score of 0.826. The XGB model demonstrated the best overall balance between sensitivity (0.806) and PPV (0.847), reaching an accuracy of 0.755 and NPV of 0.556. In contrast, the ET model showed a more aggressive positive bias, yielding higher sensitivity (0.961) but lower specificity and NPV (0.333). The blended ensemble outperformed both individual models, achieving a Pearson correlation of 0.672, accuracy of 0.823, PPV of 0.702, recall of 0.750, F1 score of 0.725, and NPV of 0.884 on the independent test set. These results demonstrate a strong overall balance between sensitivity and specificity, indicating that the calibrated ensemble effectively distinguishes high-risk from low-risk nursery groups. Isotonic calibration improved probability reliability without reducing correlation strength, producing more trustworthy mortality estimates. These findings highlight the value of machine-learning forecasting in swine production. By identifying high-risk nursery groups early, farms can prioritize monitoring, optimize resources, and apply targeted interventions, ultimately reducing losses and enhancing welfare.
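
The reported classification metrics all derive from a confusion matrix at the 0.20 threshold, and F1 is the harmonic mean of PPV and recall. The sketch below recomputes F1 from the PPV and recall values quoted above, then shows the metric definitions on toy data. The toy labels and risks are hypothetical, and treating a score of exactly 0.20 as flagged (`>=`) is an assumption the abstract does not confirm.

```python
def f1_score(ppv, recall):
    """F1 is the harmonic mean of PPV (precision) and recall (sensitivity)."""
    return 2 * ppv * recall / (ppv + recall)


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(y_true, risk, threshold=0.20):
    """Confusion-matrix metrics after flagging groups with risk >= threshold."""
    flagged = [r >= threshold for r in risk]
    tp = sum(1 for t, f in zip(y_true, flagged) if t and f)
    fp = sum(1 for t, f in zip(y_true, flagged) if not t and f)
    fn = sum(1 for t, f in zip(y_true, flagged) if t and not f)
    tn = sum(1 for t, f in zip(y_true, flagged) if not t and not f)
    return {
        "accuracy": _ratio(tp + tn, len(y_true)),
        "ppv": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "npv": _ratio(tn, tn + fn),
    }


# Consistency check on the PPV and recall values reported in the abstract.
print(round(f1_score(0.847, 0.806), 3))  # 0.826 (XGB)
print(round(f1_score(0.702, 0.750), 3))  # 0.725 (blended ensemble)

# Hypothetical toy data: 1 = high-mortality group, risk = calibrated model output.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
risk = [0.45, 0.30, 0.22, 0.10, 0.25, 0.05, 0.12, 0.08, 0.15, 0.02]
metrics = classification_metrics(y_true, risk)
print(metrics)
```

Both recomputed F1 values match the figures quoted in the abstract to three decimals.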

Authors

Mateus de Castro Duarte Cardoso

Iowa State University

Thinh Tien

Iowa State University

Jackson C Sterle

Iowa State University

Journals

Journal of Animal Science

Institutions

Iowa State University
