Abstract
Introduction
Nursery piglet mortality remains a persistent challenge in commercial swine production, with major implications for animal welfare and economic performance. Early identification of at-risk groups enables proactive action to improve herd health and reduce losses. Forecasting mortality risk before weaning allows targeted interventions such as reallocating labor, reinforcing biosecurity, or adjusting feed and treatment programs. This study aimed to develop and evaluate a data-driven predictive framework to estimate 60-day nursery mortality risk using routinely collected production data from U.S. commercial swine farms.
Materials and Methods
The dataset included 306 variables from multiple sow farms and nursery sites, covering reproductive, health, growth, and management metrics. After the data were split into training and testing subsets, numeric variables were transformed using the Yeo–Johnson method to reduce the effect of extreme values and then standardized to a common scale. Categorical variables, such as farm ID and health status, were converted into binary indicators through one-hot encoding. Two machine learning algorithms, Extreme Gradient Boosting (XGB) and Extra Trees (ET), were tuned via cross-validated grid searches, optimizing for Pearson correlation (r) to maximize agreement with observed mortality rates. After the individual models were trained, their predictions were combined through model blending, using a validation split to determine the optimal weighting between XGB and ET without accessing the test data. Isotonic regression calibration was applied to improve probability reliability. Model performance was evaluated on an independent test set (a 30% hold-out reserved for final evaluation) using correlation, accuracy, positive predictive value (PPV), recall, F1 score, and negative predictive value (NPV), with a fixed decision threshold of 0.20.
Results and Discussion
Individually, the XGB and ET models achieved Pearson correlations of 0.646 and 0.552, respectively, and both reached an F1 score of 0.826. The XGB model demonstrated the best overall balance between sensitivity (0.806) and PPV (0.847), reaching an accuracy of 0.755 and an NPV of 0.556. In contrast, the ET model showed a more aggressive positive bias, yielding higher sensitivity (0.961) but lower specificity and NPV (0.333). The blended ensemble outperformed both individual models, achieving a Pearson correlation of 0.672, accuracy of 0.823, PPV of 0.702, recall (sensitivity) of 0.750, F1 score of 0.725, and NPV of 0.884 on the independent test set. These results demonstrate a strong overall balance between sensitivity and specificity, indicating that the calibrated ensemble effectively distinguishes high-risk from low-risk nursery groups. Isotonic calibration improved probability reliability without reducing correlation strength, producing more trustworthy mortality estimates.
These findings highlight the value of machine-learning forecasting in swine production. By identifying high-risk nursery groups early, farms can prioritize monitoring, optimize resources, and apply targeted interventions, ultimately reducing losses and enhancing welfare.
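As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of PPV (precision) and recall (sensitivity). The sketch below uses only the values stated in the abstract; the helper name `f1_from_ppv_recall` is hypothetical.

```python
def f1_from_ppv_recall(ppv, recall):
    """F1 score as the harmonic mean of PPV (precision) and recall (sensitivity)."""
    return 2 * ppv * recall / (ppv + recall)


# Values as reported in the abstract
xgb_f1 = f1_from_ppv_recall(ppv=0.847, recall=0.806)
blend_f1 = f1_from_ppv_recall(ppv=0.702, recall=0.750)

print(round(xgb_f1, 3))    # 0.826
print(round(blend_f1, 3))  # 0.725
```

Both recomputed values match the reported F1 scores to three decimal places, so the stated PPV, recall, and F1 figures are internally consistent for the XGB model and the blended ensemble.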
Mateus de Castro Duarte Cardoso
Iowa State University
Thinh Tien
Iowa State University
Jackson C Sterle
Iowa State University
Journal of Animal Science
Iowa State University
Cardoso et al. (Wed,) studied this question.
synapsesocial.com/papers/69fecfcdb9154b0b82876bd1 — DOI: https://doi.org/10.1093/jas/skag107.010