Egg production is a key economic trait in poultry breeding, and longitudinal egg production records are essential for accurate genetic evaluation. However, time-series egg production data often contain missing values caused by sensor failures, recording interruptions, or operational errors. Such missingness reduces the accuracy of phenotypic reconstruction and may also degrade downstream breeding-value estimation. This study compared multiple imputation strategies for missing egg production time-series data and evaluated them from two complementary perspectives: imputation accuracy and downstream genomic and pedigree-based prediction performance. We benchmarked six imputation strategies (Forward–Backward Mean, piecewise linear regression, spline regression, K-Nearest Neighbors, Random Forest (RF), and Long Short-Term Memory (LSTM) networks) using data from 4,390 yellow-feathered broilers, comprising more than 100,000 egg production records and 463,000 SNPs. In terms of imputation accuracy, RF consistently outperformed the other methods across simulated missingness rates of 5%, 10%, 15%, and 20%, reducing RMSE by 15.50% to 31.22% and MSE by 28.31% to 52.69%, while improving R² by 7.42% to 23.41%. In the downstream evaluation, the accuracy of Genomic Estimated Breeding Values (GEBV) ranged from 0.239 to 0.277 without imputation and rose to between 0.288 and 0.293 with imputation. RF emerged as the most robust approach, delivering significant improvements in prediction accuracy of 5.78% to 21.76% (p < 0.05). These results show that RF imputation is an effective way to handle missing values in egg production time series. By linking raw data processing to genomic prediction, the findings provide a practical computational framework for improving the reliability of breeding programs in poultry and in other livestock species with longitudinal data.
Lan et al. studied this question.
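The accuracy comparison summarized in the abstract follows a mask-and-score pattern: hide a known fraction of observed values, impute them, and compare the imputed values with the hidden truth. The following is a minimal sketch of that pattern, not the authors' pipeline. The toy series, the function names (`mask_series`, `forward_backward_mean`, `score`), and the interpretation of Forward–Backward Mean as the average of the nearest observed values before and after a gap are all assumptions made for illustration. A Random Forest or any other imputer could be substituted for `forward_backward_mean` without changing the scoring step.

```python
import math
import random


def mask_series(series, rate, rng):
    """Hide a fraction `rate` of the values; return the masked copy and hidden indices."""
    n_hide = max(1, round(rate * len(series)))
    hidden = sorted(rng.sample(range(len(series)), n_hide))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def forward_backward_mean(masked):
    """Fill each gap with the mean of the nearest observed values on either side.

    Assumed interpretation of the method name; at a series edge only one
    neighbour exists, so that single value is used.
    """
    n = len(masked)
    filled = list(masked)
    for i, value in enumerate(masked):
        if value is not None:
            continue
        prev = next((masked[j] for j in range(i - 1, -1, -1) if masked[j] is not None), None)
        nxt = next((masked[j] for j in range(i + 1, n) if masked[j] is not None), None)
        neighbours = [v for v in (prev, nxt) if v is not None]
        if not neighbours:
            raise ValueError("series has no observed values")
        filled[i] = sum(neighbours) / len(neighbours)
    return filled


def score(truth, imputed, hidden):
    """Compute MSE, RMSE, and R^2 on the hidden positions only."""
    errors = [imputed[i] - truth[i] for i in hidden]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / len(errors)
    mean_truth = sum(truth[i] for i in hidden) / len(hidden)
    ss_tot = sum((truth[i] - mean_truth) ** 2 for i in hidden)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Toy stand-in for one bird's egg production curve (hypothetical data).
series = [5.0 + 2.0 * math.sin(t / 4.0) for t in range(40)]
rng = random.Random(0)

results = {}
for rate in (0.05, 0.10, 0.15, 0.20):  # missingness rates named in the abstract
    masked, hidden = mask_series(series, rate, rng)
    results[rate] = score(series, forward_backward_mean(masked), hidden)
    print(rate, results[rate])
```

Scoring only the hidden positions keeps the metrics from being inflated by values that were never missing, which is what makes RMSE, MSE, and R² comparable across imputers and missingness rates.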