Accurate pedigree reconstruction is essential for genetic evaluation, inbreeding control, and family management in aquaculture breeding programs. In sturgeon, extremely high fecundity and communal rearing during early developmental stages often lead to the loss of family information, making reliable full-sib family assignment a critical challenge. In this study, we developed a machine learning-based framework for full-sib family assignment using simulated genotype datasets and whole-genome resequencing data from Russian sturgeon. Simulation analyses across five machine learning algorithms showed that training set size and marker density were the primary determinants of assignment accuracy. When at least 10 individuals per family were included in the training set, mean identification accuracy exceeded 99% across all evaluated scenarios, and exceeded 99.9% for all methods except XGBoost. In contrast, performance declined when the marker number was reduced to 200. At moderate marker densities (500–1000 markers), performance remained stable, with mean identification accuracy around 99% even when only 3–4 individuals per family were included in the training set. Validation using whole-genome resequencing data (sequencing depth ranging from 9.43× to 11.86×) from 582 individuals representing 19 full-sib families of Russian sturgeon confirmed the simulation findings, with several algorithms achieving assignment accuracies exceeding 99%. These results demonstrate that machine learning provides an accurate and robust approach for full-sib family assignment using genome-wide single nucleotide polymorphism (SNP) data. The proposed framework offers an effective solution for pedigree reconstruction and family identification in sturgeon breeding populations lacking reliable pedigree records.
Yan et al. studied this question.
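The simulation design summarized above (varying the number of training individuals per family and the number of markers, then measuring assignment accuracy) can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes diploid-style Mendelian inheritance at unlinked biallelic markers, unrelated parents, and no genotyping error. It also uses a simple nearest-centroid classifier as a stand-in rather than any of the five algorithms evaluated in the study. All function names and default parameter values are hypothetical.

```python
import random


def simulate_full_sibs(rng, allele_freqs, n_offspring):
    """Simulate one full-sib family; genotypes are alternate-allele counts (0, 1, 2)."""
    # Each parent carries two alleles per marker, drawn independently from the
    # population allele frequency (assumption: unrelated, non-inbred parents).
    sire = [(int(rng.random() < p), int(rng.random() < p)) for p in allele_freqs]
    dam = [(int(rng.random() < p), int(rng.random() < p)) for p in allele_freqs]
    # Each offspring inherits one randomly chosen allele per parent per marker
    # (assumption: unlinked markers, no genotyping error).
    return [
        [rng.choice(s) + rng.choice(d) for s, d in zip(sire, dam)]
        for _ in range(n_offspring)
    ]


def centroid(genotypes):
    """Mean genotype per marker across the training individuals of one family."""
    n = len(genotypes)
    return [sum(column) / n for column in zip(*genotypes)]


def nearest_family(centroids, genotype):
    """Return the family whose centroid has the smallest squared Euclidean distance."""
    best_family, best_distance = None, float("inf")
    for family, center in centroids.items():
        distance = sum((g - c) ** 2 for g, c in zip(genotype, center))
        if distance < best_distance:
            best_family, best_distance = family, distance
    return best_family


def assignment_accuracy(n_families=19, n_markers=500, n_train=10, n_test=10, seed=1):
    """Fraction of held-out offspring assigned to their true full-sib family."""
    rng = random.Random(seed)
    # Hypothetical allele-frequency spectrum: uniform between 0.1 and 0.9.
    allele_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_markers)]
    centroids, held_out = {}, []
    for family in range(n_families):
        offspring = simulate_full_sibs(rng, allele_freqs, n_train + n_test)
        centroids[family] = centroid(offspring[:n_train])
        held_out.extend((family, g) for g in offspring[n_train:])
    correct = sum(nearest_family(centroids, g) == family for family, g in held_out)
    return correct / len(held_out)


if __name__ == "__main__":
    # Vary training set size and marker number, the two factors named in the text.
    for n_markers in (200, 500, 1000):
        for n_train in (3, 10):
            acc = assignment_accuracy(n_markers=n_markers, n_train=n_train)
            print(f"markers={n_markers} train_per_family={n_train} accuracy={acc:.3f}")
```

Changing `n_markers` and `n_train` reproduces the structure of the experiment only. The accuracies from this toy model depend on its simplifying assumptions and should not be read as the values reported for the real or simulated sturgeon data.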