Cronobacter sakazakii is an opportunistic pathogen that is commonly associated with powdered infant formula and causes severe neonatal infections. While whole-genome sequencing (WGS)-based single nucleotide polymorphism (SNP) analysis has revolutionized surveillance and outbreak investigations, comprehensive population-level analyses remain limited, and establishing appropriate thresholds for detecting epidemiologically related C. sakazakii isolates requires assessment using large-scale genomic datasets. We analyzed 1,870 C. sakazakii genomes from the United States (1970–2025) to examine pan- and core-genomic structure, analyze SNP distance matrices encompassing 1,747,515 unique pairwise comparisons, and reconstruct the population phylogeny. Our analyses revealed exceptional genomic diversity, with a large pan-genome of 24,035 gene families and an average of 29,442 ± 13,097 SNPs between genome pairs. Phylogenetic reconstruction identified 22 major clusters encompassing 89.3% of genomes, including environmental complexes demonstrating persistent contamination spanning multiple years. Using 209 monophyletic genome pairs with concordant metadata, we propose a tiered SNP threshold framework (≤234 to 506 SNPs) for detecting potentially epidemiologically related genomes with improved sensitivity. As genomes from Michigan comprised 39.3% of the dataset, these thresholds should be interpreted with caution when applied to other US regions. This study provides population genomics infrastructure to enhance C. sakazakii surveillance and traceback studies for improving powdered infant formula safety.
Zhang et al. studied this question.
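The pairwise-comparison count quoted in the abstract follows directly from the number of genomes: n genomes yield n(n − 1)/2 unique pairs. The short Python sketch below checks that arithmetic and illustrates how a tiered SNP cutoff might be applied to a single pairwise distance. It is a minimal illustration, not the authors' pipeline: the function names and tier labels are hypothetical, and it assumes that 234 and 506 SNPs are the lowest and highest bounds of the proposed framework, since the abstract does not list any intermediate tiers.

```python
from math import comb

# Assumed bounds, taken from the "≤234 to 506 SNPs" range in the abstract.
# Any intermediate tiers are not given there and are not modeled here.
STRICT_MAX_SNPS = 234
RELAXED_MAX_SNPS = 506


def unique_pairs(n_genomes: int) -> int:
    """Number of unique unordered genome pairs: n * (n - 1) / 2."""
    return comb(n_genomes, 2)


def classify_pair(snp_distance: int) -> str:
    """Assign a pairwise SNP distance to a hypothetical tier label."""
    if snp_distance < 0:
        raise ValueError("SNP distance cannot be negative")
    if snp_distance <= STRICT_MAX_SNPS:
        return "within_strict_threshold"
    if snp_distance <= RELAXED_MAX_SNPS:
        return "within_relaxed_threshold"
    return "above_thresholds"


if __name__ == "__main__":
    # 1,870 genomes give the 1,747,515 comparisons reported in the abstract.
    print(unique_pairs(1870))    # 1747515
    print(classify_pair(120))    # within_strict_threshold
    print(classify_pair(400))    # within_relaxed_threshold
    print(classify_pair(29442))  # above_thresholds
```

A pair that falls within either threshold would only be a candidate for epidemiological relatedness; as the abstract notes, the thresholds should be applied with caution outside the regions that dominate the dataset.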