Male infertility accounts for approximately 40–50% of all infertility cases worldwide, with semen analysis remaining the cornerstone of diagnosis. Manual semen evaluation is limited by substantial inter- and intra-observer variability, particularly for sperm morphology assessment, which has motivated the development of artificial intelligence (AI)-based automated systems. This review systematically evaluates the technical performance of AI-based automated semen analysis systems, specifically for sperm morphology classification and motility assessment, against manual expert evaluation; it also appraises the quality of the evidence and identifies critical gaps in clinical validation. DNA fragmentation prediction and zona pellucida (ZP) binding capability prediction are additionally examined as emerging, non-standard applications. A systematic search of PubMed, Scopus, and Web of Science (January 2015 – November 2025) was conducted following PRISMA 2020 guidelines. Studies comparing deep learning or machine learning methods to manual expert assessment or conventional CASA systems for human semen analysis were included. Risk of bias was assessed using QUADAS-2, RoB 2, and ROBINS-I; evidence certainty was graded using GRADE methodology. Manual WHO semen analysis, used as the reference standard across most included studies, carries well-documented reproducibility limitations, particularly for morphology, which constrains the interpretability of AI performance metrics. Eighteen eligible studies (15 observational, 3 validation) were identified. Deep learning models, particularly ResNet-50 variants with attention mechanisms (CBAM) and ensemble strategies, achieved morphology classification accuracy of 55–96.77% (mean: 82.4%). CNN-based motility assessment demonstrated strong correlation with manual WHO classification (Pearson r = 0.88–0.89).
Emerging AI applications for DNA fragmentation prediction (MAE: 0.05–0.10) and ZP binding capability (accuracy: 96.7%) showed technical feasibility but had limited clinical outcome validation. One multi-centre study demonstrated model performance consistency across laboratory sites; however, robust external validation across diverse settings remains limited. GRADE certainty of evidence ranged from low to moderate across outcomes. AI-based automated semen analysis demonstrates promising technical performance for sperm morphology and motility assessment; however, the current evidence base does not establish it as a validated diagnostic or prognostic tool for male infertility. In the absence of prospective, outcome-driven validation studies, these systems should be considered investigational. High technical accuracy relative to a manual reference standard that is itself subject to reproducibility limitations does not confirm clinical validity or improvement in patient outcomes. Future research must prioritise prospective multi-centre validation, standardised training datasets, and randomised controlled trials evaluating the effect of AI-guided sperm selection on clinically meaningful endpoints, including fertilisation rates, implantation success, and live birth outcomes.
Yadav et al. studied this question.
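
For readers less familiar with the three performance metrics quoted in the abstract (classification accuracy, Pearson r, and mean absolute error), the following minimal Python sketch shows how each is conventionally computed against a reference standard. It is an illustration only, not the analysis code of any included study: the function names and all sample values are hypothetical, and each metric only measures agreement with the manual reference, whatever that reference's own reproducibility.

```python
import math


def accuracy(predicted, reference):
    """Fraction of items whose predicted label equals the reference label."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(predicted)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical example values, invented purely for illustration.
model_labels = ["normal", "abnormal", "normal", "normal"]
expert_labels = ["normal", "abnormal", "abnormal", "normal"]

model_motility = [42.0, 55.0, 30.0, 61.0]   # % progressive motility, model
manual_motility = [40.0, 58.0, 33.0, 60.0]  # % progressive motility, manual

model_dfi = [0.12, 0.30, 0.22]      # predicted DNA fragmentation index
reference_dfi = [0.10, 0.35, 0.20]  # reference DNA fragmentation index

if __name__ == "__main__":
    print("morphology accuracy:", accuracy(model_labels, expert_labels))
    print("motility Pearson r:", pearson_r(model_motility, manual_motility))
    print("DFI MAE:", mean_absolute_error(model_dfi, reference_dfi))
```

In this toy data the model agrees with the expert on three of four morphology labels, so `accuracy` returns 0.75. A high value on any of these metrics would still only indicate agreement with the manual reference, which is the limitation the abstract emphasises.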