Background: The success of an endodontic procedure depends on the precise determination of the working length (WL), which is measured from a coronal reference point to the apical constriction of the root canal. Accurate WL measurement ensures thorough debridement, effective disinfection, and optimal obturation, thereby preventing complications such as postoperative pain, over-instrumentation, and persistent infection. Given the emerging use of artificial intelligence (AI)-based computational models in endodontics, this systematic review aimed to evaluate the performance of AI models developed to assist in determining WL and identifying related apical landmarks in endodontic procedures.

Methods: A comprehensive search of PubMed, Scopus, Embase, Cochrane Library, Web of Science, and Google Scholar was conducted, covering January 1, 2000, to July 31, 2025. Eligible studies were those evaluating machine learning or neural network–based models for WL assessment. Methodological quality was appraised using QUADAS-2, with explicit differentiation between risk of bias (internal validity) and applicability (external validity). The certainty of evidence was assessed using the GRADE approach.

Results: Six studies met the eligibility criteria. Substantial heterogeneity was observed in algorithm types (e.g., artificial neural networks, ensemble machine learning models), input modalities (radiographic vs. impedance-based), reference standards, and validation strategies. Most studies relied on retrospective or in vitro datasets with internal validation only; no study reported prospective, real-world external validation. The QUADAS-2 assessment identified concerns related to patient selection and applicability, particularly in studies using extracted teeth or experimental datasets. According to GRADE, the overall certainty of the evidence was low. Reported performance metrics varied, with sensitivity ranging from 0.85 to 1.00, specificity from 0.50 to 1.00, and accuracy from 0.70 to 0.95. However, comparisons across studies were limited by inconsistent outcome definitions, the absence of standardized clinical error thresholds (e.g., ±0.5 mm), and infrequent reporting of confidence intervals.

Conclusion: AI-based models show preliminary and investigational potential as adjunctive tools for WL determination. However, the current evidence is limited by methodological heterogeneity, reliance on non-clinical datasets, and a lack of external validation. None of the included studies provide high-certainty evidence from prospective, real-world clinical trials. Therefore, AI systems should currently be considered adjunctive and experimental rather than clinically established. Future research should prioritize prospective, multicenter clinical validation, standardized outcome definitions, and transparent reporting to enhance generalizability and clinical applicability.

Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/view/CRD420251231561 (PROSPERO CRD420251231561).
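The Results point to two reporting gaps: no shared clinical error threshold (e.g., ±0.5 mm) and few confidence intervals. As a minimal sketch of what a standardized outcome could look like, the Python snippet below computes the share of predicted working lengths that fall within a tolerance of the reference measurement, together with a Wilson 95% confidence interval. The function name `within_tolerance_rate`, the choice of the Wilson interval, and the sample values are illustrative assumptions; none of them comes from the review or from any included study.

```python
import math


def within_tolerance_rate(predicted_mm, reference_mm, tol_mm=0.5, z=1.96):
    """Return (rate, ci_low, ci_high) for predictions within +/- tol_mm.

    The interval is a Wilson score interval; z=1.96 corresponds to ~95%.
    """
    if len(predicted_mm) != len(reference_mm) or not predicted_mm:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(predicted_mm)
    # Count predictions whose absolute error is within the tolerance.
    hits = sum(abs(p - r) <= tol_mm for p, r in zip(predicted_mm, reference_mm))
    p_hat = hits / n

    # Wilson score interval for a binomial proportion.
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical values for illustration only (not data from any study).
predicted = [20.1, 19.4, 21.0, 18.2, 22.3]
reference = [20.0, 20.0, 21.3, 18.0, 21.5]
rate, ci_low, ci_high = within_tolerance_rate(predicted, reference)
print(f"within ±0.5 mm: {rate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With only five teeth the interval is very wide, which illustrates why small in vitro samples reported without confidence intervals are hard to compare across studies.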
Sanjeev B. Khanagar
King Saud bin Abdulaziz University for Health Sciences
Majed Alharthi
King Saud bin Abdulaziz University for Health Sciences
Adel Asery
King Saud bin Abdulaziz University for Health Sciences
Frontiers in Dental Medicine
SHILAP Revista de lepidopterología
King Saud bin Abdulaziz University for Health Sciences
King Abdullah International Medical Research Center
National Guard Health Affairs
Khanagar et al. studied this question.
synapsesocial.com/papers/69ada873bc08abd80d5bb60d — DOI: https://doi.org/10.3389/fdmed.2026.1783828