Lumbar segmental instability is characterized by abnormal vertebral movement that results in localized pain and restricted function. Despite its clinical significance, the absence of agreed diagnostic standards leads to wide variability in practice. This review critically analyzes existing diagnostic protocols, identifies their methodological shortcomings, and synthesizes multimodal diagnostic strategies. In accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, PubMed, Scopus, Web of Science, Cochrane, and Embase were searched systematically (inception to December 2025) for human diagnostic and observational studies; studies that used artificial intelligence (AI) algorithms were explicitly excluded. Of the 657 records identified, 12 met the eligibility criteria. Study quality was appraised using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2), the Newcastle-Ottawa Scale (NOS), and the Prediction Model Risk of Bias Assessment Tool (PROBAST). Given the clinical heterogeneity, a thematic narrative synthesis was conducted following the Synthesis Without Meta-analysis (SWiM) guidelines. The findings indicate that no single test is adequate on its own. The Passive Lumbar Extension and Lumbar Rocking tests demonstrated strong screening potential (sensitivities: 84.2% and 95.56%; specificities: 90.4% and 40%, respectively). Clustered clinical testing significantly improved diagnostic accuracy (positive likelihood ratio, LR+, 5.80). For imaging, dynamic assessments, such as flexion radiographs taken while lifting a 3-kg weight (88% detection at L3/L4) and sit-to-stand kinematics, outperformed static images. Advanced predictive models using 3D-CT and least absolute shrinkage and selection operator (LASSO) regression achieved exceptional theoretical accuracy (area under the curve (AUC) 0.972); however, like tools such as the Jakarta Instability Score, they carry a high risk of statistical bias and currently lack independent external validation. Diagnosing lumbar instability therefore requires a comprehensive multimodal strategy.
The precision of the current evidence is heavily constrained by small cohort sizes and a pervasive lack of reported 95% confidence intervals. Future research must prioritize externally validated predictive models and standardized imaging reference criteria.
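To illustrate how the accuracy figures above relate to one another, the following minimal sketch derives likelihood ratios from a sensitivity and specificity pair and computes a Wilson 95% confidence interval for a proportion. The function names are illustrative, and the counts used for the interval (32 of 38, then 320 of 380) are hypothetical values chosen only to show how cohort size affects interval width; they are not taken from the included studies. The reported LR+ of 5.80 refers to clustered testing and is not reproduced by this calculation.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Sensitivity and specificity as stated in the abstract (fractions).
    tests = {
        "Passive Lumbar Extension": (0.842, 0.904),
        "Lumbar Rocking": (0.9556, 0.40),
    }
    for name, (sens, spec) in tests.items():
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")

    # Hypothetical counts: the same proportion in a small and a large cohort.
    for successes, n in [(32, 38), (320, 380)]:
        lo, hi = wilson_interval(successes, n)
        print(f"{successes}/{n}: 95% CI = ({lo:.3f}, {hi:.3f})")
```

Under these assumptions, a test with high sensitivity but low specificity yields a modest LR+ despite a low LR-, which is consistent with treating it as a screening tool rather than a confirmatory one. The interval for the smaller hypothetical cohort is visibly wider than for the larger one, which is why missing confidence intervals make small-study estimates hard to interpret.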
Thomas et al. studied this question.