Abstract

Purpose
The purpose of this systematic review was to compare the inter‐ and intra‐observer reliability of Kellgren–Lawrence (KL) grading versus minimum joint space width (mJSW) measurement on knee radiographs, and to examine the influence of rater type, radiographic view and atlas‐based training.

Methods
A systematic search of PubMed/MEDLINE, Embase, Scopus and Web of Science was conducted through July 2025 following Preferred Reporting Items for Systematic reviews and Meta‐Analyses (PRISMA) guidelines. Studies reporting inter‐ and/or intra‐observer reliability for KL or mJSW on standard knee radiographs were included. Extracted data included kappa (κ) and intraclass correlation coefficient (ICC) values, rater expertise, radiographic view and atlas usage. Raters were categorized as expert, trainee/non‐expert or artificial intelligence (AI)‐assisted/automated.

Results
Twenty‐four studies (10,394 radiographs) met the inclusion criteria. Inter‐observer reliability was generally higher for mJSW (mean ICC = 0.82 ± 0.17) compared with KL (mean κ = 0.72 ± 0.17), with similar trends observed for intra‐observer reliability. Experts and AI‐assisted systems outperformed trainees/non‐experts for both metrics. Reliability improved with semiflexed posteroanterior/Rosenberg views and atlas‐guided training.

Conclusion
This systematic review demonstrates that mJSW measurement generally shows higher reproducibility than KL grading. Reliability improved with expertise, standardized imaging and structured training, and AI‐assisted approaches performed comparably to expert raters. Quantitative mJSW should complement KL grading, particularly when combined with alignment measures, to enhance consistency and clinical relevance of radiographic assessment.

Level of Evidence
Level III.
Khela et al. studied this question.