What question did this study set out to answer?

This systematic review compares the inter- and intra-observer reliability of minimum joint space width (mJSW) measurement and Kellgren–Lawrence (KL) grading on knee radiographs.

April 16, 2026

Minimum joint space width demonstrates higher inter‐ and intra‐observer reliability than Kellgren–Lawrence grading in knee osteoarthritis

Key Points

The review assesses the reliability of minimum joint space width (mJSW) measurement and Kellgren–Lawrence (KL) grading on knee radiographs.
The authors systematically searched PubMed/MEDLINE, Embase, Scopus and Web of Science following PRISMA guidelines.
Studies reporting inter- and/or intra-observer reliability for KL or mJSW on standard knee radiographs were included.
Extracted data covered rater expertise, radiographic view and atlas usage.
Inter-observer reliability was generally higher for mJSW (mean ICC = 0.82) than for KL (mean κ = 0.72).
Experts and AI-assisted systems outperformed trainees/non-experts on both metrics.
Reliability improved with semiflexed posteroanterior/Rosenberg views and structured atlas-guided training.

Abstract

Purpose: The purpose of this systematic review was to compare the inter‐ and intra‐observer reliability of Kellgren–Lawrence (KL) grading versus minimum joint space width (mJSW) measurement on knee radiographs, and to examine the influence of rater type, radiographic view and atlas‐based training.

Methods: A systematic search of PubMed/MEDLINE, Embase, Scopus and Web of Science was conducted through July 2025 following Preferred Reporting Items for Systematic reviews and Meta‐Analyses (PRISMA) guidelines. Studies reporting inter‐ and/or intra‐observer reliability for KL or mJSW on standard knee radiographs were included. Extracted data included kappa (κ) and intraclass correlation coefficient (ICC) values, rater expertise, radiographic view and atlas usage. Raters were categorized as expert, trainee/non‐expert or artificial intelligence (AI)‐assisted/automated.

Results: Twenty‐four studies (10,394 radiographs) met the inclusion criteria. Inter‐observer reliability was generally higher for mJSW (mean ICC = 0.82 ± 0.17) compared with KL (mean κ = 0.72 ± 0.17), with similar trends observed for intra‐observer reliability. Experts and AI‐assisted systems outperformed trainees/non‐experts for both metrics. Reliability improved with semiflexed posteroanterior/Rosenberg views and atlas‐guided training.

Conclusion: This systematic review demonstrates that mJSW measurement generally shows higher reproducibility than KL grading. Reliability improved with expertise, standardized imaging and structured training, and AI‐assisted approaches performed comparably to expert raters. Quantitative mJSW should complement KL grading, particularly when combined with alignment measures, to enhance consistency and clinical relevance of radiographic assessment.

Level of Evidence: Level III.
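For readers unfamiliar with the kappa statistic reported for KL grading, the following is a minimal sketch of how agreement between two raters might be computed. It assumes unweighted Cohen's kappa and uses entirely hypothetical KL grades; the included studies may have used weighted kappa or other variants, and none of the names or values below come from the review.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Illustrative helper (hypothetical, not from the review).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same grade by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over grades of the product of marginal proportions.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical grade; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical KL grades (0-4) assigned by two raters to the same ten radiographs.
kl_rater_a = [0, 1, 2, 2, 3, 4, 1, 2, 3, 0]
kl_rater_b = [0, 1, 2, 3, 3, 4, 2, 2, 3, 0]
kappa = cohen_kappa(kl_rater_a, kl_rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Kappa corrects raw percentage agreement on ordinal grades for agreement expected by chance, whereas the ICC reported for mJSW applies to continuous measurements, so the two statistics are not on an identical scale.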
