What question did this study set out to answer?

To develop and validate an automated deep learning system for measuring lower-limb alignment on radiographs and to compare tibial joint-line definitions.

May 28, 2026 · Open Access

Automated assessment of coronal lower extremity alignment on long-leg radiographs using a deep-learning model: validation, efficiency gains, and superiority of an edge-based tibial joint-line definition

Key Points

The study aimed to develop and validate an automated deep learning system for measuring lower-limb alignment on long-leg radiographs and to compare two tibial joint-line definitions.
A retrospective set of 309 long-leg radiographs was divided into training (60%), validation (20%), and testing (20%) sets.
YOLOv5 detected bony landmarks, from which 15 alignment parameters were calculated algorithmically; two tibial joint-line definitions were compared for measurement accuracy.
Bland-Altman analysis assessed measurement bias against expert consensus annotations.
Absolute errors for knee phenotype-related parameters ranged from 0.07° to 0.45° on an external dataset.
Automated measurement reduced runtime to 24.3 ± 0.7 seconds, achieving an 89-91% reduction compared to manual methods (p < 0.001).
Method 2 (edge-based tibial joint line) outperformed Method 1 (lowest-point tibial joint line), with significantly smaller absolute errors for nearly all parameters (p ≤ 0.005).
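
The agreement statistics named in the key points can be sketched in a few lines. This is a minimal illustration, not the study's code: it assumes the conventional Bland-Altman limits of agreement (bias ± 1.96 sample standard deviations of the paired differences) and applies the ≥ 2° failure threshold quoted in the abstract. The function names and the sample values are hypothetical.

```python
import statistics


def bland_altman(auto, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Assumes the conventional 1.96 * SD limits of agreement; the study's
    exact settings are not stated in the abstract.
    """
    if len(auto) != len(reference) or len(auto) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [a - r for a, r in zip(auto, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def clinical_failure_rate(auto, reference, threshold=2.0):
    """Fraction of cases whose absolute error is at or above the threshold."""
    if len(auto) != len(reference) or not auto:
        raise ValueError("need two equal-length, non-empty sequences")
    errors = [abs(a - r) for a, r in zip(auto, reference)]
    return sum(e >= threshold for e in errors) / len(errors)


# Hypothetical HKAA values in degrees, for illustration only (not study data).
auto_hkaa = [178.2, 181.0, 175.4, 183.1, 179.9]
ref_hkaa = [178.0, 180.6, 175.9, 180.9, 180.0]

bias, lower, upper = bland_altman(auto_hkaa, ref_hkaa)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
print(f"failure rate={clinical_failure_rate(auto_hkaa, ref_hkaa):.0%}")
```

For parameters reported as percentages, the same `threshold=2.0` would correspond to the ≥ 2% criterion.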

Abstract

PURPOSE: To develop, validate, and benchmark a fully automated deep learning (DL) system that simultaneously measures 15 coronal lower-limb alignment parameters on standing long-leg radiographs (LLRs) and localizes deformity, and to compare two tibial joint-line definitions for suitability in DL-based measurement.

METHODS: A retrospective set of 309 anteroposterior standing LLRs was split into training/validation/testing sets (60/20/20). External generalizability was assessed using 75 independent LLRs from a different scanner and patient cohort. YOLOv5 was used to detect multiple bony landmarks, followed by algorithmic calculation of 15 parameters (e.g., HKAA, mLDFA, mMPTA). Two tibial joint-line definitions were evaluated: Method 1 (line through the lowest points of the medial and lateral tibial plateaus) and Method 2 (line through the most medial and most lateral plateau edges). Accuracy, clinical failure rate (error ≥ 2° or ≥ 2%), and runtime were compared with expert consensus annotations. Bland-Altman analysis was used to assess measurement bias and clinical agreement.

RESULTS: On the external dataset, absolute errors for knee phenotype-related parameters ranged from 0.07° to 0.45°. Automated analysis of all 15 parameters took 24.3 ± 0.7 s, reducing time by 89-91% versus manual measurement (p < 0.001). Method 2 produced significantly smaller absolute errors than Method 1 for nearly all parameters (p ≤ 0.005). Clinically significant failure rates were low (0-4.7%) and were significantly lower than an attending physician's for several key metrics on the external set. The distribution of varus/valgus/neutral alignment based on HKAA and the locations of extra-articular deformity were also reported.

CONCLUSION: This DL framework provides fast, comprehensive, and specialist-level coronal alignment assessment on LLRs. An edge-based tibial joint-line definition (Method 2) outperforms a lowest-point definition, improving precision and reliability for DL measurement pipelines and supporting clinically deployable orthopedic imaging AI. Future prospective studies are warranted to validate Method 2 against clinical outcomes, including osteoarthritis progression and surgical results.
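
To make concrete how the choice of joint-line landmarks feeds into an angle such as mMPTA, the sketch below computes the medial-distal angle between a tibial joint line and the tibial mechanical axis from four 2D landmark coordinates. It is an illustration under assumptions, not the paper's implementation: the function names, the landmark set (medial and lateral plateau points, knee center, ankle center), and the pixel coordinates are all hypothetical. Image coordinates are assumed, with y increasing toward the ankle.

```python
import math


def angle_between(u, v):
    """Unsigned angle in degrees between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))


def mmpta(medial_pt, lateral_pt, knee_center, ankle_center):
    """Medial-distal angle between the tibial joint line and mechanical axis.

    The joint line runs from the lateral to the medial plateau landmark;
    the mechanical axis runs from the knee center to the ankle center.
    Which plateau points are used (lowest points or outermost edges) is
    exactly what distinguishes the two joint-line definitions.
    """
    joint_line = (medial_pt[0] - lateral_pt[0], medial_pt[1] - lateral_pt[1])
    axis = (ankle_center[0] - knee_center[0], ankle_center[1] - knee_center[1])
    return angle_between(joint_line, axis)


# Hypothetical pixel coordinates, for illustration only (not study data).
knee = (100.0, 0.0)
ankle = (100.0, 400.0)

# Two different landmark pairs on the same plateau give different angles.
flat_line = mmpta((60.0, 0.0), (140.0, 0.0), knee, ankle)
tilted_line = mmpta((60.0, 4.0), (140.0, 0.0), knee, ankle)
print(f"flat joint line: {flat_line:.1f} deg")
print(f"tilted joint line: {tilted_line:.1f} deg")
```

Because the angle depends only on the two plateau points and the axis endpoints, a shift of a few pixels in either plateau landmark changes the result directly, which is why the landmark definition matters for measurement error.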
