Progressive pulmonary fibrosis (PPF) is a high-mortality phenotype of interstitial lung disease (ILD). Diagnosing it is difficult in resource-limited settings that lack advanced imaging, and it can require invasive procedures. We aimed to develop a machine learning model for PPF-ILD diagnosis using routine blood parameters and the biomarker Krebs von den Lungen-6 (KL-6). Data from 10,687 ILD patients (4,399 stable, 6,288 PPF-ILD) at the First Affiliated Hospital of Guangzhou Medical University (2016-2025) were divided into a training cohort (January 2016-October 2022) and a temporal validation cohort (November 2022-July 2025). Significant variables were identified via univariable logistic regression; 12 algorithms generated 130 models, which were evaluated by area under the curve (AUC), calibration, and decision curve analysis (DCA). The Lasso + random forest (RF) model (20 variables) achieved an AUC of 0.998 in training and 0.842 in validation; the glmBoost + RF model (10 variables) achieved an AUC of 0.996 in training and 0.831 in validation, with a sensitivity of 90.0%, a specificity of 61.0%, and an F1 score of 83.3%. Both models showed excellent calibration and net benefit on DCA. KL-6 was the strongest predictor (OR = 6.20, 95% CI = 5.67-6.79). The streamlined glmBoost + RF model offers performance comparable to the more complex Lasso + RF model with greater clinical applicability, providing an objective, noninvasive tool for early PPF-ILD detection in resource-constrained environments.
Chen et al. studied this question.
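For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, F1 score, and AUC are conventionally computed for a binary classifier with PPF-ILD as the positive class. It is not the authors' code: the class `ConfusionCounts`, the function names, and any counts or scores passed to them are hypothetical and serve only to illustrate the standard definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical binary confusion-matrix counts (positive class = PPF-ILD)."""
    tp: int  # PPF-ILD patients correctly flagged
    fp: int  # stable ILD patients wrongly flagged
    tn: int  # stable ILD patients correctly cleared
    fn: int  # PPF-ILD patients missed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true PPF-ILD cases that the model detects (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of stable ILD cases that the model correctly clears."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged patients who truly have PPF-ILD."""
    return c.tp / (c.tp + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r)


def auc_from_scores(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because F1 depends on precision, it varies with the ratio of PPF-ILD to stable patients in the evaluated cohort, whereas sensitivity and specificity do not. The pairwise AUC above is quadratic in cohort size and is meant for illustration only; a rank-based implementation would be preferable on cohorts of the size described here.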