Obstructive airway disease is defined by reduced FEV₁ and an FEV₁/FVC ratio below 70%. While pulmonary function testing is essential, few studies have used demographic, biochemical, and lifestyle data to predict disease risk in non-smokers. This study aimed to develop interpretable machine learning (ML) models for early risk prediction and clinical screening. We analyzed data from 81,055 non-smoking individuals drawn from a health screening cohort of 549,825 participants. Six ML algorithms (classification and regression trees [CART], random forest [RF], XGBoost, LightGBM, CatBoost, and Lasso) were applied to develop predictive models. All models demonstrated strong predictive performance, and an ensemble feature aggregation approach was used to identify key predictors. A CART model was then built on the identified features to produce a visualized decision tree and decision rules that support clinical screening. The key predictors included age, waist-to-hip ratio, blood pressure, and biochemical markers. This is the first large-scale ML study to predict obstructive airway disease in non-smokers using health examination data. Interpretable models may assist early detection and clinical risk stratification.
Chang et al. studied this question.
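The abstract does not say how the ensemble feature aggregation was carried out. As a minimal sketch of one common way to do it, the Python example below ranks features within each model by importance and then averages the ranks across models. The mean-rank rule, the function names, the model labels, the `height` feature, and all importance values are illustrative assumptions, not the authors' method or results.

```python
from statistics import mean


def rank_features(importances):
    """Return {feature: rank}, with rank 1 for the largest absolute importance."""
    ordered = sorted(importances, key=lambda f: abs(importances[f]), reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate_by_mean_rank(per_model_importances, top_k):
    """Average each feature's rank across models; keep the top_k lowest mean ranks."""
    ranks = [rank_features(imp) for imp in per_model_importances.values()]
    features = list(ranks[0])
    mean_rank = {f: mean(r[f] for r in ranks) for f in features}
    return sorted(features, key=lambda f: mean_rank[f])[:top_k]


# Made-up importance scores for illustration only; NOT results from the study.
# "model_a" etc. stand in for any of the fitted models (hypothetical labels).
toy_importances = {
    "model_a": {"age": 0.40, "waist_to_hip_ratio": 0.30, "systolic_bp": 0.20, "height": 0.10},
    "model_b": {"age": 0.35, "waist_to_hip_ratio": 0.25, "systolic_bp": 0.30, "height": 0.10},
    "model_c": {"age": 0.50, "waist_to_hip_ratio": 0.25, "systolic_bp": 0.20, "height": 0.05},
}

selected = aggregate_by_mean_rank(toy_importances, top_k=3)
print(selected)
```

Averaging ranks rather than raw scores keeps models with differently scaled importances (for example, tree-based gain versus Lasso coefficients) comparable. The selected features could then be passed to a single decision tree to derive screening rules, as the abstract describes.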