A Random Forest model using pre-bronchodilator spirometry Z-scores and smoking status successfully screened for COPD with an accuracy of 83.1% (95% CI: 71.0%-91.6%; p=0.027 vs No Information Rate).
Cross-Sectional (n=399)
Does a machine learning model based on pre-bronchodilator spirometry Z-scores and smoking status accurately screen for COPD?
A machine learning model using pre-bronchodilator spirometry Z-scores and smoking status demonstrated strong accuracy and high specificity for noninvasive COPD screening.
p-value: 0.027
Abstract

Rationale: Chronic obstructive pulmonary disease (COPD) is a major global health challenge and is frequently underdiagnosed, especially in areas where post-bronchodilator spirometry is not routinely performed. Machine learning (ML) offers a promising way to improve COPD screening using pre-bronchodilator data. This study aimed to construct and assess an ML model for COPD screening based on pre-bronchodilator spirometry Z-scores derived from the Global Lung Initiative (GLI) equations, with FEF25–75% and smoking status as predictive factors.

Methods: The study included 399 subjects from the National Health and Nutrition Examination Survey (NHANES, cycle G) who had complete pre- and post-bronchodilator spirometry measurements. The spirometric indices (FEV1, FVC, FEV1/FVC, and FEF25–75%) were standardized to Z-scores using GLI reference equations. COPD was defined as a post-bronchodilator FEV1/FVC Z-score below -1.645, the lower limit of normal (LLN). Pre-bronchodilator Z-scores and smoking status were used to train a Random Forest classifier with an 80/20 train-test split. Model performance was assessed with accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), Cohen's Kappa, and the No Information Rate (NIR).

Results: The Random Forest model significantly outperformed the NIR (p = 0.027), with an accuracy of 83.1% (95% CI: 71.0%-91.6%). Sensitivity was 64.7%, meaning the model captured about two thirds of actual COPD cases, and specificity was 90.5%, meaning it correctly identified most people without COPD. The PPV was 73.3% and the NPV was 86.4%. Agreement between the actual and predicted diagnoses was moderate (Kappa = 0.572).

Conclusion: This study shows that machine learning models trained on GLI-based pre-bronchodilator spirometry Z-scores, together with FEF25–75% and smoking status, may successfully screen for COPD as defined by the LLN threshold (Z-score below -1.645). The Random Forest model demonstrated strong overall performance and high specificity, underscoring the potential of ML-based techniques for noninvasive COPD screening. Nevertheless, the results are based on a small dataset. To improve model generalizability and therapeutic utility, validation on larger, high-quality, and more varied datasets, especially from primary care and community settings, is crucial.

This abstract is funded by: None
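The reported metrics all follow from a single 2x2 confusion matrix on the held-out test set, so the arithmetic can be sketched briefly. The abstract does not report the underlying counts; the values below (11 true positives, 6 false negatives, 38 true negatives, 4 false positives, for 59 test subjects) are an assumption, chosen only because they reproduce the published percentages. The one-sided exact binomial test against the NIR is likewise an assumed method for the reported p-value, not one the abstract names.

```python
from math import comb

# ASSUMED test-set counts (59 subjects, roughly 20% of 399).
# Not reported in the abstract; inferred because they match its percentages.
tp, fn = 11, 6   # COPD by post-bronchodilator LLN: detected / missed
tn, fp = 38, 4   # no COPD: correctly cleared / falsely flagged

n = tp + fn + tn + fp
accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

# No Information Rate: accuracy of always predicting the majority class.
nir = max(tp + fn, tn + fp) / n

# Cohen's Kappa: observed agreement corrected for chance agreement.
p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
kappa = (accuracy - p_chance) / (1 - p_chance)

# One-sided exact binomial test: probability of at least this many correct
# predictions if the true accuracy were only the NIR (assumed test choice).
correct = tp + tn
p_value = sum(
    comb(n, k) * nir**k * (1 - nir) ** (n - k) for k in range(correct, n + 1)
)

for name, value in [
    ("accuracy", accuracy), ("sensitivity", sensitivity),
    ("specificity", specificity), ("PPV", ppv), ("NPV", npv),
    ("NIR", nir), ("kappa", kappa), ("p vs NIR", p_value),
]:
    print(f"{name:>12}: {value:.3f}")
```

Under these assumed counts the NIR is about 71%, which puts the 83.1% accuracy in context: the model improves on always predicting "no COPD" by roughly 12 percentage points, and the binomial tail probability lands near the reported p = 0.027.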
Almeshari et al. (Fri,) conducted a cross-sectional study of chronic obstructive pulmonary disease (COPD) screening (n=399). A Random Forest classifier using pre-bronchodilator spirometry Z-scores and smoking status was compared against the No Information Rate (NIR) on model accuracy for COPD screening and achieved an accuracy of 83.1% (95% CI: 71.0%-91.6%; p=0.027).