An explainable AI/ML model (LightGBMXT) accurately predicted the presence of dyspnea in smokers, achieving an AUROC of 0.85 and an accuracy of 0.78 in the test dataset.
Observational (n=6,300)
Yes
Can explainable AI/ML models accurately predict the presence of dyspnea and identify its driving factors in individuals with a history of smoking?
Explainable AI/ML models, particularly LightGBMXT, can accurately predict dyspnea in smokers using clinical, spirometric, and imaging data, identifying key driving factors like lung function and BMI.
Effect estimate: AUROC 0.85
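As a reading aid for the effect estimate: AUROC can be interpreted as the probability that a randomly chosen individual with dyspnea receives a higher predicted score than a randomly chosen individual without it. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = dyspnea (mMRC >= 2), 0 = no dyspnea.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)  # 10 of 12 positive/negative pairs are ordered correctly
```

On this reading, an AUROC of 0.85 means the model ranks a smoker with dyspnea above one without it in about 85% of such pairs.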
Abstract
Rationale: Tobacco use affects ∼1.25 billion adults worldwide and remains the leading cause of multimorbidity, with ∼40% of individuals with a smoking history (smokers) experiencing dyspnea. Although dyspnea in smokers is a life-limiting symptom that restricts daily, social, and occupational participation, it remains underdiagnosed and undertreated. The leading cause of the clinical gap in dyspnea assessment and management is uncertainty in identifying its most likely driving factors in the individual.
Aims: This study aims to develop an explainable AI/ML model and quantify individual feature attributions to identify the most likely driving factors of dyspnea in each smoker.
Methods: This predictive modeling study used phase 2 clinical history, spirometry/DLCO, and quantitative CT imaging data from the COPDGene Study, a multi-center, prospective observational study of non-Hispanic White and African American smokers. Among 6300 current and former smokers, 40% (N = 2520) reported dyspnea, defined as a self-reported modified Medical Research Council (mMRC) dyspnea scale score ≥2. The dataset was randomly split into training (80%) and test (20%) sets to develop and validate dyspnea prediction models. Model development used tree-based ensemble algorithms (Light Gradient-Boosting Machine (LightGBM), LightGBM with eXtended Training (LightGBMXT), CatBoost, and eXtreme Gradient Boosting (XGBoost)) as well as a weighted ensemble (WeightedEnsemble_L2) and a neural network (NeuralNetFastAI). Model performance was evaluated on the test set using area under the receiver operating characteristic curve (AUROC), F1 score, precision, recall, and accuracy. AutoGluon's internal leaderboard ranked models using AUROC on the test data. An explainable AI/ML framework based on TreeSHAP computed individualized feature importance values for each prediction.
Results: Among all models, LightGBMXT achieved the highest AUROC of 0.85, an F1 score of 0.69, a precision of 0.75, a recall of 0.65, and an accuracy of 0.78 in the test dataset. These results indicate balanced predictive value for the presence of dyspnea in smokers. TreeSHAP identified the top features associated with dyspnea: lower Global Lung Function Initiative (GLI)-predicted % Forced Expiratory Volume in 1 second, lower diffusing capacity for carbon monoxide, higher body mass index, the presence of depression and/or anxiety, and the presence of chronic bronchitis.
Conclusions: Our findings suggest that explainable AI/ML models can accurately predict the presence of dyspnea in smokers, indicating excellent predictive capability. Using the most likely driving factors of dyspnea we identified, future work should focus on identifying clinically relevant dyspnea subtypes based on clustering subgroups with similar driving factors and on external validation in an independent cohort.
Funding: This work was supported by NHLBI grants U01 HL089897 and U01 HL089856 and by NIH contract 75N92023D00011.
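To illustrate what an individualized feature attribution means, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical risk score. This is not TreeSHAP and not the study's model: TreeSHAP obtains Shapley values efficiently for tree ensembles, whereas this example enumerates every feature subset, which is only feasible for a handful of features. The feature names, coefficients, reference individual, and the choice to represent an "absent" feature by its reference value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one individual `x` relative to `baseline`.

    A feature that is "absent" from a coalition takes its baseline value.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        inputs = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f):
    """Hypothetical dyspnea risk score (illustrative only)."""
    score = 0.02 * (80 - f["fev1_pct_pred"]) + 0.03 * (f["bmi"] - 25)
    # Hypothetical interaction: mood disorder matters more when FEV1 is low.
    if f["depression_anxiety"] and f["fev1_pct_pred"] < 60:
        score += 0.5
    return score


person = {"fev1_pct_pred": 50, "bmi": 32, "depression_anxiety": 1}
reference = {"fev1_pct_pred": 80, "bmi": 25, "depression_anxiety": 0}
phi = shapley_values(toy_risk, person, reference)
```

The attributions sum to the difference between this person's score and the reference score, and the interaction term is shared equally between the two features that jointly produce it. That additive, per-person decomposition is what allows a different ranking of driving factors for each individual.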
Shin et al. (Fri,) conducted an observational study of dyspnea in individuals with a history of smoking (n=6,300). An explainable AI/ML dyspnea prediction model was evaluated on the presence of dyspnea (mMRC scale score ≥2) and achieved an AUROC of 0.85.
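The test-set metrics reported in the abstract are linked by simple arithmetic, which the sketch below works through: F1 is the harmonic mean of precision and recall, and accuracy follows from precision, recall, and outcome prevalence. The 40% prevalence is the figure given for the full cohort; assuming it also holds in the test split is an assumption made here, and the function names are illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def accuracy_from_precision_recall(precision, recall, prevalence):
    """Accuracy implied by precision, recall, and outcome prevalence.

    All quantities are expressed as fractions of the whole test set.
    """
    tp = recall * prevalence                 # true positives
    fp = tp * (1 - precision) / precision    # false positives implied by precision
    tn = (1 - prevalence) - fp               # true negatives
    return tp + tn


# Reported precision and recall, with the cohort prevalence assumed for the test set.
f1 = f1_from_precision_recall(0.75, 0.65)
acc = accuracy_from_precision_recall(0.75, 0.65, 0.40)
```

With the rounded inputs this gives an F1 of about 0.70 and an accuracy of about 0.77, close to the reported 0.69 and 0.78. The small gaps are within what rounding of the published precision and recall can explain.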
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: