What is the clinical evidence from this study?

Study design: Observational. Population: Individuals with a history of smoking (n=6,300). Intervention: Explainable AI/ML dyspnea prediction model. Primary outcome: Presence of dyspnea (mMRC scale score ≥2); AUROC 0.85.

What does this research mean for the field?

An explainable AI/ML model (LightGBMXT) accurately predicts the presence of dyspnea in individuals with a history of smoking, identifying key driving factors such as lung function, BMI, and mental health. Novelty: Methodological. Consensus alignment: Neutral.

What question did this study set out to answer?

The study aims to create an explainable AI/ML model to identify the factors driving dyspnea in smokers.

May 20, 2026

C17-05 Artificial Intelligence/Machine Learning Dyspnea Prediction Model in Individuals With a History of Smoking

Key Result

An explainable AI/ML model (LightGBMXT) accurately predicted the presence of dyspnea in smokers, achieving an AUROC of 0.85 and an accuracy of 0.78 in the test dataset.

Key Points

Used data from 6,300 smokers in the COPDGene Study, of whom 2,520 (40%) reported dyspnea.
Developed models with tree-based ensembles (LightGBM, LightGBMXT, CatBoost, XGBoost), a weighted ensemble, and a neural network, assessing performance on a held-out 20% test set with AUROC, F1 score, precision, recall, and accuracy.
Used an explainable AI framework based on TreeSHAP to compute individualized feature importance for each prediction.
The LightGBMXT model achieved the highest AUROC (0.85), with an F1 score of 0.69, in predicting dyspnea.
Key features associated with dyspnea included lower GLI-predicted FEV1, lower DLCO, higher BMI, depression and/or anxiety, and chronic bronchitis.
The authors suggest that explainable AI/ML could support dyspnea assessment and management in smokers, with clustering-based dyspnea subtypes and external validation in an independent cohort as next steps.
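TreeSHAP produces Shapley values, which split a single prediction into additive per-feature contributions relative to a reference. The Python sketch below is not TreeSHAP: it brute-forces exact Shapley values against one baseline profile for a hypothetical four-feature score, which is only feasible for a handful of features. The helper names (`shapley_values`, `toy_logit`), coefficients, and profiles are illustrative assumptions, not values from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) against a single baseline point.

    Features outside a coalition take their baseline value. Every coalition
    is enumerated (2^n model calls per feature), so this only suits a few
    features; TreeSHAP exists to avoid this cost for tree ensembles.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, the rest from the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_logit(z):
    """Hypothetical dyspnea log-odds score; NOT the study's model."""
    fev1_pct, dlco_pct, bmi, mood_disorder = z
    return (
        -0.03 * (fev1_pct - 80)   # lower FEV1 % predicted raises the score
        - 0.02 * (dlco_pct - 80)  # lower DLCO % predicted raises the score
        + 0.05 * (bmi - 27)       # higher BMI raises the score
        + 0.8 * mood_disorder     # depression and/or anxiety (0 or 1)
        + 0.01 * max(0, 60 - fev1_pct) * mood_disorder  # interaction term
    )


patient = [45, 50, 33, 1]    # hypothetical individual
reference = [80, 80, 27, 0]  # hypothetical baseline profile
phi = shapley_values(toy_logit, patient, reference)
print([round(p, 3) for p in phi])  # [1.125, 0.6, 0.3, 0.875]
```

The attributions sum to the difference between the individual's score and the baseline score (the efficiency property), and the interaction between low FEV1 and depression/anxiety is split equally between those two features. TreeSHAP obtains Shapley-based attributions for tree ensembles in polynomial time by exploiting the tree structure, typically with a value function based on the training data rather than a single baseline point.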

Study Design

Type

Observational (n=6,300)

Multicenter

Yes

Structured PICO

Can explainable AI/ML models accurately predict the presence of dyspnea and identify its driving factors in individuals with a history of smoking?

Population

6300 current and former non-Hispanic White and African American smokers from the multi-center, prospective observational COPDGene Study. 40% (N = 2520) reported dyspnea (mMRC score ≥2).

Intervention

Explainable AI/ML models (including LightGBMXT, CatBoost, XGBoost, NeuralNetFastAI) using clinical history, spirometry/DLCO, and quantitative CT imaging data

Outcome

Prediction of the presence of dyspnea (defined as a self-reported modified Medical Research Council (mMRC) dyspnea scale score ≥2)
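The outcome definition is a threshold on the mMRC grade, which runs from 0 to 4. The Python sketch below uses a hypothetical helper `has_dyspnea` and made-up responses to show that labeling step. It then uses the cohort counts from the abstract to compute the accuracy that a trivial "no dyspnea for everyone" rule would reach at 40% prevalence, a useful reference point when reading the reported accuracy of 0.78.

```python
def has_dyspnea(mmrc_grade: int) -> bool:
    """Binary study outcome: mMRC grade >= 2 is labeled as dyspnea."""
    if mmrc_grade not in range(5):  # mMRC grades run from 0 to 4
        raise ValueError(f"mMRC grade must be an integer 0-4, got {mmrc_grade!r}")
    return mmrc_grade >= 2


# Hypothetical questionnaire responses, for illustration only
grades = [0, 1, 2, 3, 4, 1, 2]
labels = [has_dyspnea(g) for g in grades]
print(labels)  # [False, False, True, True, True, False, True]

# Cohort counts reported in the abstract
n_total, n_dyspnea = 6300, 2520
prevalence = n_dyspnea / n_total
# Accuracy of a trivial model that always predicts "no dyspnea"
majority_baseline_accuracy = 1 - prevalence
print(round(prevalence, 2), round(majority_baseline_accuracy, 2))  # 0.4 0.6
```

Because the classes are imbalanced, accuracy alone can look respectable for an uninformative model, which is one reason threshold-free AUROC and F1 are reported alongside it.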

Explainable AI/ML models, particularly LightGBMXT, can accurately predict dyspnea in smokers using clinical, spirometric, and imaging data, identifying key driving factors like lung function and BMI.

Main Result

Effect estimate: AUROC 0.85
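An AUROC of 0.85 has a concrete reading: if one smoker with dyspnea and one without are drawn at random, the model assigns the higher score to the one with dyspnea about 85% of the time. A minimal pairwise sketch of that definition (the Mann-Whitney formulation) follows; the `auroc` helper and the example scores are hypothetical, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win. The O(P*N)
    pairwise loop favors clarity over speed.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = dyspnea) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(round(auroc(y_true, y_score), 3))  # 0.917
```

Here 11 of the 12 positive/negative pairs are ranked correctly; only the case scored 0.4 falls below the non-case scored 0.6.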

Abstract

Rationale

Tobacco use affects ∼1.25 billion adults worldwide and remains the leading cause of multimorbidity, with ∼40% of individuals with a smoking history (smokers) experiencing dyspnea. While dyspnea in smokers is a life-limiting symptom that restricts daily, social, and occupational participation, it remains underdiagnosed and undertreated. The leading cause of the clinical gap in dyspnea assessment and management is uncertainty in identifying its most likely driving factors in the individual.

Aims

This study aims to develop an explainable AI/ML model and quantify individual feature attributions to identify the most likely driving factors of dyspnea in each smoker.

Methods

This predictive modeling study used phase 2 clinical history, spirometry/DLCO, and quantitative CT imaging data from the COPDGene Study, a multi-center, prospective observational study of non-Hispanic White and African American smokers. Among 6300 current and former smokers, 40% (N = 2520) reported dyspnea, defined as a self-reported modified Medical Research Council (mMRC) dyspnea scale score ≥2. The dataset was randomly split into training (80%) and test (20%) sets to develop and validate dyspnea prediction models. Model development used tree-based ensemble algorithms (Light Gradient-Boosting Machine (LightGBM), LightGBM with eXtended Training (LightGBMXT), CatBoost, and eXtreme Gradient Boosting (XGBoost)) as well as a weighted ensemble (WeightedEnsemble_L2) and a neural network (NeuralNetFastAI). Model performance was evaluated on the test set using area under the receiver operating characteristic curve (AUROC), F1 score, precision, recall, and accuracy. AutoGluon’s internal leaderboard ranked models using AUROC on the test data. An explainable AI/ML framework based on TreeSHAP computed individualized feature importance values for each prediction.

Results

Among all models, LightGBMXT achieved the highest AUROC of 0.85, an F1 score of 0.69, a precision of 0.75, a recall of 0.65, and an accuracy of 0.78 in the test dataset. These results indicate balanced predictive value for the presence of dyspnea in smokers. TreeSHAP identified the top features associated with dyspnea: lower Global Lung Function Initiative (GLI)-predicted % Forced Expiratory Volume in 1 second, lower diffusing capacity for carbon monoxide, higher body mass index, the presence of depression and/or anxiety, and the presence of chronic bronchitis.

Conclusions

Our findings suggest that explainable AI/ML models can accurately predict the presence of dyspnea in smokers, indicating excellent predictive capability. Using the most likely driving factors of dyspnea we identified, future work should focus on identifying clinically relevant dyspnea subtypes based on clustering subgroups with similar driving factors and on external validation in an independent cohort.

Funding

This work was supported by NHLBI grants U01 HL089897 and U01 HL089856 and by NIH contract 75N92023D00011.
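Unlike AUROC, precision, recall, F1, and accuracy depend on a decision threshold and follow directly from confusion-matrix counts. The Python sketch below computes them from hypothetical counts (not the study's) and then checks the reported figures: F1 is the harmonic mean of precision and recall, so a precision of 0.75 and a recall of 0.65 imply an F1 of about 0.696, and even the smallest values that round to 0.75 and 0.65 give about 0.691. The reported F1 of 0.69 is therefore consistent within rounding. The helper names are illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Threshold-dependent metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted cases that are true cases
    recall = tp / (tp + fn)     # share of true cases that are detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts, not the study's
m = classification_metrics(tp=30, fp=10, fn=20, tn=40)
print({k: round(v, 3) for k, v in m.items()})
# {'precision': 0.75, 'recall': 0.6, 'f1': 0.667, 'accuracy': 0.7}

# Consistency check on the reported precision (0.75) and recall (0.65)
print(round(f1_from(0.75, 0.65), 3))  # 0.696
# F1 grows with both inputs, so the lower rounding bounds give the minimum
print(round(f1_from(0.745, 0.645), 3))  # 0.691
```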


Cite This Study

Shin et al. conducted an observational study of individuals with a history of smoking (n=6,300), evaluating an explainable AI/ML dyspnea prediction model for the presence of dyspnea (mMRC scale score ≥2). The LightGBMXT model achieved an AUROC of 0.85 and an accuracy of 0.78 in the test dataset.

synapsesocial.com/papers/6a0d5122f03e14405aa9d7ec https://doi.org/10.1093/ajrccm/aamag162.3081
