COPD-GPT outperformed Random Forest and Logistic Regression in predicting 5-year mortality (AUROC 0.923; 95% CI 0.915-0.932) and 1-year exacerbation (AUROC 0.885) in COPD patients.
Cohort (n=1,480,000)
Yes
Does COPD-GPT improve the prediction of severe exacerbation and 5-year mortality in patients with COPD compared to traditional machine learning and regression methods?
A transformer-based model, COPD-GPT, outperformed traditional machine learning and regression methods in predicting 5-year mortality and 1-year exacerbations in COPD patients using publicly available datasets.
Effect estimate: AUROC 0.923 (95% CI 0.915-0.932)
AUROC for 5-year mortality, COPD-GPT vs Random Forest: 0.923 vs 0.831
Abstract
Background: Chronic Obstructive Pulmonary Disease (COPD) causes more than 3.2 million deaths annually and ranks as the third leading cause of global mortality. Traditional risk indices such as BODE and GOLD lack generalizability across populations and data types. We developed COPD-GPT, a generative transformer model trained exclusively on publicly accessible datasets to predict acute exacerbation and 5-year mortality in COPD.
Methods: Mortality data were extracted from CDC WONDER (1999-2023); clinical, biochemical, and exposure variables from NHANES (1999-2020); and hospital-level validation from MIMIC-IV (v2.2, PhysioNet). Structured data (age, BMI, FEV1 %, smoking index, comorbidities) and unstructured discharge summaries were jointly embedded through transformer encoders fine-tuned using supervised learning. Training (80%) and validation (20%) cohorts were split chronologically. Endpoints were (1) severe exacerbation within 12 months and (2) all-cause mortality within 5 years. Model performance was compared with Random Forest and Logistic Regression baselines. Explainability was assessed using SHAP value decomposition and feature saliency.
Results: The training cohort included 1.48 million participants; mean age = 61.4 ± 11.8 years, 55% male, mean FEV1 = 61.2 ± 14.5%. Model performance for 5-year mortality (AUROC, 95% CI): COPD-GPT 0.923 (0.915-0.932); Random Forest 0.831 (0.823-0.839); Logistic Regression 0.776 (0.769-0.784). Calibration slope = 0.97; Hosmer-Lemeshow p = 0.48; 5-fold cross-validation accuracy = 89.6%, precision = 0.87, recall = 0.88. For 1-year exacerbation prediction, COPD-GPT achieved AUROC = 0.885 (0.872-0.898), reducing false-negative predictions by 33% compared with GOLD stage classification. Key predictors included FEV1 decline rate (β = 0.41 ± 0.03, p < 0.001), eosinophil count at the 350 cells/µL threshold (OR 1.32, 1.21-1.45), and PM2.5 exposure at the 35 µg/m³ threshold (OR 1.26, 1.14-1.38). Simulation of a risk-stratified care protocol guided by COPD-GPT projected a 17.4% (95% UI 12.6-21.9) reduction in hospitalization and a 0.9-year improvement in expected survival for the top-quartile risk group.
Conclusions: COPD-GPT outperforms traditional machine-learning and regression methods in forecasting mortality and exacerbation using only publicly available datasets. The model captures nonlinear interactions between lung function, biomarkers, and environmental exposure, offering transparent, reproducible, and globally deployable risk stratification. By integrating population-level data sources such as CDC WONDER, NHANES, and MIMIC-IV, COPD-GPT establishes a scalable foundation for AI-driven COPD surveillance and personalized intervention modeling.
This abstract is funded by: NA
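The abstract states that the training (80%) and validation (20%) cohorts were split chronologically but does not describe the procedure. A minimal Python sketch of one way such a split could work is shown below. The record layout, the `index_date` field, and the `chronological_split` helper are hypothetical and are not taken from the authors' pipeline.

```python
import random
from datetime import date


def chronological_split(records, train_fraction=0.8,
                        key=lambda r: r["index_date"]):
    """Sort by date and cut once: the earliest rows train, the latest validate.

    Unlike a random split, no validation record predates a training record,
    so the model is always evaluated on data from a later period.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = sorted(records, key=key)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy records; "index_date" is an assumed field name.
records = [{"id": i, "index_date": date(2000 + i, 1, 1)} for i in range(10)]
random.Random(0).shuffle(records)  # input order should not matter

train, valid = chronological_split(records)
```

With ten toy records this yields eight training rows and two validation rows, and every training date precedes every validation date.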
Qamar et al. (Fri,) conducted a cohort study in Chronic Obstructive Pulmonary Disease (COPD) (n=1,480,000). COPD-GPT was compared with Random Forest and Logistic Regression on all-cause mortality within 5 years (AUROC 0.923, 95% CI 0.915-0.932) and was also evaluated on 1-year exacerbation (AUROC 0.885).
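The headline result is an AUROC with a 95% confidence interval. The source does not say how the interval was obtained, so the Python sketch below assumes a percentile bootstrap purely for illustration. The helper names `auroc` and `bootstrap_ci` and the four-patient toy data are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = died within 5 years, 0 = survived (illustrative only).
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]

point = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores, n_boot=200)
```

The pairwise comparison is quadratic in cohort size, so a cohort of the scale reported here would call for a rank-based implementation. The sketch is meant only to show what the point estimate and interval represent.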