What is the clinical evidence from this study?

Study design: Cohort. Population: patients with chronic obstructive pulmonary disease (COPD) (n = 1,480,000). Intervention: COPD-GPT, compared with Random Forest and Logistic Regression. Primary outcome: all-cause mortality within 5 years (AUROC 0.923, 95% CI 0.915-0.932).

What does this research mean for the field?

The transformer-based model COPD-GPT outperforms traditional machine learning and regression methods in predicting 5-year mortality and 1-year exacerbation in patients with chronic obstructive pulmonary disease. Novelty: methodological. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to develop a transformer-based model, COPD-GPT, to predict mortality and exacerbation in COPD patients using publicly available data.

May 20, 2026

B16-10 COPD-GPT: Transformer-Based Prediction of Exacerbation and Mortality in Chronic Obstructive Pulmonary Disease

Key result

COPD-GPT outperformed Random Forest and Logistic Regression in predicting 5-year mortality (AUROC 0.923; 95% CI 0.915-0.932) and 1-year exacerbation (AUROC 0.885) in COPD patients.

Key points

This research aims to develop a transformer-based model, COPD-GPT, to predict mortality and exacerbation in COPD patients using publicly available data.
Data were extracted from CDC WONDER, NHANES, and MIMIC-IV.
Structured data and unstructured discharge summaries were jointly embedded through transformer encoders fine-tuned with supervised learning.
Model performance assessed against Random Forest and Logistic Regression methods.
COPD-GPT achieved AUROC of 0.923 for 5-year mortality, outperforming Random Forest (0.831) and Logistic Regression (0.776).
For 1-year exacerbation, AUROC was 0.885, with a 33% reduction in false-negative predictions compared to GOLD classification.
A simulated risk-stratified care protocol guided by COPD-GPT projected a 17.4% reduction in hospitalization and a 0.9-year improvement in expected survival for the top-quartile risk group.
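
The 33% figure above is a relative reduction in false negatives against a baseline classifier. A minimal sketch of that arithmetic follows. The function names and toy labels are illustrative assumptions, not the study's code or data, and the abstract does not state the decision thresholds used.

```python
def false_negatives(y_true, y_pred):
    """Count true events (label 1) that a classifier labelled as 0."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


def relative_fn_reduction(y_true, baseline_pred, model_pred):
    """Fractional drop in false negatives of a model relative to a baseline."""
    fn_base = false_negatives(y_true, baseline_pred)
    fn_model = false_negatives(y_true, model_pred)
    if fn_base == 0:
        raise ValueError("baseline has no false negatives to reduce")
    return (fn_base - fn_model) / fn_base


# Toy labels, for illustration only (NOT study data).
y_true        = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # misses 3 of 6 events
model_pred    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # misses 2 of 6 events

reduction = relative_fn_reduction(y_true, baseline_pred, model_pred)
print(f"Relative false-negative reduction: {reduction:.1%}")
```

A relative reduction depends on the baseline's miss rate, so the same percentage can correspond to very different absolute numbers of missed exacerbations.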

Study design

Type

Cohort (n=1,480,000)

Multicenter

Yes

Structured PICO

Does COPD-GPT improve the prediction of severe exacerbation and 5-year mortality in patients with COPD compared to traditional machine learning and regression methods?

Population

1.48 million participants with chronic obstructive pulmonary disease (COPD) drawn from publicly accessible datasets (CDC WONDER, NHANES, MIMIC-IV); mean age 61.4 ± 11.8 years, 55% male.

Intervention

COPD-GPT, a generative transformer model trained on structured data and unstructured discharge summaries

Comparator

Random Forest, Logistic Regression baselines, and GOLD stage classification

Outcome

Severe exacerbation within 12 months and all-cause mortality within 5 years (hard clinical endpoints)

A transformer-based model, COPD-GPT, outperformed traditional machine learning and regression methods in predicting 5-year mortality and 1-year exacerbations in COPD patients using publicly available datasets.

Numerical result

Effect estimate: AUROC 0.923 (95% CI 0.915-0.932)

AUROC, COPD-GPT vs Random Forest: 0.923 vs 0.831 (the abstract gives no absolute event rates)
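
For readers unfamiliar with the metric, the sketch below computes an AUROC as the probability that a randomly chosen event case is scored above a randomly chosen non-event case, with a percentile bootstrap interval. The bootstrap is an assumption made for illustration: the abstract does not say how its 95% CI was derived. The scores are toy values, not study data.

```python
import random


def auroc(y_true, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:  # pairwise comparison: fine for small samples only
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # skip resamples that contain one class only
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, for illustration only (NOT study data).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

point = auroc(y_true, scores)
lower, upper = bootstrap_ci(y_true, scores)
print(f"AUROC = {point:.4f}, bootstrap interval = ({lower:.3f}, {upper:.3f})")
```

With a cohort in the millions, a rank-based formulation or a library routine would replace the pairwise loop.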

Abstract

Background

Chronic Obstructive Pulmonary Disease (COPD) causes more than 3.2 million deaths annually and ranks as the third leading cause of global mortality. Traditional risk indices such as BODE and GOLD lack generalizability across populations and data types. We developed COPD-GPT, a generative transformer model trained exclusively on publicly accessible datasets to predict acute exacerbation and 5-year mortality in COPD.

Methods

Mortality data were extracted from CDC WONDER (1999-2023); clinical, biochemical, and exposure variables from NHANES (1999-2020); and hospital-level validation from MIMIC-IV (v2.2, PhysioNet). Structured data (age, BMI, FEV1 %, smoking index, comorbidities) and unstructured discharge summaries were jointly embedded through transformer encoders fine-tuned using supervised learning. Training (80%) and validation (20%) cohorts were split chronologically. Endpoints were (1) severe exacerbation within 12 months and (2) all-cause mortality within 5 years. Model performance was compared with Random Forest and Logistic Regression baselines. Explainability was assessed using SHAP value decomposition and feature saliency.

Results

The training cohort included 1.48 million participants; mean age = 61.4 ± 11.8 years, 55% male, mean FEV1 = 61.2 ± 14.5%.

For 5-year mortality, COPD-GPT achieved AUROC = 0.923 (95% CI 0.915-0.932), compared with 0.831 (0.823-0.839) for Random Forest and 0.776 (0.769-0.784) for Logistic Regression. Calibration slope = 0.97, Hosmer-Lemeshow p = 0.48, 5-fold cross-validation accuracy = 89.6%, precision = 0.87, recall = 0.88.

For 1-year exacerbation prediction, COPD-GPT achieved AUROC = 0.885 (0.872-0.898), reducing false-negative predictions by 33% compared with GOLD stage classification.

Key predictors included FEV1 decline rate (β = 0.41 ± 0.03, p < 0.001), eosinophil count at the 350 cells/µL threshold (OR 1.32; 1.21-1.45), and PM2.5 exposure at the 35 µg/m³ threshold (OR 1.26; 1.14-1.38).

Simulation of a risk-stratified care protocol guided by COPD-GPT projected a 17.4% (95% UI 12.6-21.9) reduction in hospitalization and a 0.9-year improvement in expected survival for the top-quartile risk group.

Conclusions

COPD-GPT outperforms traditional machine-learning and regression methods in forecasting mortality and exacerbation using only publicly available datasets. The model captures nonlinear interactions between lung function, biomarkers, and environmental exposure, offering transparent, reproducible, and globally deployable risk stratification. By integrating population-level data sources such as CDC WONDER, NHANES, and MIMIC-IV, COPD-GPT establishes a scalable foundation for AI-driven COPD surveillance and personalized intervention modeling.

This abstract is funded by: NA
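
The chronological 80/20 split described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout and the `index_date` field are hypothetical, and the abstract does not specify which timestamp defined the ordering.

```python
from datetime import date


def chronological_split(records, date_key, train_fraction=0.8):
    """Sort records by date and cut once, so validation is strictly later data."""
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical records, for illustration only (NOT study data).
records = [{"id": i, "index_date": date(2000 + i, 1, 1)} for i in range(10)]

train, valid = chronological_split(records, "index_date")
print(len(train), "training records;", len(valid), "validation records")
```

Unlike a random split, this keeps every validation record later in time than every training record, which gives a more realistic estimate of performance on future patients.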
