In this study, we evaluate the performance of classical machine learning models and transformer-based models (BioClinicalBERT and large language models, LLMs) for the automatic identification of chronic obstructive pulmonary disease (COPD) exacerbations from unstructured clinical text in Colombian electronic health records (EHRs). The dataset comprised 5,924 outpatient notes written in Spanish and manually labeled by a pulmonologist, and drew on two key fields: “subjective” (patient-reported symptoms) and “analysis” (physician assessment). This work addresses a critical gap, as no prior models have been developed or validated for exacerbation detection in Spanish-language clinical texts. Classical models, especially LightGBM and CatBoost, achieved the best balance between discrimination and calibration (expected calibration error, ECE < 1.3%), outperforming BioClinicalBERT and the LLMs. These findings support the use of supervised, domain-adapted models for decision support in resource-limited clinical settings.
Lozano et al. (Fri,) studied this question.
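The calibration figure quoted above (ECE, commonly read as expected calibration error) can be illustrated with a minimal sketch. It assumes the usual equal-width binned definition for a binary classifier: predictions are grouped by predicted probability, and the absolute gap between mean predicted probability and observed positive rate is averaged across bins, weighted by bin size. The function name `expected_calibration_error`, the choice of 10 bins, and the toy inputs are illustrative assumptions; the study's exact binning scheme is not stated here.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE for a binary classifier (illustrative sketch).

    y_true: iterable of 0/1 labels (1 = exacerbation, by assumption).
    y_prob: iterable of predicted probabilities for the positive class.
    n_bins: number of equal-width probability bins (assumed value).
    """
    y_true = list(y_true)
    y_prob = list(y_prob)
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    n = len(y_true)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Per-bin accumulators: sample count, sum of predicted
    # probabilities, and number of positive labels.
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    pos_sums = [0] * n_bins

    for label, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
        prob_sums[idx] += p
        pos_sums[idx] += label

    ece = 0.0
    for count, prob_sum, pos_sum in zip(counts, prob_sums, pos_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        mean_confidence = prob_sum / count
        observed_rate = pos_sum / count
        ece += (count / n) * abs(observed_rate - mean_confidence)
    return ece


if __name__ == "__main__":
    # Toy data only; not drawn from the study.
    labels = [0, 0, 1, 1]
    probabilities = [0.1, 0.1, 0.9, 0.9]
    print(expected_calibration_error(labels, probabilities))
```

Under this definition, a value below 0.013 would correspond to the reported ECE < 1.3%: on average, predicted probabilities differ from observed exacerbation rates by less than 1.3 percentage points.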