Background: Pneumonia of unknown etiology (PUE), defined as pneumonia without an identified pathogen at the time of clinical presentation, is a critical clinical warning signal for emerging infectious disease (EID) outbreaks with pandemic potential. Conventional pathogen-centric surveillance systems have an inherent blind spot: they cannot detect early clustering signals before the causative agent is identified, which leaves a window of vulnerability during novel pathogen emergence. This study aimed to develop a deep learning model that uses unstructured chest imaging text, a routinely available clinical data stream, to enable real-time, automated screening of PUE cases and early warning of EID clusters, independent of prior pathogen knowledge, within an integrated multi-pathogen surveillance framework.
Methods: We retrospectively collected data from 8860 patients with respiratory illnesses at a tertiary hospital in Beijing, China, including 980 PUE cases (11.1%) and 7880 known-etiology pneumonia cases. A deep learning model (RoBERTa with attention enhancement) was developed using unstructured chest imaging reports. The decision threshold was chosen as the point on the Matthews correlation coefficient (MCC) curve that maximized the MCC, a balanced metric that accounts for all four confusion matrix outcomes. Model performance was assessed on a test set for PUE case identification and clustering signal detection.
Results: The model achieved an area under the receiver operating characteristic curve of 0.986 (95% CI: 0.981–0.991). At the MCC-optimal threshold of 0.08, sensitivity was 89.8% and specificity was 97.0% for identifying PUE cases. In a simulated surveillance exercise, predicted and actual case counts were strongly correlated (Pearson’s r = 0.901), suggesting the model's potential to detect abnormal clustering signals before pathogen identification.
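To illustrate how a decision threshold can be selected by maximizing the MCC, the following is a minimal Python sketch. It is not the study's implementation: the function names, the candidate threshold grid, and the toy labels and scores are all hypothetical and chosen only to show the mechanics.

```python
import math

def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive (PUE)."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def best_threshold(y_true, scores, candidates):
    """Return the first candidate threshold with the highest MCC, and that MCC."""
    best_t, best_m = None, -2.0  # MCC lies in [-1, 1], so -2.0 is a safe start
    for t in candidates:
        m = mcc(*confusion_counts(y_true, scores, t))
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m

def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

# Toy data (made up, not from the study): 1 = PUE, 0 = known-etiology pneumonia.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.92, 0.61, 0.35, 0.28, 0.12, 0.08, 0.05, 0.04, 0.02, 0.01]
toy_grid = [i / 100 for i in range(1, 100)]  # assumed grid: 0.01 to 0.99

chosen_t, chosen_mcc = best_threshold(toy_labels, toy_scores, toy_grid)
toy_sens, toy_spec = sensitivity_specificity(
    *confusion_counts(toy_labels, toy_scores, chosen_t)
)
```

Because the MCC combines all four confusion matrix cells, it is less easily inflated by the majority class than accuracy, which matters when positives are a small minority, as with the 11.1% PUE prevalence reported here.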
Conclusions: The developed model demonstrates potential to detect clustering signals of PUE caused by unknown pathogens and can be integrated with hospital information systems, providing a feasible, low-cost tool for integrated surveillance of pathogens with pandemic potential. This approach enables earlier outbreak detection and supports public health decision-making during the critical window before pathogen identification.
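The agreement between predicted and actual case counts reported in the Results can be quantified with Pearson's correlation coefficient. The sketch below shows the calculation on invented per-period counts. The aggregation period, the variable names, and the numbers are assumptions for illustration only; the abstract does not state how counts were binned.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical counts per surveillance period (not study data).
actual_counts = [2, 3, 1, 4, 9, 12, 5]      # confirmed PUE cases per period
predicted_counts = [3, 3, 2, 4, 8, 13, 6]   # model-flagged cases per period

r = pearson_r(actual_counts, predicted_counts)
```

A high r indicates that the predicted series rises and falls with the actual series, which is the property a clustering alert depends on. It does not by itself show that the absolute counts agree.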
Yang et al. (Fri,) studied this question.