e20005 Background: Lung cancer remains the leading cause of cancer-related mortality worldwide, largely because most cases are diagnosed at a late stage. Although low-dose computed tomography (LDCT) enables early detection, its widespread implementation is limited by cost, resource availability, and disparities in access. This retrospective study aimed to develop a machine learning model that uses complete blood count (CBC) tests as a low-cost tool for lung cancer risk stratification. Methods: We analyzed CBC tests from 53,093 individuals aged 50 years and older (30,313 females, 57.10%; 22,780 males, 42.90%) who underwent chest CT or biopsy within six months of blood testing at Grupo Fleury laboratory, Brazil. The study population was assembled retrospectively from real-world clinical data. Individuals with low-risk CT findings served as controls (36,243 for training and 15,535 for validation). Individuals with high-risk CT findings (n = 1,178), identified from radiology reports describing features highly suggestive of lung cancer and corresponding to an estimated malignancy probability ≥85%, were used exclusively as cases for model training, whereas biopsy-confirmed lung cancer cases (n = 141) were reserved as the only positive cases in the independent test set for final model evaluation. A ridge regression model was trained on selected CBC-derived features. Model performance was also evaluated in a predefined subgroup of 1,267 individuals with documented smoking status to assess performance in a high-risk population. Results: Several CBC parameters differed significantly between high-risk CT cases and low-risk controls, including neutrophil count and red cell distribution width (RDW) (p < 0.001 for both). After feature selection, mean corpuscular volume (MCV), neutrophil count, and RDW were retained in the final model, which achieved an AUC of 0.71 (95% CI: 0.70–0.71). Model discrimination remained stable across bootstrap resampling.
In the subgroup analysis restricted to smokers, model performance was comparable to that in the overall population, with an AUC of 0.68 (95% CI: 0.67–0.68), indicating consistent discrimination in this high-risk group. Conclusions: A machine learning model based on routinely available CBC parameters showed potential as a scalable, low-cost lung cancer risk stratification tool. This approach may help prioritize individuals for CT-based screening, particularly in settings with limited access to LDCT or where smoking history is unavailable. External validation is required to confirm its generalizability and clinical utility.
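The modeling steps described in the Methods and Results (a ridge-penalized model on MCV, neutrophil count, and RDW, evaluated by AUC with a bootstrap 95% CI) can be sketched as follows. This is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' implementation. The abstract does not say whether the ridge model used a linear or a logistic link, so a ridge-penalized (L2) logistic regression fitted by gradient descent is assumed here. The function names, hyperparameters (`lam`, `lr`, `epochs`), the percentile bootstrap, and the synthetic demo data are all hypothetical.

```python
import math
import random


def standardize(rows):
    """Return z-scored copies of rows plus the (mean, sd) used for each column."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, sd))
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by batch gradient descent.

    Assumed formulation; the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b


def auc(scores, labels):
    """AUC = P(random case scores above random control); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (hypothetical choice of CI method)."""
    assert 0 < sum(labels) < len(labels), "need both cases and controls"
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # each resample must contain both classes
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, made-up values for [MCV, neutrophils, RDW]; not study data.
    rows, labels = [], []
    for i in range(400):
        case = 1 if i < 40 else 0
        rows.append([rng.gauss(90, 5), rng.gauss(4.5 + case, 1.5), rng.gauss(13.5 + 0.8 * case, 1.0)])
        labels.append(case)
    X, _ = standardize(rows)
    w, b = fit_ridge_logistic(X, labels)
    scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
    # In-sample scoring only, to keep the demo short.
    print("AUC:", round(auc(scores, labels), 3), "95% CI:", bootstrap_auc_ci(scores, labels, n_boot=200))
```

Under the design described in the abstract, the model would be fitted on the training split (high-risk CT cases versus low-risk controls) and scored on the held-out validation and biopsy-confirmed test sets; the demo scores in-sample only for brevity. The pairwise AUC loop costs O(n_pos × n_neg); a rank-based computation scales better to larger samples.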
Araújo et al. studied this question.