What question did this study set out to answer?

This research aims to develop a machine learning model for differentiating hematologic disorders using CBC data.

March 14, 2026 | Open Access

A complete blood count-based machine learning model for rapid differentiation of aplastic anemia, immune thrombocytopenia, and myelodysplastic syndromes in routine clinical practice

Key Points

Retrospective analysis of 165,181 routine blood test records collected from October 2011 to June 2025.
Included 4,056 samples with confirmed diagnoses for model development and validation.
Models used routinely available CBC parameters; performance was assessed with one-vs-rest ROC curves and AUC.
LightGBM achieved one-vs-rest AUCs of 0.920 for aplastic anemia (AA), 0.970 for immune thrombocytopenia (ITP), and 0.788 for myelodysplastic syndrome (MDS).
Overall model accuracy was 0.82, with favorable precision and recall for AA and ITP and lower recall for MDS.
SHAP analysis identified key predictors: platelet count, red blood cell count, and white blood cell count.

Abstract

Accurate differentiation of common hematologic disorders remains challenging in routine clinical practice and often requires invasive diagnostic procedures. Although complete blood count (CBC) testing is widely available, its diagnostic value for early disease triage has not been fully characterized. In this retrospective study of 165,181 routine blood test records collected between October 2011 and June 2025, 4,056 samples with confirmed diagnoses were included for model development and validation after exclusion of cases lacking definitive diagnostic information. Patients were classified into aplastic anemia (AA), immune thrombocytopenia (ITP), myelodysplastic syndrome (MDS), and other hematologic conditions. Machine learning models were developed using routinely available CBC parameters. Model performance was assessed using one-vs-rest receiver operating characteristic (ROC) curves, area under the curve (AUC), and class-specific precision, recall, and F1-scores. Model interpretability was evaluated using Shapley Additive exPlanations (SHAP). Baseline demographic and hematologic parameters differed significantly among diagnostic groups (all P < 0.001). Among the evaluated models, LightGBM demonstrated robust overall performance, achieving one-vs-rest AUCs of 0.920 for AA, 0.970 for ITP, 0.788 for MDS, and 0.870 for other conditions, with an overall accuracy of 0.82. While AA and ITP were identified with favorable precision and recall, MDS showed lower recall, reflecting substantial overlap in routine laboratory features. SHAP analysis identified platelet count, red blood cell count, and white blood cell count as the most influential predictors. A machine learning model based on routinely available CBC parameters can support non-invasive differentiation of common hematologic disorders. This approach may serve as a practical screening and triage tool at the outpatient stage or before bone marrow examination, helping optimize the use of invasive diagnostic procedures.
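The evaluation metrics named in the abstract (one-vs-rest AUC and class-specific precision, recall, and F1) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, the function names `ovr_auc` and `per_class_report` are hypothetical, and the eight samples and their predicted probabilities are invented toy values, not study data. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def ovr_auc(y_true, scores, positive):
    """One-vs-rest AUC for class `positive` via pairwise ranking (Mann-Whitney).

    scores[i] is the predicted probability that sample i belongs to `positive`.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_class_report(y_true, y_pred, labels):
    """Precision, recall, and F1 per class, each class treated as positive in turn."""
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical toy data (NOT from the study): true labels and made-up
# predicted probabilities, one row per sample, columns in LABELS order.
LABELS = ["AA", "ITP", "MDS", "Other"]
y_true = ["AA", "AA", "ITP", "ITP", "MDS", "MDS", "Other", "Other"]
proba = [
    [0.70, 0.10, 0.15, 0.05],
    [0.40, 0.05, 0.45, 0.10],
    [0.05, 0.85, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.35, 0.05, 0.50, 0.10],
    [0.20, 0.10, 0.30, 0.40],
    [0.10, 0.10, 0.20, 0.60],
    [0.05, 0.15, 0.25, 0.55],
]

# Predicted class = column with the highest probability.
y_pred = [LABELS[row.index(max(row))] for row in proba]

auc = {
    label: ovr_auc(y_true, [row[i] for row in proba], label)
    for i, label in enumerate(LABELS)
}
report = per_class_report(y_true, y_pred, LABELS)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label in LABELS:
    print(label, round(auc[label], 3), report[label])
print("accuracy", accuracy)
```

In this toy data one AA sample is misclassified as MDS and one MDS sample as Other, so MDS ends up with the lowest AUC and recall. The numbers are constructed for illustration only, but they show how overlapping scores between classes lower both metrics for the affected class.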

Cite This Study

Wang et al. (Sun,) studied this question.

synapsesocial.com/papers/69b4ba1818185d8a3980293f https://doi.org/10.1016/j.plabm.2026.e00526
