• Curated paired arterial blood gas–complete blood count dataset to study spurious hypoxemia in leukocytosis/thrombocytosis.
• Harmonized variables, units, and coding to support reproducible secondary analyses.
• Applies prespecified quality-control and timing filters to reduce pre-analytical bias.
• Enables subgroup and modelling work to flag high-risk samples and improve oxygenation interpretation.
• Supports benchmarking of oxygenation indices (paO₂-derived measures) across cell-count extremes.

This data article describes a de-identified dataset comprising 7,473 arterial blood gas (ABG) specimens from 2,647 unique patients at a major medical center, collected between September 2020 and January 2026. Each ABG specimen is paired with concurrent complete blood count (CBC) and basic metabolic panel (BMP) results, facilitating examination of the association between leukocytosis (elevated white blood cell counts) and spurious hypoxemia in blood gas analysis. The dataset includes 71 variables covering patient demographics (age in months, sex, race/ethnicity, anthropometrics), ABG parameters (paO₂, sO₂, paCO₂, pH, FIO₂, temperature), hematologic indices (WBC count, platelets, hemoglobin, red cell indices), metabolic panel values, specimen collection timing verification, and derived variables such as age-specific classifications of leukocytosis and thrombocytosis, along with oxygen saturation discrepancy metrics. Leukocytosis, based on age-adjusted upper reference limits, was observed in 2,579 specimens (35%). Spurious hypoxemia, defined by discrepancies between measured and calculated oxygen saturation, was identified in 177 specimens (2.4%). This resource enables detailed investigation of risk factors for spurious hypoxemia and supports re-evaluation of the longstanding “leukocyte larceny” hypothesis, offering substantial re-use potential for studies in clinical pathology, pulmonology, hematology, critical care medicine, and laboratory medicine.
However, specimens that do not meet the investigator's quality-control criteria (e.g., an excessively long interval between collection and analysis) should be excluded from further analyses. Deciding which specimens to exclude from this dataset in secondary analyses is left to the investigator.
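As an illustration of such an investigator-defined exclusion, the following minimal sketch filters specimens by the interval between collection and analysis. The field names (`collected_at`, `analyzed_at`), the example records, and the 30-minute cutoff are hypothetical; the text above does not state the dataset's actual variable names or any recommended threshold.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff; choose a value appropriate to the planned analysis.
MAX_DELAY = timedelta(minutes=30)


def passes_timing_qc(collected_at, analyzed_at, max_delay=MAX_DELAY):
    """Return True when analysis followed collection within max_delay.

    Negative intervals (analysis recorded before collection) are treated
    as timestamp errors and fail the check.
    """
    delay = analyzed_at - collected_at
    return timedelta(0) <= delay <= max_delay


# Made-up records for illustration only; not drawn from the dataset.
specimens = [
    {"id": "A", "collected_at": datetime(2021, 3, 1, 8, 0),
     "analyzed_at": datetime(2021, 3, 1, 8, 10)},
    {"id": "B", "collected_at": datetime(2021, 3, 1, 9, 0),
     "analyzed_at": datetime(2021, 3, 1, 10, 15)},
    {"id": "C", "collected_at": datetime(2021, 3, 1, 11, 0),
     "analyzed_at": datetime(2021, 3, 1, 10, 55)},
]

kept = [s for s in specimens
        if passes_timing_qc(s["collected_at"], s["analyzed_at"])]
kept_ids = [s["id"] for s in kept]
excluded_ids = [s["id"] for s in specimens if s["id"] not in kept_ids]
```

Keeping the cutoff as a parameter makes it straightforward to rerun an analysis under several thresholds and to report how sensitive the results are to the exclusion rule.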
Zavorsky et al. (Sun,) studied this question.
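For readers reusing the oxygen saturation discrepancy metrics, the sketch below shows one way a gap between measured and calculated saturation could be computed. It is an assumption-laden illustration, not the authors' method: the calculated saturation uses the Severinghaus approximation of the standard oxyhemoglobin dissociation curve (no correction for pH, temperature, or paCO₂), and the 5-percentage-point flagging threshold and function names are hypothetical.

```python
def severinghaus_so2(pao2_mmhg):
    """Estimate sO2 (%) from paO2 (mmHg) with the Severinghaus approximation.

    Assumes a standard dissociation curve; this is an illustrative
    stand-in, since the text does not state how sO2 was calculated.
    """
    if pao2_mmhg <= 0:
        raise ValueError("paO2 must be positive")
    return 100.0 / (23400.0 / (pao2_mmhg ** 3 + 150.0 * pao2_mmhg) + 1.0)


def saturation_gap(measured_so2_pct, pao2_mmhg):
    """Measured minus calculated saturation, in percentage points."""
    return measured_so2_pct - severinghaus_so2(pao2_mmhg)


def flag_discrepancy(measured_so2_pct, pao2_mmhg, threshold_pct=5.0):
    """Flag a specimen whose saturation gap meets a hypothetical threshold."""
    return abs(saturation_gap(measured_so2_pct, pao2_mmhg)) >= threshold_pct


# Made-up values for illustration only; not drawn from the dataset.
concordant = flag_discrepancy(measured_so2_pct=97.0, pao2_mmhg=100.0)
discordant = flag_discrepancy(measured_so2_pct=96.0, pao2_mmhg=45.0)
```

A flagged specimen here only indicates that the measured saturation and the paO₂ disagree under these assumptions; whether that reflects spurious hypoxemia depends on the definition and thresholds documented with the dataset.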