Low-cost air quality sensors enable dense urban monitoring networks but require calibration against reference-grade instruments. While machine learning calibration is well established for co-located sensor pairs, there is no systematic methodology for applying these calibrations to sensors deployed far from any reference station, which is the operational reality for most network sensors. We address this gap using 24 months of hourly data (August 2023–July 2025) from Sofia, Bulgaria, where five official reference stations (Executive Environmental Agency) operate alongside 22 AirThings low-cost sensors, four of which are co-located with reference stations. Random Forest models achieved R² between 0.53 and 0.75 across PM2.5, PM10, NO2, and O3, an improvement of 40% (O3) to 408% (PM2.5) over Multiple Linear Regression baselines. Using leave-one-station-out spatial cross-validation, we derived pollutant-specific uncertainty growth rates (α) of 3.84% to 5.62% per km, which characterize how calibration uncertainty increases with distance from a reference station; the distance trend is statistically significant (p < 0.05) for PM10 and O3. Applied to the 18 non-co-located sensors, the framework generated 1.2 million calibrated hourly measurements with 95% prediction intervals over the study period. With co-location sites spaced 6 km apart, the uncertainty increase at network midpoints stays below 30%, within the EU Air Quality Directive thresholds for indicative monitoring. These empirically derived α parameters let network planners predict measurement reliability at arbitrary sensor locations without ground-truth validation, providing evidence-based guidance for cost-effective hybrid monitoring network design.
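As a rough illustration of how a per-km growth rate such as α might be used in planning, the sketch below scales a prediction-interval half-width with distance from the nearest reference station. It assumes that uncertainty grows linearly with distance (one reading of "% per km"; the abstract does not state the functional form), and every function name here is hypothetical rather than taken from the authors' code. Only the two α bounds come from the abstract.

```python
# Minimal sketch, assuming linear uncertainty growth with distance.
# The alpha bounds are the range reported in the abstract; all names are hypothetical.

ALPHA_MIN_PER_KM = 0.0384  # 3.84% per km (lower bound across pollutants)
ALPHA_MAX_PER_KM = 0.0562  # 5.62% per km (upper bound across pollutants)


def relative_uncertainty_increase(distance_km: float, alpha_per_km: float) -> float:
    """Fractional uncertainty increase at distance_km, assuming linear growth."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return alpha_per_km * distance_km


def widen_interval(prediction: float, half_width: float,
                   distance_km: float, alpha_per_km: float) -> tuple[float, float]:
    """Scale a symmetric interval half-width by (1 + alpha * distance)."""
    scale = 1.0 + relative_uncertainty_increase(distance_km, alpha_per_km)
    widened = half_width * scale
    return (prediction - widened, prediction + widened)


def midpoint_distance_km(spacing_km: float) -> float:
    """Distance from the midpoint between two co-location sites to either site."""
    return spacing_km / 2.0


if __name__ == "__main__":
    d = midpoint_distance_km(6.0)  # midpoint of sites spaced 6 km apart
    for alpha in (ALPHA_MIN_PER_KM, ALPHA_MAX_PER_KM):
        increase = relative_uncertainty_increase(d, alpha)
        print(f"alpha={alpha:.4f}/km, d={d:.1f} km: +{increase:.1%} uncertainty")
```

Under this linear assumption, the midpoint between sites spaced 6 km apart lies 3 km from each site, and both α bounds give an increase below the 30% figure quoted in the abstract. A different growth model (for example, compounding per km) would give different numbers.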
Zhivkov et al. studied this question.