What question did this study set out to answer?

The aim is to develop a systematic methodology for calibrating low-cost air quality sensors deployed far from reference stations.

March 28, 2026 | Open Access

Machine Learning Calibration Transfer for Low-Cost Air Quality Sensors: Distance-Based Uncertainty Quantification in a Hybrid Urban Monitoring Network

Key Points

The aim is to develop a systematic methodology for calibrating low-cost air quality sensors deployed far from reference stations.
Used 24 months of hourly data from Sofia, Bulgaria, involving five reference stations and 22 low-cost sensors.
Applied Random Forest models for calibration, comparing results to Multiple Linear Regression.
Implemented leave-one-station-out spatial cross-validation to analyze uncertainty growth rates.
Random Forest models showed R² values between 0.53 and 0.75 for key pollutants.
Derived uncertainty growth rates ranged from 3.84% to 5.62% per km.
Co-location sites spaced 6 km apart keep the uncertainty increase at network midpoints under 30%, within EU Air Quality Directive thresholds for indicative monitoring.
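The leave-one-station-out idea in the points above can be illustrated with a minimal sketch: each reference station is held out in turn, a model would be trained on the remaining stations, and it would then be evaluated on the held-out one. The function name `leave_one_station_out` and the station labels are hypothetical and not taken from the study; model fitting is omitted.

```python
from typing import Iterator, List, Tuple


def leave_one_station_out(
    station_ids: List[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_station, train_indices, test_indices) per station.

    station_ids[i] is the station that produced record i.
    """
    for held_out in sorted(set(station_ids)):
        # Train on every record that did NOT come from the held-out station.
        train = [i for i, s in enumerate(station_ids) if s != held_out]
        # Test only on records from the held-out station.
        test = [i for i, s in enumerate(station_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical station labels for six hourly records (not from the study).
stations = ["A", "A", "B", "B", "C", "C"]
folds = list(leave_one_station_out(stations))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because an entire station is withheld in each fold, the test error reflects performance at a location the model never saw, which is the situation a sensor far from any reference station faces.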

Abstract

Low-cost air quality sensors enable dense urban monitoring networks but require calibration against reference-grade instruments. While machine learning calibration is well established for co-located sensor pairs, applying these calibrations to sensors deployed far from any reference station (the operational reality for most network sensors) lacks a systematic methodology. We address this gap using 24 months of hourly data (August 2023–July 2025) from Sofia, Bulgaria, where five official reference stations (Executive Environmental Agency) operate alongside 22 AirThings low-cost sensors, four of which are co-located. Random Forest models achieved R² ∈ (0.53, 0.75) across PM2.5, PM10, NO2, and O3, an improvement of 40% (for O3) to 408% (for PM2.5) over Multiple Linear Regression baselines. Using leave-one-station-out spatial cross-validation, we derived pollutant-specific uncertainty growth rates (α) from 3.84% to 5.62% per km, characterizing how calibration uncertainty increases with distance from reference stations (statistically significant for PM10 and O3, p < 0.05). Applied to 18 non-co-located sensors, the framework generated 1.2 million calibrated hourly measurements with 95% prediction intervals over the study period. Co-location sites spaced 6 km apart achieve a less than 30% uncertainty increase at network midpoints, within EU Air Quality Directive thresholds for indicative monitoring. These empirically derived α parameters enable network planners to predict measurement reliability at arbitrary sensor locations without ground-truth validation, providing evidence-based guidance for cost-effective hybrid monitoring network design.
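As a rough worked example of how the reported α values might be used, the sketch below assumes that uncertainty grows linearly with distance, so the percent increase is α times the distance in km. The abstract does not state the functional form, so the linear model and the helper name `relative_uncertainty_increase` are assumptions made for illustration only.

```python
def relative_uncertainty_increase(alpha_pct_per_km: float, distance_km: float) -> float:
    """Percent increase in calibration uncertainty at a given distance.

    Assumes linear growth (alpha * distance); this form is an assumption,
    not something the abstract specifies.
    """
    return alpha_pct_per_km * distance_km


spacing_km = 6.0               # co-location site spacing quoted in the abstract
midpoint_km = spacing_km / 2   # a midpoint sensor is 3 km from each site

low = relative_uncertainty_increase(3.84, midpoint_km)   # smallest reported alpha
high = relative_uncertainty_increase(5.62, midpoint_km)  # largest reported alpha
print(f"{low:.2f}% to {high:.2f}%")  # prints "11.52% to 16.86%"
```

Under this linear assumption, both ends of the reported α range stay below the 30% figure at the midpoint of a 6 km spacing, which is consistent with the abstract's claim.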

Cite This Study

Zhivkov et al. studied this question.

synapsesocial.com/papers/69c771198bbfbc51511e0f40 https://doi.org/10.3390/atmos17040335
