Exposure to fine particulate matter (PM2.5) in ambient air is recognized as the leading environmental risk factor for mortality. A more comprehensive characterization of its chemical composition is needed to support both air quality management and health effects research. We improve estimates of total PM2.5 mass concentration and its chemical composition across North America by developing, optimizing, and applying convolutional neural networks (CNNs) that combine satellite-, simulation-, and monitor-based information to estimate the local bias in monthly geophysical a priori PM2.5 and component concentrations over 2000–2023. Traditional 10-fold spatial cross-validation shows significant long-term agreement for total PM2.5 (R² = 0.82), sulfate (R² = 0.98), nitrate (R² = 0.93), ammonium (R² = 0.94), organic matter (R² = 0.83), black carbon (R² = 0.78), dust (R² = 0.71), and sea salt (R² = 0.37). We introduce Buffered Leave Isolated Sites and Clusters Out (BLISCO) spatial cross-validation to evaluate the model's ability to extrapolate to remote regions, and find that traditional spatial cross-validation may overestimate performance and underrepresent uncertainty because of the spatial autocorrelation of ground monitors. Using geophysical information from a chemical transport model (GEOS-Chem) significantly improves CNN performance in BLISCO cross-validation, for example increasing R² for NO₃⁻ from 0.51 to 0.81 and for NH₄⁺ from 0.27 to 0.67. We represent spatial uncertainty for PM2.5 and its components based on the statistical results of BLISCO cross-validation, integrating information from both the spatial distribution of ground observations and the variability in predictor-space representation, and find that distance from ground monitors is a key predictor of uncertainty.
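The idea behind buffered leave-out cross-validation can be illustrated with a minimal Python sketch. This is a generic buffered splitter, not the authors' BLISCO procedure: how BLISCO selects isolated sites and clusters is not described here, so held-out sites are simply drawn at random, and the names `buffered_leave_out_splits`, `n_test`, and `buffer_km` are hypothetical. The point it shows is that every training monitor within a buffer distance of a held-out site is dropped, which limits how much spatial autocorrelation can leak from training data into the test set.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def buffered_leave_out_splits(sites, n_test, buffer_km, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs for a hypothetical buffered spatial CV.

    sites: list of (lat, lon) monitor coordinates.
    Each fold randomly holds out n_test sites, then excludes from training
    every remaining site that lies within buffer_km of any held-out site.
    """
    rng = random.Random(seed)
    idx = list(range(len(sites)))
    for _ in range(n_folds):
        test = rng.sample(idx, n_test)
        test_set = set(test)
        # Keep only sites that are not held out and lie outside every buffer.
        train = [
            i for i in idx
            if i not in test_set
            and all(haversine_km(*sites[i], *sites[j]) > buffer_km for j in test)
        ]
        yield train, sorted(test)
```

For example, with monitors at `(40.0, -100.0)` and `(40.1, -100.0)` (about 11 km apart) and `buffer_km=50.0`, holding out either one also removes the other from training. Larger buffers force the model to predict farther from any training monitor, which is why buffered schemes tend to report lower skill than ordinary spatial k-fold splits.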
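The finding that distance from ground monitors predicts uncertainty can be explored with a simple binned analysis. The Python sketch below assumes that held-out prediction errors from a cross-validation fold are already available; it groups them by distance to the nearest training monitor and reports the root-mean-square error (RMSE) in each distance bin. The function names and bin edges are hypothetical, and this is not the paper's uncertainty model, which also draws on variability in predictor-space representation.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_distance_km(point, monitors):
    """Distance from one (lat, lon) point to the closest monitor in `monitors`."""
    return min(haversine_km(*point, *m) for m in monitors)


def rmse_by_distance_bin(test_points, errors, train_monitors, bin_edges_km):
    """Return {(lo, hi): RMSE} of held-out errors binned by nearest-monitor distance.

    bin_edges_km must be ascending, e.g. [0, 50, 200, 1000]; each bin is
    [lo, hi). Points farther than the last edge are ignored, and empty bins
    are omitted from the result.
    """
    groups = defaultdict(list)
    for pt, err in zip(test_points, errors):
        d = nearest_distance_km(pt, train_monitors)
        for lo, hi in zip(bin_edges_km[:-1], bin_edges_km[1:]):
            if lo <= d < hi:
                groups[(lo, hi)].append(err)
                break
    return {b: math.sqrt(sum(e * e for e in errs) / len(errs)) for b, errs in groups.items()}
```

If errors grow systematically from the nearest bin to the farthest, distance to the nearest training monitor is a plausible covariate for a spatial uncertainty estimate.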
Shen et al. (Mon,) studied this question.