High-resolution data on fine particulate matter (PM2.5) are essential for air quality management and health risk assessment, yet monitoring stations in Thailand remain sparse. This study improves daily PM2.5 estimates by identifying effective predictors from the Community Multiscale Air Quality (CMAQ) model and reanalysis Aerosol Optical Depth (AOD) within a Light Gradient Boosting Machine (LightGBM) framework. Six predictor sets were tested: (1) one baseline CMAQ predictor with all emission sources; (2–4) three source-decoupled CMAQ predictors isolating contributions from anthropogenic (AT), biomass burning (BB), and combined (AT-BB) emissions; (5) one AOD predictor from MERRA-2; and (6) a hybrid AT-BB-AOD predictor. Models were evaluated under station-holdout, region-holdout, and sparse-station strategies. Under station-holdout evaluation, the AT-BB model outperformed the baseline, and combining it with AOD improved performance further, surpassing all other models. The sparse-station evaluation confirmed that decoupled predictors remained effective even with limited training station coverage. Region-holdout results showed that the effectiveness of decoupled predictors depended on how closely feature distributions aligned between training and test regions and on how well the learned feature interactions generalized. These findings suggest that decoupled predictors are most effective when applied within known regions. For out-of-domain applications, including training data from regions with diverse emission profiles and meteorological conditions is recommended to fully exploit the benefits of decoupled predictors. The AOD model achieved accuracy comparable to that of the baseline and serves as a practical alternative when CMAQ simulations are unavailable; however, it showed greater underestimation at high PM2.5 concentrations. Therefore, source-decoupled CMAQ-based predictors are preferred for high-accuracy PM2.5 estimation.
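The three evaluation strategies differ only in how monitoring stations are partitioned between training and testing. The sketch below illustrates one plausible way to build those partitions; it is not the authors' code. The station IDs, region labels, fold count, random seed, and the helper names `station_holdout_folds`, `region_holdout_folds`, and `sparse_station_split` are all hypothetical. It also assumes that sparse-station evaluation means randomly subsampling training stations. Model fitting, for example with `lightgbm.LGBMRegressor` on the rows belonging to each training station set, is omitted.

```python
import random
from collections import defaultdict


def station_holdout_folds(station_ids, n_folds=5, seed=0):
    """Hypothetical helper: assign whole stations to folds so that no station
    contributes rows to both the training and the test set of a fold."""
    stations = sorted(set(station_ids))
    random.Random(seed).shuffle(stations)
    all_stations = set(stations)
    folds = []
    for k in range(n_folds):
        test = set(stations[k::n_folds])  # every n_folds-th station goes to fold k
        folds.append((all_stations - test, test))
    return folds


def region_holdout_folds(station_region):
    """Hypothetical helper: one fold per region; train on stations in all
    other regions and test on the held-out region."""
    by_region = defaultdict(set)
    for station, region in station_region.items():
        by_region[region].add(station)
    all_stations = set(station_region)
    return {region: (all_stations - test, test)
            for region, test in sorted(by_region.items())}


def sparse_station_split(train_stations, fraction, seed=0):
    """Hypothetical helper: keep only a random fraction of the training
    stations to mimic limited monitoring coverage (at least one is kept)."""
    stations = sorted(train_stations)
    n_keep = max(1, round(fraction * len(stations)))
    return set(random.Random(seed).sample(stations, n_keep))


# Illustrative usage with made-up stations and region labels.
station_region = {"S01": "North", "S02": "North", "S03": "Central",
                  "S04": "Central", "S05": "South", "S06": "Northeast"}
station_folds = station_holdout_folds(list(station_region), n_folds=3)
region_folds = region_holdout_folds(station_region)
sparse_train = sparse_station_split(station_folds[0][0], fraction=0.5)
```

Splitting by station or region rather than by row matters because daily records from a single station are strongly correlated. A row-level random split would let the model see a test station's own history during training, which would inflate the apparent spatial generalizability.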
• High-spatiotemporal-resolution PM2.5 maps for Thailand are produced by the CMAQ-LightGBM model.
• CMAQ-derived source-decoupled PM2.5 contributions are used as LightGBM predictors.
• Three evaluation strategies were used to assess the spatial generalizability of LightGBM.
• Source-decoupled predictors are most effective when applied within known regions.
• Reanalysis AOD offers a practical alternative when CMAQ simulations are unavailable.
Sukprasert et al. studied this question.