To support coordinated air quality management, this study developed a tree-based machine learning framework for multi-pollutant forecasting. We systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Decision Tree (DT) models for six key pollutants (PM2.5, PM10, NO2, SO2, CO, and O3) using environmental monitoring data at 10 km resolution from China’s four major municipalities (2021–2024). A feature system incorporating meteorology-emission interaction terms was constructed, and SHapley Additive exPlanations (SHAP) values were used to quantify feature contributions. The key findings are: (1) RF achieved the best performance in particulate matter prediction (PM2.5: R² = 0.99, RMSE = 0.11 µg/m³; PM10: R² = 0.98); (2) GBDT showed accuracy comparable to RF for NO2 (R² = 0.85) and CO (R² = 0.98), with minimal differences (ΔR² ≤ 0.03); (3) DT exhibited competitive O3 prediction capability (R² = 0.88). SHAP analysis revealed critical mechanisms, such as the positive synergistic effect of CO in PM2.5 prediction (SHAP = 0.136) and the sensitivity of O3 generation to temperature (SHAP = 0.076). This research provides an interpretable multi-pollutant forecasting framework applicable to urban air quality warning systems and offers model selection guidance for environmental regulation strategies.
Zhu et al. studied this question.
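To illustrate what a SHAP-style feature contribution measures, the following minimal sketch computes exact Shapley values for a single sample by enumerating feature coalitions. It is an illustration under stated assumptions, not the pipeline used in the study: the surrogate model `toy_model`, its coefficients, the feature names `co` and `temp`, and the sample and baseline values are all hypothetical, and the brute-force enumeration shown here is not the tree-specific algorithm that SHAP tooling typically applies to RF or GBDT models. The toy model includes one interaction term, loosely mirroring the meteorology-emission interaction features described above.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical PM2.5 surrogate (made-up coefficients): linear terms in
    CO and temperature plus one CO-temperature interaction term."""
    return 2.0 * x["co"] + 0.5 * x["temp"] + 0.1 * x["co"] * x["temp"]


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline values;
    each feature's value is its weighted average marginal contribution
    over all coalitions of the remaining features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Input with the coalition at sample values, the rest at baseline.
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without)
                with_f[name] = x[name]  # add the feature of interest
                total += weight * (model(with_f) - model(without))
        phi[name] = total
    return phi


# Hypothetical sample and reference point (arbitrary units).
sample = {"co": 1.0, "temp": 30.0}
baseline = {"co": 0.5, "temp": 20.0}

phi = shapley_values(toy_model, sample, baseline)
for feature, value in phi.items():
    print(f"{feature}: {value:.3f}")
```

The contributions are additive: they sum to the difference between the model output for the sample and for the baseline, which is the property that lets SHAP values be read as a per-feature decomposition of a prediction. In this toy setup the interaction term is split between the two features rather than assigned to either one alone.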