What question did this study set out to answer?

The aim is to develop and evaluate tree-based machine learning models for predicting air pollutants.

February 27, 2026 | Open Access

Applicability analysis of tree-based ensemble learning for air pollutant prediction models

Key Points

Developed a machine learning framework for multi-pollutant forecasting.
Evaluated Random Forest, Gradient Boosting Decision Tree, and Decision Tree models.
Constructed a feature system incorporating meteorology-emission interactions.
Utilized SHAP values to measure feature contributions.
Random Forest achieved optimal PM2.5 prediction (R² = 0.99, RMSE = 0.11 µg/m³).
Gradient Boosting Decision Tree showed comparable accuracy to Random Forest for NO2 and CO.
Decision Tree demonstrated competitive performance in predicting O3 (R² = 0.88).
SHAP analysis indicated CO positively influences PM2.5 predictions.
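
The accuracy figures in the key points are reported as R² and RMSE. As a reminder of what those two metrics measure, here is a minimal sketch in plain Python. The observed and predicted values are hypothetical numbers chosen for illustration; they are not data from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical PM2.5 observations and model predictions (µg/m³).
observed = [12.0, 35.0, 20.0, 50.0, 8.0]
predicted = [13.0, 33.0, 21.0, 49.0, 9.0]

print(f"R2   = {r2_score(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} µg/m³")
```

R² is unitless and compares the model against always predicting the mean, while RMSE keeps the units of the pollutant, so the two are usually read together.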

Abstract

To support coordinated air quality management, this study developed a tree-based machine learning framework for multi-pollutant forecasting. We systematically evaluated the predictive performance of Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Decision Tree (DT) models for six key pollutants: PM2.5, PM10, NO2, SO2, CO, and O3, using high-resolution environmental monitoring data (10 km resolution) from China’s four major municipalities (2021–2024). A comprehensive feature system was constructed incorporating meteorology-emission interaction terms. SHapley Additive exPlanations (SHAP) values were employed to quantify feature contributions. Key findings demonstrate: (1) RF achieved optimal performance in particulate matter prediction (PM2.5: R² = 0.99, RMSE = 0.11 µg/m³; PM10: R² = 0.98); (2) GBDT showed comparable accuracy to RF for NO2 (R² = 0.85) and CO (R² = 0.98) with minimal differences (ΔR² ≤ 0.03); (3) DT exhibited competitive O3 prediction capability (R² = 0.88). SHAP analysis revealed critical mechanisms, such as CO’s positive synergistic effect (SHAP = 0.136) in PM2.5 prediction and O3 generation sensitivity to temperature (SHAP = 0.076). This research provides an interpretable, multi-pollutant forecasting framework applicable to urban air quality warning systems and offers model selection guidance for environmental regulation strategies.
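
To make the idea of a per-feature SHAP contribution concrete, the sketch below computes exact Shapley values by brute force for a small hand-written function. Everything in it is an assumption made for illustration: `toy_pm25`, its coefficients, the CO-temperature interaction term, and the input and baseline values are hypothetical and are not the study's models or data. The study applied SHAP to trained tree models; this enumeration over all feature subsets is a different, much slower route to the same kind of attribution and is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are held at their baseline value.
    The cost grows as 2^n in the number of features n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_pm25(z):
    """Hypothetical stand-in for a fitted model, with one interaction term."""
    co, temperature, wind = z
    return 20.0 + 30.0 * co - 2.0 * wind + 0.5 * co * temperature


names = ["CO", "temperature", "wind"]
x = [1.2, 25.0, 3.0]          # hypothetical sample to explain
baseline = [0.8, 15.0, 4.0]   # hypothetical reference values

phi = shapley_values(toy_pm25, x, baseline)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
print("sum of contributions:", round(sum(phi), 6))
print("prediction shift:    ", toy_pm25(x) - toy_pm25(baseline))
```

The contributions sum to the difference between the prediction for the sample and the prediction at the baseline, and the effect of the interaction term is shared between the two features that take part in it.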

Cite This Study

Zhu et al. studied this question.

synapsesocial.com/papers/69a134fbed1d949a99abe66e https://doi.org/10.1038/s41598-025-32652-0
