April 25, 2026

Air Quality Prediction Using Machine Learning and Statistical Analysis: An Explainable AI Perspective

Farida Siddiqi Prity (Kona Medical, United States), Iftikhar Arefin (Shanto-Mariam University of Creative Technology), Mirza Raquib (International Islamic University Chittagong)

Key Points

The study aims to improve air quality predictions using machine learning algorithms and explainable AI techniques.
Utilized five machine learning algorithms (SVM, KNN, GNB, RF, and XGBoost) to classify AQI levels.
Employed 10-fold cross-validation to validate model robustness and applied a t-test for feature significance analysis.
Used Explainable AI methods (LIME and SHAP) to interpret model decisions and identify critical air pollutants.
RF achieved 99% accuracy, precision, recall, and F1-score, with a low inference time (42 s) and minimal memory usage (17 MB).
PM2.5, PM10, NO2, and NOx were identified as key features influencing AQI predictions through XAI.
A paired t-test confirmed the statistical significance of RF's superior performance over the other models.
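
The validation steps in the key points (10-fold cross-validation followed by a paired t-test on model performance) can be sketched as follows. This is a minimal illustration, not the study's code: the helper names `k_fold_indices` and `paired_t_statistic` are hypothetical, and the per-fold accuracies are invented placeholders rather than results from the paper. It assumes both models are scored on the same ten folds, which is what makes the test paired.

```python
import math
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for a paired t-test on matched scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Each fold serves once as the held-out test set; the rest is training data.
folds = k_fold_indices(n_samples=29531, k=10)

# Hypothetical per-fold accuracies, invented for illustration only.
rf_scores = [0.99, 0.98, 0.99, 0.99, 0.98, 0.99, 0.99, 0.98, 0.99, 0.99]
knn_scores = [0.95, 0.94, 0.96, 0.95, 0.94, 0.95, 0.96, 0.94, 0.95, 0.95]

t_stat, dof = paired_t_statistic(rf_scores, knn_scores)
print(f"paired t = {t_stat:.2f} with {dof} degrees of freedom")
```

With ten folds there are 9 degrees of freedom, so a two-sided test at the 5% level would compare |t| against the critical value of about 2.262. The summary does not state which significance level the study used.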

Abstract

Air pollution poses a significant threat to both environmental and public health, underscoring the need for automated and accurate air quality monitoring systems. This study utilizes five Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to classify Air Quality Index (AQI) levels. The analysis is based on a comprehensive dataset comprising 29,531 records and 16 pollutant-related features, including City, Date, PM2.5, PM10, NO, NO2, NOx, NH3, CO, SO2, O3, Benzene, Toluene, Xylene, AQI, and AQIBucket. Among the models, RF demonstrated superior performance, achieving 99% accuracy, precision, recall, and F1-score, with low inference time (42 s) and minimal memory usage (17 MB). To enhance both accuracy and interpretability, two feature selection approaches were employed. A t-test identified statistically significant differences in pollutant concentrations between high and low AQI groups. In parallel, two Explainable AI (XAI) methods, Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), were applied to interpret the models' decision-making processes. These techniques consistently highlighted PM2.5, PM10, NO2, and NOx as the most critical features influencing AQI predictions. Model robustness was validated using 10-fold cross-validation, while a paired t-test confirmed the statistical significance of RF's superior performance. The integrated approach not only achieves high classification accuracy but also provides meaningful insights into the factors driving air pollution, thus supporting more transparent and reliable air quality assessment systems.
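
The feature-level t-test described in the abstract compares a pollutant's concentrations between high-AQI and low-AQI records. The sketch below shows one way such a comparison might look. It assumes Welch's unequal-variance form of the test, which the abstract does not specify; the function name `welch_t_statistic` is hypothetical, and the PM2.5 readings are invented placeholders, not values from the dataset.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Return (t, approximate dof) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    mean_gap = statistics.mean(group_a) - statistics.mean(group_b)
    t = mean_gap / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, dof


# Hypothetical PM2.5 readings, invented for illustration only.
high_aqi_pm25 = [152.0, 168.5, 140.2, 175.9, 160.3, 149.8]
low_aqi_pm25 = [28.4, 35.1, 22.7, 30.9, 26.3, 33.0]

t_stat, dof = welch_t_statistic(high_aqi_pm25, low_aqi_pm25)
print(f"Welch t = {t_stat:.2f}, approx. dof = {dof:.1f}")
```

A large |t| for a pollutant suggests its mean concentration differs between the two groups, which is the sense in which a feature would be flagged as significant. How the study split records into high and low AQI groups is not stated in this summary.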
