March 3, 2026 | Open Access

Air particulate matter (PM2.5) concentration prediction in Kuwait using vision transformer

Key Points

The model achieved a mean absolute error (MAE) as low as 0.0191, indicating high forecasting accuracy for PM2.5 concentrations.
Air quality and weather data from Kuwait were analyzed with a hybrid deep learning model that combines a vision transformer (ViT) with a BiLSTM.
Principal component analysis (PCA) reduced the data from ten features to four to streamline processing.
Model generalizability was assessed on open-source datasets from Northern Ireland and Beijing, where the model achieved lower errors than previous work.

Abstract

Globally, air pollution has become one of the most pressing problems affecting the quality of human life. In Kuwait, one of the most prominent air pollutants is particulate matter with a diameter of 2.5 µm or less (PM2.5), a mixture of inhalable particles that can cause various health problems. Accurate forecasts of PM2.5 concentrations may encourage decision makers to take action against this problem. Because deep learning has proven effective in forecasting tasks, this study aims to develop a robust deep learning model for forecasting future PM2.5 concentrations in Kuwait. We used Kuwait air quality and weather data spanning 2017 to 2024. Missing values in the time series were first handled with three imputation techniques: Self-Attention-based Imputation for Time-Series (SAITS), Conditional Score-based Diffusion model for Imputation (CSDI), and linear interpolation. Principal component analysis (PCA) was then applied to the imputed datasets, reducing the number of features from 10 to 4. Each imputed dataset was represented in the time domain, in the frequency domain, and as principal components. Two visualization techniques turned these representations into images: line graphs for the time-domain data and the principal components, and spectrograms for the frequency-domain data. Images carry rich visual patterns from which informative features can be extracted, which can improve forecasting results. These images were then fed to a hybrid model combining a vision transformer (ViT) and a Bidirectional Long Short-Term Memory (BiLSTM) network: the ViT extracts meaningful features from the input images, and the BiLSTM processes these features sequentially to forecast PM2.5 concentration levels.
The model performed strongly: its lowest Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) values were 0.0191, 0.0031, and 0.0562, respectively. To assess the model's generalizability, we tested it on two open-source datasets collected in Northern Ireland and in Beijing, China. On these datasets, the model achieved substantially lower error metrics than previous work.
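
Of the three imputation techniques named in the abstract, linear interpolation is the simplest to illustrate. The sketch below is a minimal version for a single variable, assuming missing readings are stored as `None` or NaN. Gaps at the start or end of the series have only one observed neighbour, so they are filled with the nearest observed value; the abstract does not say how the study handled edges, so treat that choice as an assumption. The function name `linear_interpolate` is hypothetical.

```python
import math
from typing import List, Optional


def linear_interpolate(series: List[Optional[float]]) -> List[float]:
    """Fill missing values (None or NaN) by linear interpolation.

    Interior gaps are filled on the straight line between the nearest
    observed neighbours. Leading/trailing gaps are filled with the nearest
    observed value (an assumption; edge handling is not described).
    """
    def is_missing(v: Optional[float]) -> bool:
        return v is None or math.isnan(v)

    known = [i for i, v in enumerate(series) if not is_missing(v)]
    if not known:
        raise ValueError("series has no observed values")

    out = [math.nan] * len(series)
    for i in known:
        out[i] = float(series[i])

    # Edge gaps: carry the first/last observation outward.
    for i in range(known[0]):
        out[i] = out[known[0]]
    for i in range(known[-1] + 1, len(series)):
        out[i] = out[known[-1]]

    # Interior gaps: interpolate between consecutive observed points a and b.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = out[a] + t * (out[b] - out[a])
    return out
```

For example, `linear_interpolate([10.0, None, None, 16.0, None])` fills the interior gap with 12.0 and 14.0 and the trailing gap with 16.0. SAITS and CSDI are learned models and are not sketched here.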
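
The abstract reports that PCA reduced the feature count from 10 to 4. A minimal sketch of that step, computed from the singular value decomposition of the data matrix, is shown below. It assumes each feature is standardized before PCA, which the abstract does not state; the function name `pca_reduce` is hypothetical.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int = 4):
    """Project the rows of X (n_samples x n_features) onto the leading
    principal components.

    Each feature is standardized to zero mean and unit variance first
    (an assumption; the scaling used in the study is not stated).
    Returns (scores, explained_variance_ratio).
    """
    Z = X - X.mean(axis=0)
    std = Z.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    Z = Z / std
    # Thin SVD: rows of Vt are principal axes, sorted by decreasing variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    variance = S ** 2
    return scores, variance[:n_components] / variance.sum()
```

Calling `pca_reduce(X, 4)` on a 10-column feature matrix returns a 4-column matrix of component scores plus the share of standardized variance each component retains, which is one way to judge whether four components are enough.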
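
The frequency-domain representation was visualized as spectrograms. One common way to compute a spectrogram is a short-time Fourier transform: slide a window over the series, take the FFT of each windowed frame, and keep the squared magnitudes. The sketch below assumes a Hann window and illustrative window and hop lengths; the study's actual settings and tooling are not given in the abstract, and `spectrogram` is a hypothetical name.

```python
import numpy as np


def spectrogram(x: np.ndarray, n_fft: int = 32, hop: int = 8) -> np.ndarray:
    """Power spectrogram from a short-time Fourier transform (STFT).

    Slides a Hann window of length n_fft over x in steps of hop samples and
    returns |FFT|^2 with shape (n_fft // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[k * hop : k * hop + n_fft] * window for k in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return power.T
```

For a series sampled hourly (an assumption) with a 24-hour cycle, `spectrogram(x, n_fft=96, hop=24)` concentrates power in bin 96 / 24 = 4 of every frame, because bin k corresponds to k / n_fft cycles per sample. Rendering the array as an image, for example with a log scale and a colour map, is a separate step not shown.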
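
The ViT in the hybrid model reads each input image as a sequence of patches. As a hedged illustration of that tokenization step only (the study's image size, patch size, and embedding width are not given in the abstract), the sketch splits an image array into non-overlapping square patches and linearly projects them to token vectors. The projection weights are random placeholders for what a trained ViT learns, and `patchify` and `embed_patches` are hypothetical names.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles.

    Returns shape (num_patches, patch * patch * C), tiles in row-major order.
    """
    H, W, C = image.shape
    if H % patch or W % patch:
        raise ValueError("image height and width must be divisible by patch")
    tiles = image.reshape(H // patch, patch, W // patch, patch, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (tile_row, tile_col, patch, patch, C)
    return tiles.reshape(-1, patch * patch * C)


def embed_patches(patches: np.ndarray, dim: int, seed: int = 0) -> np.ndarray:
    """Linearly project flattened patches to dim-dimensional tokens.

    The weights are random and untrained: a stand-in for the learned
    projection of a real ViT.
    """
    rng = np.random.default_rng(seed)
    weight = rng.normal(scale=0.02, size=(patches.shape[1], dim))
    return patches @ weight
```

With a 224 × 224 RGB image and 16 × 16 patches, `patchify` yields 196 patches of length 768. In a full ViT, position embeddings and transformer encoder layers follow; in the hybrid model described above, the extracted features are then passed to the BiLSTM.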
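
The three error metrics reported in the abstract have standard definitions: MAE is the mean absolute difference between predicted and observed values, MSE is the mean squared difference, and RMSE is the square root of MSE. A minimal sketch, with the hypothetical helper name `error_metrics`:

```python
import math
from typing import Dict, Sequence


def error_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error, mean squared error and root mean squared error."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}
```

Because RMSE squares errors before averaging, it weights large errors more heavily than MAE, and RMSE is never smaller than MAE for the same predictions.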

Cite This Study

Alsholi et al. (2026). Air particulate matter (PM2.5) concentration prediction in Kuwait using vision transformer.

synapsesocial.com/papers/69a767b0badf0bb9e87e1f39
https://doi.org/10.1186/s40537-026-01364-1