Accurate forecasting of CO2 emissions from power stations is critical for effective climate policy and for the transition to sustainable energy systems. However, the complexity of power generation processes and the high dimensionality of operational data pose significant challenges to traditional modeling approaches. This paper introduces a novel multi-stage framework that integrates advanced feature selection with explainable artificial intelligence (XAI) to deliver high-accuracy forecasts of power station CO2 emissions while keeping the model's predictions interpretable. The proposed methodology uses a three-stage feature selection process, combining filter, wrapper, and embedded methods, to systematically identify the most influential emission drivers from a large set of candidate variables. The selected features are then used to train a suite of machine learning models, including XGBoost, Random Forest, LSTM, and SVR. The best-performing model, XGBoost, achieved a root mean square error (RMSE) of 28.5, a mean absolute error (MAE) of 19.8, and a coefficient of determination (R²) of 0.96 on a real-world dataset. To address the “black-box” nature of these models, we employ SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) to interpret the model's predictions, providing granular insight into the key factors driving emissions. The results demonstrate that the proposed framework not only outperforms state-of-the-art forecasting models but also offers a clear, interpretable, and actionable tool that policymakers and plant operators can use to support CO2 reduction strategies. The novelty of this work lies in combining a multi-stage feature selection pipeline with a comprehensive XAI-based analysis, yielding a robust and transparent solution to a critical environmental challenge.
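The abstract names the three feature selection stages but not the concrete method used in each. The following is a hypothetical, dependency-free sketch of how a filter, wrapper, and embedded pipeline can be chained: a Pearson-correlation filter, greedy forward selection scored by validation RMSE as the wrapper, and pruning by standardized coefficient magnitude of an ordinary least squares fit as a stand-in for the embedded stage (an L1-penalized or tree-importance method would be a more typical embedded choice). The OLS model, all thresholds, the feature names (`load`, `coal_share`, `noise_a`, `noise_b`), and the synthetic data are assumptions for illustration, not the paper's pipeline.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def sd(v):
    """Population standard deviation."""
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def fit_ols(cols, y):
    """OLS with intercept: solve the normal equations by Gauss-Jordan elimination."""
    rows = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda q: abs(a[q][p]))  # partial pivoting
        a[p], a[piv] = a[piv], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [u - f * v for u, v in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]  # [intercept, b1, b2, ...]


def predict(beta, cols, n):
    return [beta[0] + sum(b * c[i] for b, c in zip(beta[1:], cols)) for i in range(n)]


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def select_features(data, y, min_abs_r=0.1, max_wrapper=3, min_std_coef=0.1, split=0.7):
    """data: dict of name -> column (list of floats); y: target, e.g. CO2 emissions."""
    n_tr = int(len(y) * split)
    y_tr, y_va = y[:n_tr], y[n_tr:]
    # Stage 1 (filter): keep features with |Pearson r| >= min_abs_r on training rows.
    kept = [f for f, col in data.items() if abs(pearson(col[:n_tr], y_tr)) >= min_abs_r]
    # Stage 2 (wrapper): greedy forward selection scored by validation RMSE.
    chosen, best = [], float("inf")
    while len(chosen) < max_wrapper:
        scores = {}
        for f in kept:
            if f not in chosen:
                cols = [data[g] for g in chosen + [f]]
                beta = fit_ols([c[:n_tr] for c in cols], y_tr)
                scores[f] = rmse(y_va, predict(beta, [c[n_tr:] for c in cols], len(y_va)))
        if not scores:
            break
        f, s = min(scores.items(), key=lambda kv: kv[1])
        if s >= best:
            break  # no remaining candidate lowers validation error
        chosen.append(f)
        best = s
    # Stage 3 (stand-in for "embedded"): refit on the chosen features and drop
    # any whose standardized coefficient |b_j * sd(x_j) / sd(y)| < min_std_coef.
    if not chosen:
        return []
    beta = fit_ols([data[f][:n_tr] for f in chosen], y_tr)
    return [f for f, b in zip(chosen, beta[1:])
            if abs(b * sd(data[f][:n_tr]) / sd(y_tr)) >= min_std_coef]


# Synthetic demo: emissions depend on two drivers; two columns are pure noise.
random.seed(0)
n = 200
data = {name: [random.random() for _ in range(n)]
        for name in ("load", "coal_share", "noise_a", "noise_b")}
y = [3 * ld + 2 * cs + random.gauss(0, 0.1)
     for ld, cs in zip(data["load"], data["coal_share"])]
print(select_features(data, y))
```

In a fuller implementation, the OLS stand-in would be replaced by the forecasting models themselves (for example XGBoost), and the wrapper would typically be scored with cross-validation rather than a single train/validation split.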
Qader et al. studied this question.