Abstract
Accurate forecasting of renewable energy generation is vital for grid stability, yet most existing studies focus on improving model architectures rather than on the quality of the input data. This leaves a significant gap between theoretical model performance and real-world forecasting reliability. Renewable energy datasets, such as those from wind, solar, and biogas systems, often suffer from incomplete timestamps, inconsistent power readings, and sensor-induced anomalies. These issues distort temporal structure and reduce predictive accuracy, especially in sequential deep learning models. To address this challenge, we propose a unified preprocessing framework that systematically repairs, aligns, and smooths renewable energy data before model training. To evaluate the impact of data quality, three experimental scenarios are considered: raw data, preprocessed data, and feature-reduced data obtained with principal component analysis (PCA). The framework is evaluated with CNN, GRU, and LSTM models under identical conditions. Results show that preprocessing reduces mean absolute error (MAE) by up to 70% on the solar data and 35% on the wind data, while improving convergence stability across all models. Feature reduction further improves performance on high-dimensional datasets such as wind and solar, but degrades it in low-sample settings such as biogas, highlighting that the benefit of dimensionality reduction depends on the data. These findings indicate that, in renewable energy forecasting, preprocessing plays a more critical role than model complexity. The study establishes a data-centric framework that improves reliability, generalization, and interpretability across heterogeneous energy systems.
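The abstract does not say which specific operations the framework uses to repair, align, and smooth the data. As an illustration only, the sketch below assumes one plausible pipeline. Irregular readings are snapped to a regular time grid, and missing slots are marked as NaN. Negative power values and robust-z-score outliers are then masked and filled by linear interpolation. A moving average smooths the result, and PCA keeps the fewest components that reach 95% explained variance. All function names (`align_to_grid`, `repair`, `smooth`, `pca_reduce`) are hypothetical. So are the thresholds, the window size, and the variance cutoff. None of these is the authors' implementation.

```python
import numpy as np

# Hypothetical helpers sketching ONE possible repair/align/smooth/PCA pipeline;
# names, thresholds and window sizes are illustrative assumptions, not the paper's method.


def align_to_grid(timestamps, values, step):
    """Place readings on a regular grid of spacing `step`; empty slots become NaN.

    Assumes at most one reading per grid slot.
    """
    t = np.asarray(timestamps, dtype=float)
    v = np.asarray(values, dtype=float)
    start = t.min()
    n_slots = int(round((t.max() - start) / step)) + 1
    grid = start + step * np.arange(n_slots)
    aligned = np.full(n_slots, np.nan)
    slots = np.rint((t - start) / step).astype(int)  # nearest grid slot per reading
    aligned[slots] = v
    return grid, aligned


def repair(series, lower=0.0, z_thresh=3.5):
    """Mask physically impossible values and robust-z outliers, then interpolate gaps."""
    x = np.array(series, dtype=float)  # work on a copy
    bad = x < lower  # e.g. negative power output
    valid = np.isfinite(x) & ~bad
    if not valid.any():
        return x
    med = np.median(x[valid])
    mad = np.median(np.abs(x[valid] - med))
    if mad > 0:
        # Modified z-score: 1.4826 * MAD approximates the std of Gaussian data
        bad |= np.abs(x - med) / (1.4826 * mad) > z_thresh
    x[bad] = np.nan
    missing = np.isnan(x)
    if missing.any() and (~missing).any():
        pos = np.arange(len(x))
        x[missing] = np.interp(pos[missing], pos[~missing], x[~missing])
    return x


def smooth(x, window=3):
    """Centred moving average with edge padding (a trailing window avoids using future readings)."""
    if window < 2:
        return np.array(x, dtype=float)
    left = window // 2
    padded = np.pad(x, (left, window - 1 - left), mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def pca_reduce(X, var_ratio=0.95):
    """Project X (samples x features) onto the fewest principal components
    whose cumulative explained variance reaches `var_ratio`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA operates on centred features
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    k = min(k, len(S))  # guard against floating-point round-off
    return Xc @ Vt[:k].T, explained[:k]


# Toy demo: 10-minute readings with one missing slot (t=1800) and one negative spike
t = [0, 600, 1200, 2400, 3000]
p = [10.0, 12.0, -5.0, 14.0, 15.0]
grid, aligned = align_to_grid(t, p, step=600)
repaired = repair(aligned)
smoothed = smooth(repaired, window=3)
print(np.round(repaired, 3), np.round(smoothed, 3))

# Toy multi-feature data: 5 features, only 2 of them linearly independent
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
mixing = np.array([[1.0, 0.5, -2.0], [0.3, 1.0, 0.7]])
X = np.hstack([base, base @ mixing])
Z, ev = pca_reduce(X)
print(Z.shape, round(float(ev.sum()), 3))
```

In a real forecasting setup, the PCA mean and components would normally be fitted on the training split only and then applied to validation and test data, to avoid information leakage. With very few samples, the estimated components can be unstable. That is consistent with the degradation the abstract reports for the low-sample biogas data.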
Sakib et al. (Wed,) studied this question.