In this study, we developed a leakage-free time-series machine learning framework to improve the accuracy of short-term (10-min-ahead) wind speed forecasting. The measurements are real operational data collected at the Bandırma/Balıkesir wind power plant in Türkiye. The framework incorporates chronological train/validation/test splitting, causal missing-data imputation, leakage-free feature engineering, and supervised lag-based modeling. This leakage-free design prevents future information from influencing model training and testing, which makes the reported forecasting performance more realistic and reliable in practice. We tested several models, including persistence, Support Vector Regression (SVR), Least-Squares Gradient Boosting (LSBoost), Random Forest (RF), Elastic Net (ELASTIC), and a stacking ensemble. We evaluated their performance using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), R-squared (R²), bias measures, and skill scores, complemented by diagnostic analyses including residual distribution, autocorrelation, regime-based evaluation, Bland–Altman plots, and quantile-quantile (Q-Q) plots. Our analyses showed that the Elastic Net model achieved balanced and statistically consistent performance, with a test RMSE of 0.6325 m/s, R² = 0.977, and negligible bias. Residual analysis indicated that errors were centered around zero, exhibited weak temporal dependence, and followed an approximately normal distribution in the central quantiles. Regime-based evaluation revealed that the model performed strongly in medium- and high-wind-speed conditions, while accuracy decreased under low wind speeds due to measurement uncertainty and low signal-to-noise ratios. Feature importance analysis indicated that previous wind speed was the dominant predictor, with solar irradiation and air temperature also contributing significantly.
Forecast error decomposition showed that most prediction errors arose from natural atmospheric variability, with minimal systematic bias. The Diebold–Mariano test confirmed that ELASTIC outperformed conventional machine learning models such as SVR and Random Forest by a statistically significant margin. The proposed framework demonstrates statistically consistent short-term forecasting behavior that may support operational wind energy management and grid balancing applications.
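To make the leakage-free idea concrete, the following is a minimal sketch of the kind of pipeline described above, not the authors' implementation. It assumes forward-fill as the causal imputation rule, a 70/15/15 chronological split, three lags, and synthetic data; all function names (`causal_ffill`, `make_lagged`, `chronological_split`, `skill_score`) are hypothetical. The skill score is taken here as one minus the ratio of model RMSE to persistence RMSE, which is one common definition and may differ from the one used in the study.

```python
import math
import random


def causal_ffill(series):
    """Fill gaps (None) with the most recent PAST observation only."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)  # remains None until a first observation exists
    return filled


def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values strictly before y."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        window = series[t - n_lags:t]
        if None in window or series[t] is None:
            continue  # skip rows that cannot be formed without future data
        X.append(window)
        y.append(series[t])
    return X, y


def chronological_split(X, y, train_frac=0.7, val_frac=0.15):
    """Split in time order (no shuffling), so test data is always the latest."""
    n = len(y)
    i_train = int(n * train_frac)
    i_val = i_train + int(n * val_frac)
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def skill_score(rmse_model, rmse_ref):
    """Positive values mean the model beats the reference forecast."""
    return 1.0 - rmse_model / rmse_ref


# Synthetic stand-in for a 10-min wind speed series (assumption, not real data).
random.seed(0)
speed, series = 8.0, []
for _ in range(500):
    speed = max(0.0, speed + random.gauss(0.0, 0.5))
    series.append(speed)
for gap in (50, 51, 200):  # simulate missing measurements
    series[gap] = None

X, y = make_lagged(causal_ffill(series), n_lags=3)
train, val, test = chronological_split(X, y)
X_test, y_test = test

# Persistence baseline: the forecast equals the most recent observed value.
rmse_persistence = rmse(y_test, [row[-1] for row in X_test])
# Placeholder "model" (mean of the last two lags) standing in for a fitted regressor.
rmse_model = rmse(y_test, [(row[-1] + row[-2]) / 2 for row in X_test])
print("skill vs. persistence:", skill_score(rmse_model, rmse_persistence))
```

In a fuller pipeline, any scaler or hyperparameter search would be fitted on the training portion only and then applied unchanged to the validation and test portions, for the same reason the split is chronological.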
Şahin et al. studied this question.