Accurate forecasting of solar photovoltaic (PV) power generation is critical for the efficient integration of renewable energy into modern power grids. This study presents a comparative evaluation of four supervised machine learning regression models for predicting DC power output from a solar PV installation: Linear Regression, Random Forest Regressor, Support Vector Regression (SVR), and XGBoost Regressor. The dataset was constructed by merging solar generation records with meteorological data on aligned timestamps. Nighttime observations (solar irradiation = 0) were excluded from the analysis. The three input features were Solar Irradiation, Ambient Temperature, and Module Temperature, and the target variable was DC Power output. Experimental results show that the ensemble-based models, Random Forest and XGBoost, achieve the highest predictive accuracy, both attaining an R² of 0.968, with MAE values of 314.52 and 321.32, respectively. Linear Regression yielded moderate performance (R² = 0.956), while SVR was substantially less accurate (R² = 0.779, RMSE = 1828.67), which is attributed to its sensitivity to hyperparameter selection and to high-variance input distributions. Solar irradiation was identified as the dominant predictor, consistent with the established physical relationship between photon flux and PV power conversion. These findings support the use of ensemble methods for solar energy forecasting tasks and provide empirical guidance for model selection in grid-scale renewable energy management systems.
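The data preparation and evaluation steps summarized above can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: the field names (`timestamp`, `dc_power`, `irradiation`, `ambient_temp`, `module_temp`), the toy records, and the example predictions are all hypothetical, and no model is fitted. The sketch only shows an inner join on timestamps, the exclusion of zero-irradiation rows, and the standard definitions of MAE, RMSE, and R².

```python
import math

# Toy records with hypothetical field names; values are illustrative only.
generation = [
    {"timestamp": "2024-01-01 00:00", "dc_power": 0.0},
    {"timestamp": "2024-01-01 09:00", "dc_power": 5200.0},
    {"timestamp": "2024-01-01 12:00", "dc_power": 11800.0},
]
weather = [
    {"timestamp": "2024-01-01 00:00", "irradiation": 0.0,
     "ambient_temp": 21.0, "module_temp": 19.5},
    {"timestamp": "2024-01-01 09:00", "irradiation": 0.45,
     "ambient_temp": 26.0, "module_temp": 38.0},
    {"timestamp": "2024-01-01 12:00", "irradiation": 0.95,
     "ambient_temp": 31.0, "module_temp": 52.0},
]


def merge_on_timestamp(gen_rows, weather_rows):
    """Inner join: keep only timestamps present in both sources."""
    weather_by_ts = {row["timestamp"]: row for row in weather_rows}
    return [
        {**row, **weather_by_ts[row["timestamp"]]}
        for row in gen_rows
        if row["timestamp"] in weather_by_ts
    ]


def drop_nighttime(rows):
    """Exclude nighttime observations, i.e. rows with zero irradiation."""
    return [row for row in rows if row["irradiation"] > 0]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


merged = merge_on_timestamp(generation, weather)
daytime = drop_nighttime(merged)

# Feature matrix (three inputs) and target vector (DC power).
X = [[r["irradiation"], r["ambient_temp"], r["module_temp"]] for r in daytime]
y = [r["dc_power"] for r in daytime]

# Hypothetical predictions standing in for a fitted model's output.
y_true_demo = [2.0, 4.0, 6.0]
y_pred_demo = [2.0, 5.0, 6.0]
print("MAE :", mae(y_true_demo, y_pred_demo))
print("RMSE:", rmse(y_true_demo, y_pred_demo))
print("R2  :", r2(y_true_demo, y_pred_demo))
```

In practice the same steps would more commonly be written with a data-frame library and a machine learning toolkit; the plain-Python form is used here only to keep the definitions explicit.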
Jain et al. (Mon,) studied this question.