Accurate short-term photovoltaic (PV) power forecasting is essential for grid stability and efficient PV-grid coordination. However, many conventional learning pipelines remain vulnerable to pervasive missing data and irregular sampling, and they often lack calibrated uncertainty estimates. This paper proposes a physics-informed hybrid forecasting framework that couples Extreme Gradient Boosting (XGBoost) for feature-level learning with a Long Short-Term Memory (LSTM) network for residual correction and temporal dependency modeling. To improve robustness under real-world data conditions, the pipeline incorporates irradiance-guided resampling and domain-guided imputation based on PV operational status. Predictive reliability is further enhanced through Monte Carlo ensemble calibration and conformal prediction. Together these produce probabilistic forecasts and prediction intervals, which are assessed with standard calibration metrics (e.g., Prediction Interval Coverage Probability, Continuous Ranked Probability Score). Experiments on the large-scale UNISOLAR dataset (over 2.7 million samples at 15-minute resolution from 42 PV sites worldwide) show that the proposed hybrid model achieves an RMSE of 2.57 kWh and an R² of 0.934 on held-out test data. This corresponds to a 7.05% reduction in RMSE relative to the next-best baseline, a Temporal Convolutional Network (TCN), and a 17.84% improvement over a standalone LSTM. An ablation study confirms the critical role of historical lag features: removing them increases RMSE by over 280%. The framework also provides well-calibrated uncertainty estimates, with conformal prediction achieving 93.0% coverage at the 95% confidence level. Computational profiling confirms the hybrid model's efficiency. It requires 2529 s of training time and 3396 MB of memory on a free-tier Google Colab CPU, making it suitable for real-time deployment.

• A physics-informed hybrid XGBoost–LSTM framework for short-term PV forecasting.
• Robust handling of missing and irregular PV data via domain-guided imputation.
• Probabilistic forecasting using ensemble calibration and conformal prediction.
• 7.05% RMSE reduction relative to the next-best baseline on a large multi-site dataset.
• Computationally efficient and suitable for real-time deployment.
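The abstract says two things relevant here: the LSTM corrects the residuals of the XGBoost stage, and historical lag features carry much of the accuracy. The abstract does not give the model internals, so the sketch below shows only the residual-correction pattern. Plain least-squares models stand in for XGBoost (stage 1) and for the LSTM (stage 2) so the example runs with NumPy alone. The names `make_lag_matrix`, `fit_linear` and `fit_hybrid`, and the lag counts, are hypothetical.

```python
from typing import Tuple

import numpy as np


def make_lag_matrix(series: np.ndarray, n_lags: int) -> Tuple[np.ndarray, np.ndarray]:
    """Build rows [y_{t-1}, ..., y_{t-n_lags}] with target y_t for t = n_lags..T-1."""
    s = np.asarray(series, dtype=float)
    if s.ndim != 1 or n_lags < 1 or len(s) <= n_lags:
        raise ValueError("need a 1-D series longer than n_lags >= 1")
    X = np.column_stack([s[n_lags - k : len(s) - k] for k in range(1, n_lags + 1)])
    return X, s[n_lags:]


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept (used as a stand-in for both stages)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


def fit_hybrid(series: np.ndarray, n_lags: int = 4, n_resid_lags: int = 2):
    """Stage 1 maps lag features to power; stage 2 predicts the stage-1 residual
    from recent residuals; the hybrid forecast is the sum of both stages."""
    X, y = make_lag_matrix(series, n_lags)
    base_coef = fit_linear(X, y)               # hypothetical stand-in for XGBoost
    base_all = predict_linear(base_coef, X)
    resid = y - base_all
    Xr, _ = make_lag_matrix(resid, n_resid_lags)
    resid_coef = fit_linear(Xr, resid[n_resid_lags:])  # stand-in for the LSTM
    base_pred = base_all[n_resid_lags:]        # align with rows that have residual lags
    hybrid_pred = base_pred + predict_linear(resid_coef, Xr)
    return y[n_resid_lags:], base_pred, hybrid_pred
```

This sketch fits and scores on the same data, so it only illustrates the structure. A real evaluation would fit both stages on training data and report RMSE on held-out data, as the abstract does.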
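The abstract states only that imputation is guided by PV operational status. The sketch below shows one plausible rule set under an explicit assumption. A site whose irradiance is at or below a hypothetical `off_threshold` (for example, at night) is treated as not generating, so a missing power reading there is set to zero. The remaining daytime gaps are filled by linear interpolation between the nearest observed values. The function name and the 5 W/m² threshold are illustrative, not the paper's settings, and the sketch does not cover the irradiance-guided resampling step.

```python
from typing import List, Optional

# Hypothetical threshold (W/m^2) at or below which a site is treated as not generating.
IRRADIANCE_OFF_THRESHOLD = 5.0


def impute_pv_power(
    power: List[Optional[float]],
    irradiance: List[float],
    off_threshold: float = IRRADIANCE_OFF_THRESHOLD,
) -> List[float]:
    """Fill missing PV power readings (None) using a simple operational-status rule."""
    if len(power) != len(irradiance):
        raise ValueError("power and irradiance must have the same length")

    # Rule 1: a missing reading while the site is "off" is physically zero.
    filled: List[Optional[float]] = [
        0.0 if p is None and g <= off_threshold else p
        for p, g in zip(power, irradiance)
    ]

    observed = [i for i, p in enumerate(filled) if p is not None]
    if not observed:
        raise ValueError("no observed or physically implied values to anchor imputation")

    # Rule 2: remaining (daytime) gaps are linearly interpolated; at the series
    # edges the nearest observed value is copied instead.
    result: List[float] = []
    for i, p in enumerate(filled):
        if p is not None:
            result.append(p)
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            result.append(filled[right])
        elif right is None:
            result.append(filled[left])
        else:
            w = (i - left) / (right - left)
            result.append(filled[left] + w * (filled[right] - filled[left]))
    return result
```

For example, `impute_pv_power([None, 0.0, 1.0, None, 3.0, None], [0.0, 2.0, 300.0, 450.0, 600.0, 700.0])` returns `[0.0, 0.0, 1.0, 2.0, 3.0, 3.0]`. The night-time gap becomes zero, the midday gap is interpolated, and the trailing gap copies the last observed value.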
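The abstract does not say which conformal variant is used. Assuming standard split conformal prediction with absolute-residual scores, intervals at level 1 − α can be built from a held-out calibration set as sketched below. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(abs_residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample split-conformal quantile of absolute calibration residuals."""
    n = len(abs_residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Rank k = ceil((n + 1)(1 - alpha)); the small tolerance keeps floating-point
    # round-up (e.g. 8.000000000000002) from bumping k by one.
    k = max(1, math.ceil((n + 1) * (1.0 - alpha) - 1e-9))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs_residuals)[k - 1]


def split_conformal_intervals(
    y_cal: Sequence[float],
    yhat_cal: Sequence[float],
    yhat_test: Sequence[float],
    alpha: float = 0.05,
) -> List[Tuple[float, float]]:
    """Symmetric intervals yhat +/- q built from calibration residuals."""
    if len(y_cal) != len(yhat_cal):
        raise ValueError("calibration targets and predictions must align")
    residuals = [abs(y - yh) for y, yh in zip(y_cal, yhat_cal)]
    q = conformal_quantile(residuals, alpha)
    return [(yh - q, yh + q) for yh in yhat_test]
```

Marginal coverage of at least 1 − α is guaranteed only when calibration and test points are exchangeable. Time-ordered PV data with seasonal or site drift may violate this, so empirical coverage can fall short of the nominal level.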
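The two calibration metrics named in the abstract can be computed as follows. PICP is the fraction of observations that fall inside their prediction intervals. For a finite ensemble, such as Monte Carlo samples, CRPS can be estimated with the energy form E|X − y| − ½E|X − X′|. This is a plain-Python sketch with hypothetical function names.

```python
from typing import Sequence, Tuple


def picp(y_true: Sequence[float], intervals: Sequence[Tuple[float, float]]) -> float:
    """Prediction Interval Coverage Probability: share of targets inside [lo, hi]."""
    if len(y_true) == 0 or len(y_true) != len(intervals):
        raise ValueError("need equally long, non-empty inputs")
    hits = sum(1 for y, (lo, hi) in zip(y_true, intervals) if lo <= y <= hi)
    return hits / len(y_true)


def crps_ensemble(members: Sequence[float], y: float) -> float:
    """CRPS of one observation y against an ensemble, via the energy form
    E|X - y| - 0.5 * E|X - X'| with X, X' drawn uniformly from the members."""
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    spread_to_obs = sum(abs(x - y) for x in members) / m
    pairwise = sum(abs(a - b) for a in members for b in members) / (m * m)
    return spread_to_obs - 0.5 * pairwise


def mean_crps(ensembles: Sequence[Sequence[float]], y_true: Sequence[float]) -> float:
    """Average CRPS over a test set (lower is better)."""
    if len(ensembles) == 0 or len(ensembles) != len(y_true):
        raise ValueError("need equally long, non-empty inputs")
    return sum(crps_ensemble(e, y) for e, y in zip(ensembles, y_true)) / len(y_true)
```

For a single-member ensemble, CRPS reduces to the absolute error. The pairwise term costs O(m²) per observation, which is acceptable for modest ensemble sizes.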
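As a worked check on the headline figures, assume the 7.05% value is the relative RMSE reduction defined below (the abstract does not state the formula). Under that assumption, the implied TCN RMSE can be backed out from the reported 2.57 kWh:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t - \hat{y}_t\right)^2},
\qquad
\Delta_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{base}} - \mathrm{RMSE}_{\mathrm{hybrid}}}{\mathrm{RMSE}_{\mathrm{base}}} \times 100\%
$$

$$
\mathrm{RMSE}_{\mathrm{TCN}} = \frac{\mathrm{RMSE}_{\mathrm{hybrid}}}{1 - \Delta_{\mathrm{RMSE}}/100} = \frac{2.57}{1 - 0.0705} \approx 2.76\ \mathrm{kWh}
$$

Because 2.57 kWh is itself rounded, this back-calculated value is approximate.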
Yousuf et al. studied this question.