What question did this study set out to answer?

The research aims to develop a framework that predicts short-term cloud movements using sky camera images for effective solar energy forecasting.

April 12, 2026 | Open Access

Probabilistic Short-Term Sky Image Forecasting Using VQ-VAE and Transformer Models on Sky Camera Data

Key Points

Develops a deep learning framework that predicts short-term cloud movement from ground-based all-sky camera images to support solar energy forecasting.
Uses a lightweight convolutional neural network to segment cloud regions and produce probabilistic masks.
Applies a vector quantized variational autoencoder (VQ-VAE) to compress the masks into discrete latent token sequences.
Employs a GPT-style autoregressive transformer to learn temporal dependencies in the token space and predict future cloud movement.
Achieves an average intersection-over-union (IoU) of 0.92 for single-step (5 s ahead) predictions.
Reaches a pixel accuracy of 0.96 for the same single-step predictions.
Degrades gradually to an IoU of 0.65 and a pixel accuracy of 0.80 over 10 minutes of autoregressive propagation.
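
The two metrics quoted in the key points can be illustrated with a minimal sketch. It assumes masks are 2D arrays, that probabilistic masks are binarized at a threshold of 0.5, and that an empty-versus-empty comparison counts as a perfect IoU; none of these choices are stated in the study, and the function names are hypothetical.

```python
import numpy as np


def binarize(prob_mask, threshold=0.5):
    """Turn a probabilistic cloud mask into a boolean mask.

    The 0.5 threshold is an assumption made for this sketch.
    """
    return np.asarray(prob_mask, dtype=float) >= threshold


def iou(pred, target):
    """Intersection-over-union of two boolean masks.

    Returns 1.0 when both masks are empty (a convention chosen here).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def pixel_accuracy(pred, target):
    """Fraction of pixels on which the two boolean masks agree."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return float((pred == target).mean())


# Toy 2x3 example: the prediction has one false-positive cloud pixel.
predicted = binarize([[0.9, 0.8, 0.1],
                      [0.7, 0.2, 0.1]])
observed = np.array([[1, 1, 0],
                     [0, 0, 0]], dtype=bool)

print(f"IoU: {iou(predicted, observed):.3f}")
print(f"Pixel accuracy: {pixel_accuracy(predicted, observed):.3f}")
```

In the toy example, two cloud pixels overlap out of three in the union, and five of six pixels agree. IoU penalizes the single false positive more heavily than pixel accuracy does, which is consistent with the reported IoU sitting below the reported accuracy at both horizons.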

Abstract

Cloud cover significantly reduces the electrical power output of photovoltaic systems, making accurate short-term cloud movement predictions essential for reliable solar energy production planning. This article presents a deep learning framework that estimates cloud movement directly from ground-based all-sky camera images, rather than predicting future production from past power data. The system follows a three-step process. First, a lightweight Convolutional Neural Network segments cloud regions and produces probabilistic masks that represent the spatial distribution of clouds in a compact and computationally efficient manner. This allows subsequent models to focus on the geometry of clouds rather than on irrelevant visual features such as illumination changes. Second, a Vector Quantized Variational Autoencoder (VQ-VAE) compresses these masks into discrete latent token sequences, reducing dimensionality while preserving fundamental cloud structure patterns. Third, a GPT-style autoregressive transformer learns temporal dependencies in this token space and predicts future sequences from past observations, enabling iterative multi-step prediction in which each prediction serves as the input for subsequent time steps. Our evaluations show an average intersection-over-union (IoU) of 0.92 and a pixel accuracy of 0.96 for single-step (5 s ahead) predictions, while performance degrades gradually to an IoU of 0.65 and an accuracy of 0.80 over 10 min of autoregressive propagation. The framework also provides prediction uncertainty estimates through token-level entropy, which correlates positively with prediction error and serves as a confidence indicator for downstream decision-making in solar energy forecasting applications.
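
The token-level entropy mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: it assumes the transformer emits one logit vector per latent token over the VQ-VAE codebook, and the function names, the normalization by the log of the codebook size, and the averaging over tokens are all hypothetical choices.

```python
import numpy as np


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution at each token.

    logits: array of shape (num_tokens, codebook_size).
    Uses a log-sum-exp shift so large logits do not overflow.
    """
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def frame_uncertainty(logits):
    """Mean token entropy scaled to [0, 1] by the maximum entropy log(K).

    Averaging over tokens and normalizing are assumptions of this sketch.
    """
    logits = np.asarray(logits, dtype=np.float64)
    codebook_size = logits.shape[-1]
    return float(token_entropy(logits).mean() / np.log(codebook_size))


# Toy logits for two tokens over a hypothetical 4-entry codebook.
confident = np.array([[50.0, 0.0, 0.0, 0.0],
                      [0.0, 50.0, 0.0, 0.0]])
undecided = np.zeros((2, 4))  # uniform distribution over the codebook

print(f"confident frame: {frame_uncertainty(confident):.3f}")
print(f"undecided frame: {frame_uncertainty(undecided):.3f}")
```

A score near 0 means the model concentrates probability on a single codebook entry for each token, while a score near 1 means it is close to guessing uniformly. A downstream consumer could, for example, discount forecasts whose score exceeds a threshold tuned on validation data.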
