Abstract
Accurate short‐term forecasting of clouds from satellite imagery is a foundational technology for downstream meteorological disaster mitigation and aviation safety enhancement systems, particularly in developing countries and remote areas lacking ground‐based observation infrastructure. Existing deep learning models often produce blurry results and exhibit reduced accuracy when forecasting atmospheric variables, while recent advances in video prediction show the potential to solve these problems. Here, we introduce SATcast, a diffusion‐based model that employs a cascaded architecture and multimodal inputs for forecasting cloud evolution from satellite imagery. SATcast incorporates physical fields predicted by FuXi, a deep‐learning weather forecasting model, alongside historical satellite observations as conditional inputs to generate high‐quality future cloud fields. Comprehensive evaluations demonstrate that SATcast consistently outperforms conventional methods across multiple metrics, including the fractions skill score, achieving superior accuracy and robustness. Ablation studies underscore the importance of its multimodal design and cascaded architecture in enhancing predictive performance. Notably, SATcast maintains skillful predictions for up to 24 hr, indicating its potential for operational forecasting applications.
Chen et al. (Sun,) studied this question.
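For readers unfamiliar with the fractions skill score (FSS) named in the abstract, the following is a minimal sketch of the standard neighborhood definition, FSS = 1 − MSE(P_f, P_o) / (mean(P_f²) + mean(P_o²)), where P_f and P_o are the fractions of event pixels within a square window around each grid point. It is not taken from the SATcast evaluation code. The function names, the `>=` event definition, the zero padding at the domain edges, and the odd window size are all assumptions made for illustration.

```python
import numpy as np


def neighborhood_fractions(binary, window):
    """Fraction of event pixels in a window x window box around each pixel.

    Assumption: pixels outside the domain count as non-events (zero padding).
    """
    pad = window // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Summed-area table with a leading row and column of zeros,
    # so every box sum is a combination of four table entries.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = binary.shape
    box = (sat[window:window + h, window:window + w]
           - sat[:h, window:window + w]
           - sat[window:window + h, :w]
           + sat[:h, :w])
    return box / (window * window)


def fractions_skill_score(forecast, observed, threshold, window):
    """FSS for 2-D fields, with events defined as values >= threshold (assumed)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pf = neighborhood_fractions(forecast >= threshold, window)
    po = neighborhood_fractions(observed >= threshold, window)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    if reference == 0:
        return float("nan")  # no events in either field, score undefined
    return float(1.0 - mse / reference)


# Hypothetical toy fields: one event pixel, displaced by one column.
forecast = np.zeros((5, 5))
observed = np.zeros((5, 5))
forecast[2, 2] = 1.0
observed[2, 3] = 1.0

fss_pixel = fractions_skill_score(forecast, observed, threshold=0.5, window=1)
fss_neigh = fractions_skill_score(forecast, observed, threshold=0.5, window=3)
```

In this toy case the pixel-scale score (`window=1`) gives no credit for the displaced event, while the 3 x 3 neighborhood gives partial credit. This is why the FSS is commonly preferred over pointwise metrics for spatial fields such as cloud cover.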