September 5, 2025 | Open Access

A Novel Approach for Reliable Classification of Marine Low Cloud Morphologies with Vision–Language Models

Key Points

The vision–language model (VLM) achieves an overall accuracy of 0.84 for mesoscale cloud types (sugar, gravel, fish, and flower), comparable to prior deep learning benchmarks.
K-fold cross-validation shows that the VLM remains robust when the training set is reduced, making it suitable for diverse cloud datasets.
For marine stratocumulus types, the VLM attains 0.86 accuracy with only 260 samples, a regime in which an image-only model is unreliable.
These findings suggest that vision–language models offer new opportunities for satellite remote sensing and climate model evaluation.

Abstract

Marine low clouds strongly influence the Earth system but remain a major source of uncertainty in the anthropogenic radiative forcing simulated by general circulation models. This uncertainty arises from an incomplete understanding of the many processes that control their evolution and interactions. A key feature of these clouds is their diverse mesoscale morphologies, which are closely tied to their microphysical and radiative properties but remain difficult to characterize with satellite retrievals and numerical models. Here, we develop and apply a vision–language model (VLM) to classify marine low cloud morphologies using two independent datasets based on Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery: (1) mesoscale cellular convection types of sugar, gravel, fish, and flower (SGFF; 8800 total samples), and (2) marine stratocumulus (Sc) types of stratus, closed cells, open cells, and other cells (260 total samples). By conditioning frozen image encoders on descriptive prompts, the VLM leverages multimodal priors learned from large-scale image–text training, making it less sensitive to limited sample size. In k-fold cross-validation, the VLM achieves an overall accuracy of 0.84 for SGFF, comparable to prior deep learning benchmarks for the same cloud types, and its performance remains robust when the SGFF training set is reduced. For the Sc dataset, the VLM attains 0.86 accuracy, whereas an image-only model is unreliable with such a limited training set. These findings highlight the potential of VLMs as efficient and accurate tools for cloud classification when training samples are scarce, offering new opportunities for satellite remote sensing and climate model evaluation.
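To make the evaluation protocol concrete, the following is a minimal sketch of how an overall accuracy can be pooled over k-fold cross-validation. It is not the authors' implementation: the nearest-centroid classifier is a deliberately simple stand-in for the VLM, the feature vectors are synthetic placeholders for frozen-encoder outputs, and the even split of 260 samples across four classes is assumed purely for illustration (the abstract does not state the class balance). All function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k nearly equal folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT the VLM): pick the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_x]


def cross_validated_accuracy(xs, ys, k=5, seed=0):
    """Overall accuracy: correct held-out predictions pooled over all k folds."""
    folds = k_fold_indices(len(ys), k, seed)
    correct = 0
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        preds = nearest_centroid_predict(
            [xs[j] for j in train_idx],
            [ys[j] for j in train_idx],
            [xs[j] for j in test_idx],
        )
        correct += sum(p == ys[j] for p, j in zip(preds, test_idx))
    return correct / len(ys)


# Synthetic placeholder data: 4 classes x 65 samples (assumed balance), 8-D features.
rng = random.Random(42)
labels = [c for c in range(4) for _ in range(65)]
features = [[rng.gauss(3.0 if d == c else 0.0, 1.0) for d in range(8)] for c in labels]

accuracy = cross_validated_accuracy(features, labels, k=5)
print(f"pooled 5-fold accuracy on synthetic data: {accuracy:.2f}")
```

Because every sample is held out exactly once, the pooled score uses the whole dataset for testing, which is one reason k-fold cross-validation is a common choice when only a few hundred samples are available.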
