Cloud detection is essential for optical remote sensing data preprocessing. However, hyperspectral cloud detection datasets remain scarce, suffering from issues such as limited spectral coverage, small annotation scales, and a lack of scene diversity, which hinders the development of hyperspectral cloud detection algorithms. To address this, this paper constructs CloudAHSI—a multi-source hyperspectral cloud detection dataset for global complex scenes—based on the Advanced Hyperspectral Imager (AHSI) aboard the GF-5 01 satellite. The dataset comprises 45 original scenes and enhanced sub-scenes, achieving full-spectrum coverage from 400 to 2500 nm. Through a semi-supervised annotation framework combining “spectral prior-based rough labeling and manual refinement,” the dataset provides pixel-level labels for thick clouds, thin clouds, and non-cloud areas, with scenes further categorized by cloud coverage and primary land cover types. Experiments demonstrate that CloudAHSI effectively supports deep learning models in cloud detection tasks over complex surface backgrounds, particularly showing significant data value in the detection and evaluation of thin clouds, thereby meeting multi-level cloud detection requirements ranging from pixel segmentation to scene understanding. The release of this dataset provides a critical data foundation for overcoming spectral confusion bottlenecks in hyperspectral cloud detection and advancing the utilization of full-spectrum remote sensing information.
Jia et al. (Wed,) studied this question.