Abstract

Accurate and timely information on the spatial distribution of crops is essential for ensuring food security, achieving sustainable agricultural management, and understanding ecosystem interactions. However, in large arid regions such as Xinjiang, China, constructing high-spatial-resolution, continuous, multi-year crop distribution datasets remains a significant challenge because of complex terrain, sparse ground observations, and limited computational resources. In this study, we developed a robust crop classification framework on the Google Earth Engine (GEE) cloud platform. The framework uses all available NASA Harmonized Landsat Sentinel-2 (HLS) Landsat-derived imagery (HLSL30) to fit harmonic models to time series of the normalized difference vegetation index (NDVI) and the land surface water index (LSWI), which characterize crop phenological trajectories. The resulting harmonic features are passed to a Random Forest (RF) classifier to identify major crop types. To minimize interference from non-crop vegetation and background land cover, we applied a pre-extracted cropland mask. Using this approach, we generated a 30 m resolution dataset of major crops (cotton, maize, wheat, and rice) across Xinjiang for the period 2013–2024. Accuracy assessments using independent validation samples from 2018 and 2019 yielded producer accuracies of 0.83–0.99 and user accuracies of 0.83–0.96. Overall accuracy reached 0.90 and 0.93 for 2018 and 2019, with Kappa coefficients of 0.86 and 0.89, respectively. Furthermore, the estimated crop areas at the prefecture level are highly consistent with official statistical yearbooks and agree well with existing distribution maps of cotton, maize, and wheat. This dataset provides a systematic characterization of the long-term spatial dynamics of major crops in Xinjiang, offering reliable data support for regional agricultural monitoring, food security assessment, policy formulation, and environmental change research.
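To make the harmonic-model step concrete, the following is a minimal sketch of fitting a harmonic regression to a single pixel's vegetation-index time series with ordinary least squares. It is an illustration only, not the authors' GEE implementation: the model form (intercept, linear trend, two harmonics), the function names, and the synthetic NDVI series are all assumptions made for this example. The fitted coefficients, or the amplitude and phase derived from them, are the kind of features that could then be supplied to a classifier.

```python
import math

def design_row(t, n_harmonics=2):
    """One design-matrix row for time t, expressed in fractional years."""
    row = [1.0, t]  # intercept and linear trend (assumed model form)
    for k in range(1, n_harmonics + 1):
        row.append(math.cos(2 * math.pi * k * t))
        row.append(math.sin(2 * math.pi * k * t))
    return row

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_harmonic(times, values, n_harmonics=2):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y."""
    X = [design_row(t, n_harmonics) for t in times]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(X, values)) for i in range(p)]
    return solve(XtX, Xty)

def amplitude_phase(beta, k=1):
    """Amplitude and phase of the k-th harmonic from its cos/sin coefficients."""
    c, s = beta[2 * k], beta[2 * k + 1]
    return math.hypot(c, s), math.atan2(s, c)

# Synthetic (not observed) NDVI series: one annual cycle sampled 40 times.
times = [i / 40 for i in range(40)]
ndvi = [0.4 + 0.3 * math.cos(2 * math.pi * t - 1.0) for t in times]

beta = fit_harmonic(times, ndvi)
amp, phase = amplitude_phase(beta, k=1)
```

Because the synthetic series lies exactly in the span of the model, the fit recovers its mean, amplitude, and phase; real time series contain cloud gaps and noise, so the coefficients would only approximate the underlying seasonal curve.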
Liang et al. (Mon,) studied this question.
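The accuracy measures quoted in the abstract (overall accuracy, Kappa, producer and user accuracy) all derive from a confusion matrix of validation samples. The sketch below shows the standard definitions; the function name and the two-class matrix are hypothetical and chosen only to keep the arithmetic easy to follow, and they are not data from the study.

```python
def accuracy_metrics(cm):
    """Compute accuracy measures from a square confusion matrix.

    cm[i][j] = number of validation samples whose reference class is i
    and whose mapped (predicted) class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    ref_tot = [sum(cm[i]) for i in range(n)]                        # row sums
    map_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # column sums

    overall = sum(diag) / total
    producer = [diag[i] / ref_tot[i] for i in range(n)]  # 1 - omission error
    user = [diag[j] / map_tot[j] for j in range(n)]      # 1 - commission error

    # Agreement expected by chance, from the marginal totals.
    expected = sum(ref_tot[i] * map_tot[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, kappa, producer, user

# Hypothetical two-class matrix (illustrative values only).
cm = [[45, 5],
      [10, 40]]
overall, kappa, producer, user = accuracy_metrics(cm)
```

For this matrix, 85 of 100 samples lie on the diagonal and chance agreement is 0.5, so overall accuracy is 0.85 and Kappa is 0.7. Producer accuracy is computed per reference class and user accuracy per mapped class, which is why a single class can score differently on the two measures.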