With the exacerbation of the biodiversity and climate crises, macroecological pursuits such as global biodiversity mapping become more urgent. Remote sensing offers a wealth of Earth observation data for ecological studies, but the scarcity of labeled datasets remains a major challenge. Recently, self-supervised learning has enabled learning representations from unlabeled data, triggering the development of pretrained geospatial models with generalizable features. However, these models are often trained on datasets biased toward areas of high human activity, leaving entire ecological regions underrepresented. Additionally, while some datasets attempt to address seasonality through multi-date imagery, they typically follow calendar seasons rather than local phenological cycles. To better capture vegetation seasonality at a global scale, we propose a simple phenology-informed sampling strategy and introduce SSL4Eco, the corresponding multi-date Sentinel-2 dataset, on which we train an existing model with a season-contrastive objective. We compare representations learned from SSL4Eco against other datasets on diverse ecological downstream tasks and demonstrate that our straightforward sampling method consistently improves representation quality, highlighting the importance of dataset construction. The model pretrained on SSL4Eco reaches state-of-the-art performance on 7 out of 8 downstream tasks spanning (multi-label) classification and regression. We release our code, data, and model weights to support macroecological and computer vision research at https://github.com/PlekhanovaElena/ssl4eco.
Plekhanova et al. studied this question.
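The contrast between calendar seasons and local phenological cycles can be illustrated with a small sketch. The abstract does not specify how SSL4Eco selects its acquisition dates, so the following is only one possible reading, not the authors' procedure: it assumes a hypothetical 12-value monthly greenness curve per location (for example an NDVI climatology) and anchors evenly spaced sampling months at the local greenness peak. The function name `phenology_aligned_months` and the input curves are illustrative assumptions.

```python
from typing import List, Sequence


def phenology_aligned_months(greenness: Sequence[float], n_dates: int = 4) -> List[int]:
    """Return n_dates month indices (0 = January), evenly spaced over the year
    and anchored at this location's month of peak greenness.

    Illustrative assumption only: the real dataset may define phenological
    phases differently (e.g. green-up and senescence dates).
    """
    if len(greenness) != 12:
        raise ValueError("expected 12 monthly greenness values")
    if n_dates < 1 or 12 % n_dates != 0:
        raise ValueError("n_dates must divide 12")
    # Month with the highest greenness (first one wins on ties).
    peak = max(range(12), key=lambda m: greenness[m])
    step = 12 // n_dates
    # Wrap around the year so every location gets the same phase ordering:
    # peak, then peak + step, and so on.
    return [(peak + k * step) % 12 for k in range(n_dates)]


# Hypothetical greenness curves for three locations.
northern = [0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]  # peaks in July
southern = [0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7]  # peaks in January
monsoon = [0.2, 0.2, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.6, 0.4, 0.3]   # peaks in September

northern_months = phenology_aligned_months(northern)
southern_months = phenology_aligned_months(southern)
monsoon_months = phenology_aligned_months(monsoon)
```

Under this sketch, the first sampled date at every location corresponds to peak greenness, whereas a fixed calendar schedule would capture peak vegetation in one hemisphere and dormancy in the other on the same date.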