Rapid environmental change has increased the need for reliable long-term geospatial prediction. However, accurately modeling spatio-temporal geospatial dynamics remains challenging because of nonlinearities, complex spatial dependencies, and external driving factors. This paper proposes a comprehensive benchmarking framework for comparing neighborhood-based, graph-based, and attention-based spatiotemporal deep learning models under the same preprocessing, training, and testing procedure. Long Short-Term Memory (LSTM) models with and without auxiliary variables are compared with hybrid Graph Attention Network–LSTM (GAT–LSTM) models and with fully attention-based GAT–Temporal Attention models, with and without a feed-forward (MLP/FFN) block. All models are trained using a unified preprocessing and evaluation pipeline on annual satellite data in Network Common Data Form (NetCDF) from 2000 to 2023, with 2024 reserved as a fully unseen test dataset. Model performance is evaluated with global pixel-wise metrics (R², RMSE, MAE, MAPE, and correlation) that quantify the agreement between predicted and reference vectors. The findings indicate that the LSTM–CA model with auxiliary inputs (3 × 3 neighborhood) achieves the best and most stable performance (R² ≈ 0.95), highlighting the importance of the integrated Cellular Automata (CA) structure and auxiliary driving factors. The GAT–Temporal Attention model with an MLP block ranks second, while removing the MLP block or using hybrid LSTM–GAT configurations leads to unstable or degraded performance. Index-wise analysis shows that vegetation and water-related indices are comparatively more predictable. Overall, the results indicate that strong temporal modeling combined with auxiliary information matters more than the complexity of the spatial attention mechanism.
The main novelty of this paper is that it does not introduce a new neural network model; instead, it presents a comparative engineering experiment to assess the conditions under which neighborhood-based temporal models can outperform graph-attention models in long-range geospatial prediction.
Karimadini et al. (Fri,) studied this question.
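The global pixel-wise metrics named in the abstract (R², RMSE, MAE, MAPE, and correlation) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `global_pixelwise_metrics`, the masking of non-finite pixels, and the `eps` guard in the MAPE denominator are assumptions made here for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def global_pixelwise_metrics(y_true, y_pred, eps=1e-8):
    """Compute global metrics over all valid pixels of two rasters.

    Hypothetical helper: both inputs are flattened to vectors, and pixels
    that are NaN or infinite in either raster are dropped (an assumption,
    e.g. for no-data cells in a NetCDF grid).
    """
    t = np.asarray(y_true, dtype=float).ravel()
    p = np.asarray(y_pred, dtype=float).ravel()

    # Keep only pixels that are finite in both reference and prediction.
    valid = np.isfinite(t) & np.isfinite(p)
    t, p = t[valid], p[valid]

    err = p - t
    ss_res = np.sum(err ** 2)               # residual sum of squares
    ss_tot = np.sum((t - t.mean()) ** 2)    # total sum of squares

    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        # eps avoids division by zero where the reference value is 0
        # (an assumption; the paper does not state how this is handled).
        "mape": float(100.0 * np.mean(np.abs(err) / np.maximum(np.abs(t), eps))),
        "corr": float(np.corrcoef(t, p)[0, 1]),
    }


if __name__ == "__main__":
    # Toy 2 x 2 "rasters" standing in for a reference and a predicted index map.
    reference = np.array([[1.0, 2.0], [3.0, 4.0]])
    predicted = reference + 0.5
    print(global_pixelwise_metrics(reference, predicted))
```

Flattening the rasters before computing the metrics treats every valid pixel equally, which matches the "global" framing in the abstract; per-index or per-region scores would require grouping the pixels before calling the function.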