Model transferability is essential for predicting species distributions in novel regions or time periods. However, how to assess model transferability remains a major knowledge gap. The aim of this study was to identify an appropriate validation method for assessing the transferability of species distribution models (SDMs), focusing specifically on the types of test datasets and evaluators. The evaluation ability of validation methods was examined using several completely independent datasets, which are rarely used in other studies. The distributions of three invasive plant species were predicted on different continents using Maxent, with datasets drawn from various combinations of continents. The evaluation ability of testing on internal random holdout datasets was examined by comparing the evaluator values from internal cross-validation with those from the aforementioned cross-continental predictions. Subsequently, paired distribution predictions from identical models in different regions were compared to examine the evaluation ability of testing on spatially independent datasets. The rationale is that the relationship between the paired predictions mirrors the relationship between predictions on the test datasets and predictions in the target regions. Three evaluators were compared: the area under the receiver operating characteristic curve (AUC), the continuous Boyce index (CBI), and the correlation coefficient with the predictions from the model calibrated in the predicted region (RWIP). Cross-validation using random holdout datasets consistently rated all models as good; however, evaluations of the predictions in the target regions varied widely. Therefore, conventional cross-validation proved inadequate for assessing model transferability. Analyzing the paired distribution predictions from the same models across different regions revealed that using the AUC and CBI increased evaluation uncertainty, whereas using the RWIP kept evaluation uncertainty relatively low.
This study confirms that the conventional approach, namely random holdout test datasets evaluated with the AUC, fails to reliably assess model transferability. Instead, spatially independent test datasets evaluated with the RWIP provide a more robust approach.
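The contrast between random holdout and spatially independent test datasets can be sketched as two ways of splitting occurrence records. This is a hypothetical illustration rather than the study's procedure: the record layout (region, longitude, latitude), the 25% test fraction, the seed, and the function names are all assumptions, and model fitting (Maxent in the study) is not shown.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (region, longitude, latitude)
Record = Tuple[str, float, float]


def random_holdout(records: Sequence[Record], test_frac: float = 0.25,
                   seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Conventional split: test records are drawn at random, so they
    come from the same regions as the training records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def region_holdout(records: Sequence[Record],
                   test_region: str) -> Tuple[List[Record], List[Record]]:
    """Spatially independent split: every record from one region is held
    out, so the test set lies entirely outside the training regions."""
    train = [r for r in records if r[0] != test_region]
    test = [r for r in records if r[0] == test_region]
    return train, test
```

Under the random split, test records share regions (and thus environmental conditions) with the training records, so a good score mainly reflects interpolation. Holding out a whole region instead forces the evaluation to reflect transfer to a region the model has not seen.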
Takayuki Matsui studied this question.