The past 30 years have witnessed a surge in the number of statistical downscaling techniques and applications. However, the absence of standardized approaches across studies has resulted in a bewildering array of methods, which likely obstructs the effective use of downscaling in climate risk management. We address this challenge by demonstrating a transparent workflow for benchmarking downscaling methods. The workflow incorporates a protocol for specifying the reference model, calibration criteria, skill diagnostics, and assessment metrics. When downscaling daily rainfall series and extremes in northern Serbia, we find that automated calibration of our chosen benchmark model (SDSM) generally outperforms manual calibration for skill diagnostics encompassing rainfall occurrence, variability, and extremes. Additionally, we assess the added value of machine learning (ML) methods relative to the same benchmark. Our findings reveal superior performance of these advanced techniques when downscaling extreme rainfall, but less so for rainfall occurrence, when compared to the benchmark. Overall, ML downscaling “won” 42% of our diagnostic tests, the automated SDSM won 33%, and the manually calibrated SDSM ranked first in 25%. This means that the ML methods do add value relative to the benchmark model (here, SDSM). These findings underscore the utility of our workflow, which also enabled us to identify specific avenues for enhancing the tested ML models.
Wilby et al. studied this question.
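The "share of diagnostic tests won" reported above can be illustrated with a minimal sketch. It assumes each diagnostic yields one score per method and that the method with the best score wins that test. The function name `tally_wins`, the diagnostic names, and all numbers below are hypothetical placeholders rather than values or procedures from the study, and ties are broken arbitrarily in favour of the first method listed.

```python
from collections import Counter


def tally_wins(scores, higher_is_better):
    """Return the fraction of diagnostics on which each method ranks first.

    scores: {diagnostic: {method: score}}
    higher_is_better: {diagnostic: bool}, False for error-type metrics
    """
    wins = Counter()
    methods = set()
    for diagnostic, by_method in scores.items():
        methods.update(by_method)
        # Pick the largest score for skill metrics, the smallest for errors.
        pick = max if higher_is_better[diagnostic] else min
        wins[pick(by_method, key=by_method.get)] += 1
    total = len(scores)
    return {method: wins[method] / total for method in methods}


# Hypothetical scores for illustration only (not results from the paper).
scores = {
    "wet_day_frequency_error": {"SDSM_manual": 0.04, "SDSM_auto": 0.02, "ML": 0.05},
    "daily_variance_error": {"SDSM_manual": 0.30, "SDSM_auto": 0.22, "ML": 0.25},
    "annual_max_correlation": {"SDSM_manual": 0.41, "SDSM_auto": 0.48, "ML": 0.61},
    "p95_rainfall_error": {"SDSM_manual": 3.1, "SDSM_auto": 2.8, "ML": 2.0},
}
higher_is_better = {
    "wet_day_frequency_error": False,
    "daily_variance_error": False,
    "annual_max_correlation": True,
    "p95_rainfall_error": False,
}

shares = tally_wins(scores, higher_is_better)
```

A tally of this kind weights every diagnostic equally, so the resulting shares depend on which diagnostics are included in the protocol.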