Modern methods for wildfire danger prediction are critical for mitigating the detrimental impacts of fires on ecosystems, public health, and the economy. While Machine Learning has emerged as a powerful approach to model the complex interactions driving wildfire risk, its ‘black-box’ nature creates a trade-off between predictive skill and the physical plausibility and interpretability required for trustworthy risk assessments. In this study, we systematically assess the predictive performance and physical consistency of seven temporal deep learning (DL) models against two decision tree-based baselines, random forest (RF) and XGBoost (XGB), for next-day wildfire danger prediction in the Mediterranean. We apply explainable AI (xAI) methods to interpret model attributions and assess their broad alignment with established fire science. Results show that all DL models outperform the RF and XGB baselines. Transformer models achieve the highest predictive accuracy (F₁-score of 0.81), significantly outperforming the RF baseline (post-hoc Dunn test, p < 10⁻⁵) by effectively capturing long-range temporal dependencies. However, xAI analyses reveal a key trade-off: despite their higher predictive performance, DL models exhibit lower physical consistency in their averaged driver relationships. Specifically, when evaluated against 19 expected fire-driver relationships, RF and XGB correctly capture 13 and 12 relationships, respectively, whereas DL models capture at most 11. We further investigate how Transformers generate individual wildfire danger predictions through case studies of two similar large fire events in Spain, one correctly predicted (true positive) and one missed (false negative). The analysis demonstrates how differences in driver representation can lead to divergent predictions, such as correctly identifying a heatwave-driven event but missing a lightning-induced ignition.
Together, these investigations provide a structured evaluation of a wide range of DL models in terms of their predictive accuracy and physical consistency, offering guidance for future wildfire danger forecasting in fire-prone regions, such as the Mediterranean.
Becker et al. (Thu,) studied this question.