April 22, 2026 | Open Access

Comparative Evaluation of Machine Learning and Deep Learning Models for Tropical Cyclone Track and Intensity Forecasting in the North Atlantic Basin

Key Points

The aim is to compare the performance of various machine learning and deep learning models in forecasting tropical cyclone track and intensity.
Comparative evaluation of six models: Random Forest, XGBoost, LightGBM, CatBoost, ANN, and CNN.
Models trained using the NHC's HURDAT2 dataset from 1990 to 2019 and tested on data from the 2020 season.
Performance assessed through mean absolute error and coefficient of determination across multiple lead times.
Several ML and DL models achieved comparable intensity forecasting performance to the 2020 mean official forecasts.
XGBoost and CatBoost slightly outperformed other ML models while LightGBM was the most computationally efficient.
CNNs outperformed ANNs in predictive accuracy for intensity forecasting, while ANNs were more cost-efficient for track forecasts.

Abstract

Accurate forecasts of tropical cyclone (TC) track and intensity with sufficient lead time are critical for disaster preparedness and risk mitigation. Traditional numerical weather prediction models, while fundamental to operational forecasting, often exhibit systematic errors due to limitations in observations, physical parameterizations, and model resolution. In recent years, machine learning (ML) and deep learning (DL) approaches have emerged as promising data-driven alternatives for improving TC forecasts. This study presents a comparative evaluation of six ML and DL models for forecasting TC track and intensity in the North Atlantic basin: Random Forest (RF), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Artificial Neural Network (ANN), and Convolutional Neural Network (CNN). The models are trained using the National Hurricane Center’s (NHC) HURDAT2 best-track dataset for storms from 1990 to 2019 and evaluated on an independent test set from the 2020 season. Model performance is compared across all models and benchmarked against the 2020 mean Decay-SHIFOR5 intensity errors, CLIPER5 track errors, and NHC official forecast (OFCL) errors. Forecast skill is assessed using mean absolute error (MAE) with 95% bootstrap confidence intervals and the coefficient of determination (R²) across lead times of 6, 12, 18, 24, 48, and 72 h. The results show that: (1) several ML and DL models achieve intensity forecast performance that is broadly comparable in magnitude to the 2020 mean OFCL benchmarks, with an average error reduction of 5–11% at the 24 h lead time; (2) among the ML models, XGBoost and CatBoost slightly outperform LightGBM and RF in accuracy, while LightGBM demonstrates the highest computational efficiency; and (3) among the DL models, CNNs outperform ANNs in predictive accuracy and in intensity forecasting efficiency, while ANNs exhibit lower computational cost for track forecasting.
Bootstrap confidence intervals indicate relatively low variability in model errors, supporting the statistical stability of the results within the 2020 season. However, these results reflect within-season variability and do not necessarily generalize across different years or climatological conditions. Overall, the findings demonstrate the potential of ML/DL-based approaches to complement existing operational forecast systems and enhance TC track and intensity forecasting in the North Atlantic basin.
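
The evaluation metrics named in the abstract (MAE with a 95% bootstrap confidence interval, and R²) can be illustrated with a minimal sketch. The abstract does not say which bootstrap variant the study uses, so the percentile bootstrap below is an assumption, as are the function names, the number of resamples, and the synthetic intensity values. Resampling individual forecasts also treats errors as independent, which may understate uncertainty when errors are correlated within a storm.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def bootstrap_mae_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the MAE.

    Assumption: forecasts are resampled individually with replacement;
    the study's actual resampling scheme is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    n = abs_err.size
    # Each row holds the indices of one bootstrap resample of size n.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_mae = abs_err[idx].mean(axis=1)
    lower, upper = np.quantile(boot_mae, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


if __name__ == "__main__":
    # Synthetic stand-in for 24 h intensity forecasts (knots); not study data.
    rng = np.random.default_rng(42)
    observed = rng.uniform(30, 120, size=200)
    predicted = observed + rng.normal(0, 8, size=200)

    lower, upper = bootstrap_mae_ci(observed, predicted)
    print(f"MAE = {mae(observed, predicted):.2f} kt, 95% CI = [{lower:.2f}, {upper:.2f}]")
    print(f"R^2 = {r2_score(observed, predicted):.3f}")
```

In a per-lead-time evaluation, the same three functions would be applied separately to the forecast and observation pairs at each lead time (6, 12, 18, 24, 48, and 72 h).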
