March 29, 2026 · Open Access

DENGCAST: Temporal Feature Engineering for Dengue Forecasting with Gradient Boosting

Key Points

The study aims to improve dengue incidence forecasting through machine learning and rigorous validation.
DENGCAST is a structured, reproducible machine learning pipeline for dengue forecasting.
Dengue incidence is modeled with gradient boosting, specifically CatBoost.
Five-fold chronological cross-validation prevents temporal leakage.
Autoregressive lag features and rolling statistics are derived strictly from past observations.
A systematic ablation study identifies the key performance drivers.
DENGCAST achieves a mean absolute error (MAE) of 11.87 for San Juan and 4.74 for Iquitos, improvements of 53.2% and 29.6% respectively over climate-only baselines.
A single one-week lag feature accounts for approximately 55% of the total performance gain.
Careful validation design and feature engineering can outperform more complex deep learning approaches in low-data epidemiological settings.
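
The lag and rolling-statistic features named in the key points can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `past_only_features`, the lag set, the four-week window, and the case counts are all assumptions chosen for the example. The point it shows is that every feature for week t is computed only from weeks before t.

```python
from statistics import mean


def past_only_features(cases, lags=(1, 2), window=4):
    """Build lag and rolling-mean features for each week from earlier weeks only.

    `cases` is a list of weekly case counts in chronological order.
    `lags` and `window` are illustrative choices, not values from the study.
    """
    rows = []
    for t in range(len(cases)):
        row = {"target": cases[t]}
        for k in lags:
            # Value observed k weeks before week t; None when history is too short.
            row[f"lag_{k}"] = cases[t - k] if t - k >= 0 else None
        # Rolling mean over the `window` weeks that end at week t-1,
        # so the current week's count never enters its own features.
        history = cases[max(0, t - window):t]
        row[f"roll_mean_{window}"] = mean(history) if len(history) == window else None
        rows.append(row)
    return rows


weekly_cases = [4, 5, 4, 3, 6, 2, 4, 5]  # made-up counts for illustration
features = past_only_features(weekly_cases)
```

Rows with `None` values correspond to the start of the series, where not enough history exists; in practice they would be dropped or imputed before fitting a model.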

Abstract

Forecasting weekly dengue incidence is a critical challenge for public health systems in endemic regions. This work introduces DENGCAST, a structured and reproducible machine learning pipeline for dengue forecasting using gradient boosting. Unlike prior work that emphasizes model complexity, DENGCAST prioritizes methodological rigor through strict temporal validation and biologically grounded feature engineering. The approach is evaluated on the DengAI benchmark dataset (San Juan, Puerto Rico and Iquitos, Peru) using five-fold chronological cross-validation to eliminate temporal leakage. The model integrates CatBoost with autoregressive lag features and rolling statistics derived strictly from past observations. DENGCAST achieves a mean absolute error (MAE) of 11.87 for San Juan and 4.74 for Iquitos, representing improvements of 53.2% and 29.6% respectively over climate-only baselines. A systematic ablation study demonstrates that a single one-week lag feature contributes approximately 55% of the total performance gain, providing strong evidence that short-term outbreak momentum is the dominant predictive signal. The results highlight that careful validation design and feature engineering can outperform more complex deep learning approaches in low-data epidemiological settings.
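
One way to picture five-fold chronological cross-validation is the expanding-window split sketched below. The abstract does not specify how the folds are constructed, so the block layout, the function names `chronological_folds` and `mean_absolute_error`, and the sample count are assumptions for illustration only. The property being demonstrated is that every test week comes after every training week in its fold.

```python
def chronological_folds(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs where every test index follows all train indices.

    Expanding-window scheme (an assumption, not the study's documented design):
    the series is cut into n_folds + 1 contiguous blocks; fold i trains on
    blocks 0..i-1 and tests on block i.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for i in range(1, n_folds + 1):
        train_end = i * block
        # The final fold absorbs any leftover samples at the end of the series.
        test_end = n_samples if i == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, test_end))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


folds = list(chronological_folds(n_samples=26, n_folds=5))
```

Each fold's MAE would be computed on its test block and the five values averaged. scikit-learn's `TimeSeriesSplit` offers a comparable expanding-window split for projects that already depend on that library.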
