Forecasting weekly dengue incidence is a critical challenge for public health systems in endemic regions. This work introduces DENGCAST, a structured and reproducible machine learning pipeline for dengue forecasting based on gradient boosting. Unlike prior work that emphasizes model complexity, DENGCAST prioritizes methodological rigor through strict temporal validation and biologically grounded feature engineering. The approach is evaluated on the DengAI benchmark dataset (San Juan, Puerto Rico, and Iquitos, Peru) using five-fold chronological cross-validation to prevent temporal leakage. The model combines CatBoost with autoregressive lag features and rolling statistics derived strictly from past observations. DENGCAST achieves a mean absolute error (MAE) of 11.87 for San Juan and 4.74 for Iquitos, improvements of 53.2% and 29.6%, respectively, over climate-only baselines. A systematic ablation study shows that a single one-week lag feature contributes approximately 55% of the total performance gain, strong evidence that short-term outbreak momentum is the dominant predictive signal. The results indicate that careful validation design and feature engineering can outperform more complex deep learning approaches in low-data epidemiological settings.
Hardik Thapar (Fri,) studied this question.
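
As a rough illustration of the two ideas the abstract leans on, the sketch below builds a lag feature and a rolling mean from strictly past observations, then generates chronological validation splits in which every training week precedes every validation week. It is not the DENGCAST implementation: the function names, the four-week window, the expanding-window fold layout, and the toy case counts are all assumptions made for this example, and the CatBoost model itself is left out.

```python
from statistics import mean


def lag_and_rolling_features(cases, lag=1, window=4):
    """Build one feature row per week using only weeks strictly before it.

    `lag` and `window` are illustrative defaults, not values from the paper.
    """
    rows = []
    for t in range(len(cases)):
        past = cases[:t]  # observations strictly before week t
        lag_value = past[-lag] if len(past) >= lag else None
        roll_mean = mean(past[-window:]) if len(past) >= window else None
        rows.append(
            {"week": t, "lag": lag_value, "roll_mean": roll_mean, "target": cases[t]}
        )
    return rows


def chronological_folds(n_samples, n_folds=5):
    """Expanding-window splits (an assumed layout): train on the past,
    validate on the block of weeks that immediately follows."""
    block = n_samples // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any leftover weeks at the end of the series.
        val_end = n_samples if k == n_folds else train_end + block
        folds.append((range(0, train_end), range(train_end, val_end)))
    return folds


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


# Made-up weekly case counts, purely for demonstration.
weekly_cases = [4, 5, 4, 3, 6, 2, 4, 5, 10, 6, 8, 2]
rows = lag_and_rolling_features(weekly_cases)
folds = chronological_folds(len(weekly_cases))

# Score a naive persistence forecast (next week = last week) on each fold.
fold_scores = []
for train_idx, val_idx in folds:
    y_true = [rows[i]["target"] for i in val_idx]
    y_pred = [rows[i]["lag"] for i in val_idx]
    fold_scores.append(mae(y_true, y_pred))
```

Slicing `cases[:t]` before computing any statistic is what keeps the current week's count out of its own features. In a pipeline like the one described, a gradient boosting model would be fit on each training range in place of the persistence forecast used here.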