Accurate prediction of algal blooms is often hindered by the scarcity of high-frequency water quality data, because field monitoring typically yields only sparse, discontinuous measurements. Machine learning (ML) models require large training data sets, and process-based models demand extensive parametrization. We develop a hybrid framework that leverages the complementary strengths of both to provide a practical decision-support tool. We first use a Random Forest algorithm to identify key algal bloom drivers from sparse monthly observations in the Lam Tsuen River, Hong Kong, and then reconstruct physically consistent daily time series for these drivers with the Soil and Water Assessment Tool (SWAT). An ML model trained solely on these SWAT-reconstructed inputs achieves reliable chlorophyll-a predictions (test R² = 0.58, Kling-Gupta Efficiency = 0.56, and root-mean-square error = 0.109 μg/L), demonstrating that accurate daily predictions can be obtained with a minimal set of variables. This study presents a parsimonious, transferable workflow that transforms limited monitoring data into an operational prediction tool, enabling cost-effective algal bloom management in data-limited watersheds.
Xu et al. (Mon,) studied this question.
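
The three skill scores quoted above (R², Kling-Gupta Efficiency, and root-mean-square error) can be computed from paired observed and predicted chlorophyll-a values. The following is a minimal sketch, not the authors' code. It assumes that R² means the coefficient of determination (it could instead be the squared Pearson correlation) and that KGE follows the standard 2009 formulation based on correlation, variability ratio, and bias ratio. All function names and the sample values are hypothetical.

```python
import math
from statistics import fmean, pstdev


def pearson_r(obs, sim):
    """Linear correlation between observed and simulated values."""
    mo, ms = fmean(obs), fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def kge(obs, sim):
    """Kling-Gupta Efficiency (assumed 2009 form); 1.0 is a perfect match."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = fmean(sim) / fmean(obs)     # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def rmse(obs, sim):
    """Root-mean-square error, in the units of the input data."""
    return math.sqrt(fmean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def r_squared(obs, sim):
    """Coefficient of determination (assumed definition of R²)."""
    mo = fmean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only; these are NOT data from the study.
    observed = [0.10, 0.25, 0.40, 0.30, 0.15]
    predicted = [0.12, 0.20, 0.35, 0.33, 0.18]
    print(f"R2   = {r_squared(observed, predicted):.3f}")
    print(f"KGE  = {kge(observed, predicted):.3f}")
    print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Reporting KGE alongside R² is useful because KGE penalizes errors in mean and variability separately, which a correlation-based score alone can hide.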