Urban waterlogging has become a critical challenge to urban sustainability under the combined pressures of rapid urbanization and increasingly frequent extreme weather events. Traditional predictive models, however, struggle to deliver real-time, point-specific early warning, primarily because redundant high-dimensional data interfere with prediction and because the models cannot handle severe data imbalance. This study proposes a lightweight, interpretable machine learning framework for real-time waterlogging hotspot prediction, built on a multi-dimensional feature space. Specifically, we use a Lasso-based selection mechanism to distill 37 multi-source variables into five core determinants, isolating the dominant environmental drivers while filtering out noise. To overcome the recall bottleneck, we propose a Synthetic Minority Over-sampling Technique based on Weighted Distance and Cleaning (SMOTE-WDC), an algorithm that incorporates weighted feature distances and density-based noise cleaning. Validating the framework on datasets from Shenzhen (2023–2024), we show that a Gradient Boosting Decision Tree (GBDT) model combined with this strategy achieves the best performance using only five features, with an F1-score of 0.808 and an Area Under the Precision-Recall Curve (AUC-PR) of 0.895. Notably, it attains a Recall of 0.882, a 4.6% improvement over the baseline. This study contributes a cost-effective, high-sensitivity approach to disaster risk reduction and advances predictive urban waterlogging management.
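To make the idea of oversampling with weighted feature distances more concrete, the following is a minimal sketch of generic SMOTE-style interpolation in which neighbours are ranked by a feature-weighted Euclidean distance. It is not the SMOTE-WDC algorithm itself: the function names `weighted_distance` and `weighted_smote`, the weight values, the neighbour count `k`, and the toy data are all assumptions made for illustration, and the density-based noise-cleaning step is not reproduced.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Euclidean distance with one (hypothetical) weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_smote(minority, weights, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Neighbours are ranked by the weighted distance above. The noise-cleaning
    step mentioned in the abstract is intentionally left out of this sketch.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by weighted distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: weighted_distance(base, p, weights))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with five features per sample; all values are made up.
minority = [
    [0.9, 0.1, 0.4, 0.7, 0.2],
    [1.0, 0.2, 0.5, 0.6, 0.1],
    [0.8, 0.0, 0.3, 0.8, 0.3],
    [1.1, 0.3, 0.6, 0.5, 0.2],
]
# Assumed weights, e.g. normalised feature importances.
weights = [0.4, 0.1, 0.2, 0.2, 0.1]
new_points = weighted_smote(minority, weights, n_new=6)
print(len(new_points), "synthetic samples")
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the bounding box of the minority class. Raising the weight of a feature makes differences in that feature count more when neighbours are chosen.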
Deng et al. (Fri,) studied this question.