ABSTRACT

Graphical abstract: a four-panel figure illustrating an explainable-AI framework for real-time flood prediction in India. Panel 1 presents multi-source geospatial data collection from the NASA POWER API and SMAP soil moisture datasets. These comprise rainfall, temperature, humidity, and soil moisture records from 33 monitoring stations across 10 flood-prone Indian states between 2017 and 2024. Panel 2 shows physics-informed feature engineering with 52 hydrologically meaningful variables, such as the SCS-CN Curve Number, Topographic Wetness Index (TWI), and SMAP soil moisture, with SMOTE balancing applied to handle rare flood events. Panel 3 shows an XGBoost model combined with SHAP explainability analysis. It highlights monsoon seasonality, monsoon rainfall, antecedent soil moisture, and the soil moisture–monsoon interaction as key contributors to flood prediction (ROC-AUC = 0.9623, recall = 74.2%, MCC = 0.5095). Panel 4 illustrates deployment of a near-real-time, NDMA-aligned four-tier alert dashboard that issues Extreme, High, Moderate, and Low flood-risk alerts for the 33 stations, with total prediction latency under two minutes per daily cycle.

India experiences devastating floods every year during the June–September Southwest Monsoon, affecting over 40 million people and causing economic losses exceeding ₹25,000 crore. Existing hydraulic models require dense instrumentation that is unavailable across most flood-prone sub-districts. This study presents a physics-informed machine learning framework that applies eXtreme Gradient Boosting (XGBoost) to 52 hydrologically grounded features encoding Soil Conservation Service Curve Number (SCS-CN) antecedent moisture conditions and the Topographic Wetness Index (TWI). The framework uses 96,426 daily observations from 33 stations in 10 states (2017–2024) and balances the training data with the Synthetic Minority Over-sampling Technique (SMOTE).
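The SCS-CN antecedent moisture and TWI features described above rest on standard hydrological formulas. These are the runoff equation Q = (P − I_a)² / (P − I_a + S) for P > I_a, with potential retention S = 25400/CN − 254 (mm), and TWI = ln(a / tan β). The following is a minimal sketch under stated assumptions, not the authors' feature pipeline. It assumes the standard SCS growing-season thresholds on 5-day antecedent rainfall (below 35.6 mm for AMC I, above 53.3 mm for AMC III). It also assumes the commonly used AMC conversion formulas for an AMC-II curve number and the conventional initial-abstraction ratio I_a = 0.2S. The function names and the 1e-3 rad slope floor are hypothetical choices for illustration.

```python
import math


def amc_class(p5_mm: float) -> str:
    """Antecedent moisture condition (AMC) from 5-day antecedent rainfall (mm).

    Assumption: standard SCS growing-season thresholds, i.e. < 35.6 mm is
    dry (I), > 53.3 mm is wet (III), anything in between is average (II).
    """
    if p5_mm < 35.6:
        return "I"
    if p5_mm > 53.3:
        return "III"
    return "II"


def adjust_cn(cn2: float, amc: str) -> float:
    """Convert an AMC-II curve number (0 < cn2 <= 100) to the given AMC class."""
    if amc == "I":
        return 4.2 * cn2 / (10.0 - 0.058 * cn2)
    if amc == "III":
        return 23.0 * cn2 / (10.0 + 0.13 * cn2)
    return cn2


def scs_runoff_mm(p_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth Q (mm) from event rainfall P (mm) via the SCS-CN equation."""
    s = 25400.0 / cn - 254.0  # potential maximum retention S (mm)
    ia = ia_ratio * s  # initial abstraction Ia (mm)
    if p_mm <= ia:
        return 0.0  # rainfall is fully absorbed before runoff begins
    return (p_mm - ia) ** 2 / (p_mm - ia + s)


def twi(upslope_area_m2: float, cell_width_m: float, slope_rad: float) -> float:
    """Topographic Wetness Index ln(a / tan(beta)), a = specific catchment area (m)."""
    a = upslope_area_m2 / cell_width_m
    # Hypothetical slope floor so that flat cells do not divide by zero.
    return math.log(a / math.tan(max(slope_rad, 1e-3)))


if __name__ == "__main__":
    cn2, storm_mm = 75.0, 60.0
    wet_amc = amc_class(60.0)  # 60 mm fell in the previous 5 days -> "III"
    for amc in ("II", wet_amc):
        cn = adjust_cn(cn2, amc)
        print(f"AMC {amc}: CN={cn:.1f}, Q={scs_runoff_mm(storm_mm, cn):.1f} mm")
    print(f"TWI example: {twi(5000.0, 30.0, math.radians(2.0)):.2f}")
```

For a 60 mm storm on a CN-75 surface, the sketch gives about 14.5 mm of runoff under average (AMC II) conditions. Once wet antecedent conditions raise the curve number to about 87 (AMC III), runoff rises to roughly 31 mm. This illustrates why antecedent soil moisture can weigh so heavily in flood risk.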
With Optuna-tuned hyperparameters, the model achieves a test receiver operating characteristic area under the curve (ROC-AUC) of 0.9623, a PR-AUC of 0.4264, and a Brier score of 0.0118 on completely unseen 2023–2024 data. At the operational threshold on the same test set, it recalls 74.2% of flood events with a precision of 36.3%, which corresponds to a false alarm ratio (FAR) of 0.637. It also reaches an F1-score of 0.4878, a Matthews Correlation Coefficient (MCC) of 0.5095, and a balanced accuracy of 0.8612. SHapley Additive exPlanations (SHAP) analysis confirms peak-monsoon seasonality and antecedent soil moisture as the dominant risk drivers. This is physically consistent with the SCS-CN runoff equation, in which wetter antecedent conditions reduce the soil's potential retention and so raise runoff for the same rainfall. The framework is deployed as a near-real-time web interface that ingests NASA POWER API data daily. It issues four-tier National Disaster Management Authority (NDMA)-aligned risk alerts to District Disaster Management Authorities across India.
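The threshold-dependent scores reported above all derive from one binary confusion matrix, so two of them can be cross-checked directly. FAR equals 1 − precision (1 − 0.363 = 0.637), and F1 is the harmonic mean of precision and recall. The following is a minimal standard-library sketch using the usual definitions, with FAR taken as the forecast-verification false alarm ratio FP / (TP + FP). The function names and the toy counts are hypothetical and are not the study's confusion matrix.

```python
import math


def _div(num: float, den: float) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_from_pr(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return _div(2 * precision * recall, precision + recall)


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Threshold-dependent scores from a binary confusion matrix (flood = positive)."""
    precision = _div(tp, tp + fp)
    recall = _div(tp, tp + fn)  # probability of detection
    specificity = _div(tn, tn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "far": _div(fp, tp + fp),  # false alarm ratio = 1 - precision
        "f1": f1_from_pr(precision, recall),
        "balanced_accuracy": (recall + specificity) / 2,
        "mcc": _div(tp * tn - fp * fn,
                    math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


if __name__ == "__main__":
    print(binary_metrics(tp=3, fp=1, fn=1, tn=5))  # hypothetical toy counts
    # Cross-check against the rounded precision and recall reported above.
    print(f"F1 from P=0.363, R=0.742: {f1_from_pr(0.363, 0.742):.4f}")
    print(f"FAR = 1 - precision: {1 - 0.363:.3f}")
```

With the rounded abstract values, F1 comes out at about 0.4875. This matches the reported 0.4878 to within the rounding of precision and recall.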
Author: Sougata Karmakar