The ecologically fragile Himalayan region is increasingly vulnerable to extreme precipitation events driven by orographic-atmospheric interactions. Forecasting these events remains a formidable challenge because traditional global models, with their coarse spatial resolution, often fail to capture mesoscale convective extremes. North India serves as a crucial agricultural “breadbasket,” yet its hydrological integrity is increasingly compromised by elevation-dependent warming. Accurate detection of these shifting precipitation regimes is essential both for building a foundational diagnostic framework for decision-support systems and for mitigating disasters such as cloudbursts and flash floods. This study evaluates the high-resolution (12 km) Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis dataset (1979–2022) in combination with machine learning classifiers. The classifiers were designed to categorize daily accumulated rainfall into percentile-based groups (moderate, heavy, and extreme) for targeted detection of high-impact events. The analysis identified a statistically significant increasing trend of +3.0 mm/decade in annual precipitation across the Himalayan foothills. In the classification task, the ensemble-based random forest (RF) outperformed the geometric support vector machine (SVM) classifier, reaching an overall accuracy of 81.6% in predicting tail-end distributions. RF achieved a precision of 0.80 for “Extreme” events with a high degree of specificity, suggesting its potential for reducing false-alarm rates in complex orographic zones. These findings indicate that ensemble-based learning is better suited than geometric classifiers to meteorological applications in complex terrain. The RF-based framework offers a reliable, cost-effective tool for operational forecasting, bridging the gap between coarse global models and local observational scarcity to support disaster mitigation strategies in North India.
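To illustrate what a percentile-based grouping of daily rainfall can look like, here is a minimal sketch in plain Python. The 90th and 99th percentile cut-offs, the function names, and the choice to compute thresholds over all supplied values are assumptions made for illustration; the abstract does not state the percentiles or the preprocessing the study actually used.

```python
from bisect import bisect_right


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def classify_rainfall(daily_mm, cuts=(90.0, 99.0)):
    """Label each daily total as 'moderate', 'heavy', or 'extreme'.

    `cuts` are hypothetical percentile cut-offs (assumption, not from the
    study). A value equal to a threshold is assigned to the higher class.
    """
    labels = ("moderate", "heavy", "extreme")
    ranked = sorted(daily_mm)
    thresholds = [percentile(ranked, q) for q in cuts]
    classes = [labels[bisect_right(thresholds, x)] for x in daily_mm]
    return classes, thresholds


if __name__ == "__main__":
    # Synthetic daily totals in mm, for demonstration only.
    rain = [float(v) for v in range(1, 101)]
    classes, thresholds = classify_rainfall(rain)
    print(thresholds)
    print({c: classes.count(c) for c in ("moderate", "heavy", "extreme")})
```

In practice the thresholds would presumably be derived per grid point from the training period, so that class boundaries reflect local climatology rather than a single regional value.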
The graphical abstract illustrates the study’s integrated machine learning framework, designed to resolve mesoscale extreme precipitation events over the complex orography of North India. The workflow is divided into three phases: Data & Study Area (left panel), Methodological Framework (center panel), and Scientific Insights & Results (right panel). The left panel introduces the high-resolution (12 km) Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis dataset (1979–2022). It highlights the study domain, which encompasses the physiographically diverse North Indian states, and incorporates elevation to account for the critical orographic-atmospheric interactions unique to the Western Himalayas. The center panel details the dual-pronged analytical approach. First, an observational analysis characterizes spatiotemporal precipitation regimes, identifying a regional intensification trend of +3.0 mm/decade. Second, the study employs supervised machine learning models, benchmarking the geometric approach of support vector machines (SVM) against the ensemble-based random forest (RF) algorithm to classify precipitation events. The right panel presents the comparative performance evaluation. The maps show that RF achieves superior spatial robustness, expanding high-reliability coverage to 74.4% of the domain compared with SVM’s limited coverage. Histograms confirm RF’s higher mean accuracy (81.6%) versus SVM (80.9%). The Receiver Operating Characteristic curves highlight the critical finding that, while SVM fails to detect minority “Extreme” class events, RF maintains high precision (0.80) and perfect specificity, making it the superior tool for a foundational diagnostic framework for decision-support systems.
Quantified an increasing annual precipitation trend of +3.0 mm/decade in the Himalayan foothills using 12-km reanalysis data.
Demonstrated superior skill of Random Forest (81.6% accuracy) over Support Vector Machines in capturing mesoscale convective extremes.
Identified significant spatial variability in extreme events across ~6,500 grid points using a diagnostic-predictive system.
Established a classification system utilizing atmospheric predictors to categorize daily precipitation into percentile-based intensity groups.
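The precision and specificity figures quoted for the “Extreme” class are one-vs-rest metrics, which can be sketched as follows. This is a generic illustration in plain Python, not the study's evaluation code; the function name and the toy labels are hypothetical.

```python
def one_vs_rest_scores(y_true, y_pred, positive):
    """Precision, recall, and specificity for one class against all others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # event correctly flagged
        elif pred == positive:
            fp += 1  # false alarm
        elif truth == positive:
            fn += 1  # missed event
        else:
            tn += 1  # correctly not flagged
    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


if __name__ == "__main__":
    # Toy labels for demonstration only (not study data).
    y_true = ["moderate"] * 6 + ["heavy", "heavy", "extreme", "extreme"]
    y_pred = ["moderate"] * 6 + ["heavy", "extreme", "extreme", "heavy"]
    print(one_vs_rest_scores(y_true, y_pred, positive="extreme"))
```

Specificity equals 1.0 exactly when there are no false alarms for the class, which is why it is a natural companion to precision when the goal is to limit false-alarm rates. A classifier that never predicts the minority class also scores perfect specificity, so recall should be read alongside it.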
Tandon et al. studied this question.