This study presents a novel Machine Learning framework for detecting pre-earthquake ionospheric anomalies using the Adaptive Boosting (AdaBoost) ensemble algorithm, applied to high-resolution Total Electron Content (TEC) data derived from Türkiye’s dense TNPGN-Active GNSS network. Within the Lithosphere-Atmosphere-Ionosphere Coupling (LAIC) paradigm, we address key challenges in earthquake precursor research by implementing a three-class classification scheme to distinguish genuine seismo-ionospheric disturbances (Class-2: three days preceding earthquakes) from baseline variability under geomagnetically quiet (Class-0) and disturbed (Class-1) conditions, while incorporating geomagnetic indices (Kp, Ap, Dst) to filter space weather effects. IONOLAB-TEC estimates are analyzed for three M ≥ 6.0 earthquakes in Türkiye: Elazığ (Mw 6.7, 24 January 2020), İzmir (Mw 7.0, 30 October 2020) and the Kahramanmaraş doublet (Mw 7.8 and 7.5, 6 February 2023). For each event, TEC vectors from 12 strategically selected stations are Z-score normalized and classified using AdaBoost with decision trees as weak learners (T = 100 iterations, learning rate = 1.0, MaxNumSplits = 20), evaluated under 10-fold cross-validation with training set sizes varied from 10% to 100% in 5% increments. The model achieves Cohen’s kappa values ranging from 0.744 to 0.992 (substantial to almost perfect agreement), with classification Accuracy varying from 82.94% to 99.48% depending on earthquake characteristics. Performance varies regionally: Elazığ demonstrates exceptional results (99.48% Accuracy, κ = 0.992), Kahramanmaraş shows strong performance (90.21% Accuracy, κ = 0.853), while İzmir exhibits more modest results (82.94% Accuracy, κ = 0.744). Pre-earthquake class (Class-2) metrics vary across tectonic settings, with detailed per-class results provided in Supplementary Table S1.
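As an illustration of two ingredients named above, the following minimal Python sketch shows Z-score normalization fitted on a training fold only and Cohen's kappa for a multi-class labelling. It is not the study's implementation: the function names (`zscore_fit`, `zscore_apply`, `cohen_kappa`) and the toy numbers are assumptions made for illustration, and the classifier itself is omitted.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_fit(train_rows):
    """Return per-feature (mean, std) computed from the training fold only."""
    columns = list(zip(*train_rows))
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def zscore_apply(rows, params):
    """Normalize rows with parameters that were fitted on the training fold."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for any number of classes.

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: rows are samples, columns are station TEC features.
train_fold = [[10.0, 20.0], [12.0, 24.0], [14.0, 28.0]]
test_fold = [[12.0, 24.0]]
params = zscore_fit(train_fold)
test_scaled = zscore_apply(test_fold, params)

# Hypothetical labels for the three classes (0, 1, 2).
kappa = cohen_kappa([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

Fitting the mean and standard deviation on the training fold and reusing them for the held-out fold keeps test-fold statistics out of the normalization, so the held-out score is not inflated by information from the test samples.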
Feature importance analysis reveals distributed spatial sampling with algorithm-dependent inverse correlation between station importance and epicentral distance. AdaBoost shows weak-to-moderate correlation (Spearman ρ = −0.55 to 0.10, generally non-significant), while Random Forest captures this spatial relationship more consistently (ρ = −0.37 to −0.69, with E3-E4 reaching significance at p = 0.04). Statistical validation (McNemar’s test, p < 0.001; Cohen’s kappa up to 0.992 for AdaBoost and 1.0 for Random Forest) shows that both ensemble methods outperform binary baselines, highlighting their robustness in handling noisy, imbalanced geophysical data. Rigorous 10-fold cross-validation confirms that Random Forest consistently outperforms AdaBoost across all tectonic settings (E1: 100.00% vs. 99.49%; E2: 99.78% vs. 83.47%; E3-E4: 99.69% vs. 88.33%, p < 0.001), establishing Random Forest as the preferred algorithm for operational seismo-ionospheric monitoring. Both ensemble methods substantially exceed traditional threshold-based approaches, validating Machine Learning’s potential for earthquake precursor detection. This study demonstrates the critical importance of cross-validation with fold-wise normalization for small geophysical datasets, as proper validation protocols are essential to ensure reported performance reflects genuine generalization rather than methodological artifacts.
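The two statistics quoted above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study's code: the function names are hypothetical, the McNemar variant shown is the continuity-corrected chi-square form (the abstract does not say which variant was used), and the importance, distance, and prediction values are invented for the example.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square statistic and p-value (1 d.o.f.)."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    b = sum(a == t and m != t for t, a, m in zip(y_true, pred_a, pred_b))
    c = sum(a != t and m == t for t, a, m in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical station importances and epicentral distances (km).
importance = [0.30, 0.20, 0.10, 0.05]
distance_km = [50, 120, 300, 450]
rho = spearman_rho(importance, distance_km)

# Hypothetical predictions from two classifiers on ten samples.
truth = [0] * 10
model_a = [0] * 10
model_b = [1] * 6 + [0] * 4
stat, p_value = mcnemar(truth, model_a, model_b)
```

A negative `rho` corresponds to the inverse relationship described above, in which stations closer to the epicentre carry more importance. McNemar's test uses only the samples on which the two classifiers disagree, which is why it suits paired comparisons on the same test folds.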
Karatay et al. (Fri,) studied this question.