In the semi-arid Batna Belezma region of northeastern Algeria, groundwater is a vital resource for agriculture and drinking water. However, intense evaporation under this climate affects its quality. This study aims to identify the key hydrogeochemical processes that control groundwater composition in the Merouana Basin and to evaluate the predictive performance of machine learning (ML) models. A total of 30 groundwater samples were analyzed using multivariate statistical techniques, including Principal Component Analysis (PCA), and were modeled with PHREEQC to assess mineral saturation states. Additionally, ML-based regression models, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB), were employed to predict groundwater chemistry. The results indicate that ion abundances follow the order Ca²⁺ > Mg²⁺ > Na⁺ for cations and HCO₃⁻ > SO₄²⁻ > Cl⁻ for anions. Alkaline earth metals (Ca²⁺ and Mg²⁺) constitute the major fraction of total dissolved cations, reflecting carbonate equilibrium and dolomite dissolution processes. Na⁺ represents a smaller proportion of the cationic load, but its hydro-agronomic significance is substantial because of its influence on the sodium adsorption ratio (SAR) and soil permeability. The PHREEQC modeling showed that precipitation of calcite and dolomite promotes evaporite dissolution, while most samples remain undersaturated with respect to gypsum. The PCA results reveal high positive loadings of Mg²⁺, Cl⁻, SO₄²⁻, HCO₃⁻, and electrical conductivity (EC), suggesting that ion exchange and seawater mixing are the primary controlling processes, with carbonate weathering playing a secondary role. Among the supervised ML models tested, the Random Forest model achieved the highest predictive performance (R² = 0.96) with low RMSE and MAE values, confirming its robustness and reliability.
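As an illustration of why Na⁺ matters despite its smaller share of the cationic load, the sodium adsorption ratio can be sketched with the standard definition SAR = Na⁺ / sqrt((Ca²⁺ + Mg²⁺) / 2), with all concentrations in meq/L. The function names and the sample concentrations below are hypothetical and are not taken from the study's data; the equivalent weights are rounded values of molar mass divided by ionic charge.

```python
from math import sqrt

# Rounded equivalent weights in mg per meq (molar mass / ionic charge).
EQ_WEIGHT = {"Na": 22.99, "Ca": 20.04, "Mg": 12.15}


def to_meq_per_l(mg_per_l: float, ion: str) -> float:
    """Convert a concentration from mg/L to meq/L for the given ion."""
    return mg_per_l / EQ_WEIGHT[ion]


def sodium_adsorption_ratio(na_mg: float, ca_mg: float, mg_mg: float) -> float:
    """SAR = Na / sqrt((Ca + Mg) / 2), with every term in meq/L."""
    na = to_meq_per_l(na_mg, "Na")
    ca = to_meq_per_l(ca_mg, "Ca")
    mg = to_meq_per_l(mg_mg, "Mg")
    if ca + mg <= 0:
        raise ValueError("Ca + Mg must be positive to compute SAR")
    return na / sqrt((ca + mg) / 2.0)


if __name__ == "__main__":
    # Hypothetical sample (mg/L), chosen only to illustrate the call.
    print(round(sodium_adsorption_ratio(45.98, 80.16, 48.6), 3))
```

Because Ca²⁺ and Mg²⁺ sit in the denominator, a calcium- and magnesium-rich water keeps SAR low even when the absolute Na⁺ concentration is moderate.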
Geochemical model performance showed a high global accuracy (GPI = 0.91), confirming strong agreement between observed and simulated chemical data. However, the HH value (0.81) indicates some discrepancies, particularly for specific ions or extreme conditions. Overall, the results indicate that silicate weathering and mineral dissolution are the primary mechanisms governing groundwater chemistry. The integration of multivariate statistics and machine learning provides a comprehensive understanding of groundwater evolution and offers a reliable predictive framework for sustainable water resource management in semi-arid environments.
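The agreement scores quoted in this abstract (R², RMSE, MAE) compare observed and predicted values. A minimal sketch of how such scores are commonly computed is given below, assuming the usual definitions, with R² taken as the coefficient of determination; the study may use a different convention, and the function name and the sample numbers are hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and MAE for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = sqrt(sum(r * r for r in residuals) / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when observed values are constant")
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "rmse": rmse, "mae": mae}


if __name__ == "__main__":
    # Hypothetical observed vs. predicted concentrations (mg/L).
    obs = [120.0, 95.0, 143.0, 110.0]
    pred = [118.0, 99.0, 140.0, 112.0]
    print(regression_metrics(obs, pred))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both helps show whether a high R² hides a few poorly predicted samples.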
Mansouri et al. (Sun,) studied this question.