Accurate prediction of CO₂ solubility in brine is critical for evaluating the capacity and safety of geological carbon storage. Machine learning offers promise, but existing studies are constrained by limited data sets that seldom cover multicomponent impure CO₂ (containing CH₄ and N₂) in pure water and NaCl brine, and they often overlook computational efficiency during model optimization. To address these gaps, this study introduces a hybrid framework that integrates the LightGBM model with two metaheuristic optimizers: the Ivy Algorithm (IVYA) and the Gaussian-mapping-enhanced Hiking Optimization Algorithm (GHOA). The optimizers are used to search the high-dimensional, nonconvex hyperparameter space of tree-based models efficiently, improving global search capability and mitigating premature convergence. Trained on a comprehensive impurity-inclusive brine database, the IVYA-LightGBM model achieved the best performance on the test set (R² = 0.9920, MAE = 0.0008 mol/mol, AARD = 7.23%, RMSE = 0.0016 mol/mol), together with the best runtime performance and the lowest memory consumption among the models compared. SHAP analysis identified pressure, solute system, and temperature as the dominant factors governing solubility. This work indicates that coupling large-scale, complex-system data with recent optimization algorithms is key to developing accurate and efficient predictive tools for CO₂ sequestration.
Cao et al. (Fri,) studied this question.
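The four test-set metrics quoted above (R², MAE, AARD, RMSE) can be illustrated with a short Python sketch. It assumes the conventional definitions, with AARD taken as the mean of |predicted − measured| / |measured| expressed in percent; the abstract does not state the exact formulas, so this is an assumption. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's database.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and AARD (%) for measured vs. predicted values.

    Assumes the conventional definitions; AARD is undefined if any
    measured value is zero, so such inputs are rejected.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    if np.any(y_true == 0):
        raise ValueError("AARD is undefined when a measured value is zero")

    residuals = y_pred - y_true
    ss_res = np.sum(residuals ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(residuals)),
        "RMSE": np.sqrt(np.mean(residuals ** 2)),
        "AARD_percent": 100.0 * np.mean(np.abs(residuals) / np.abs(y_true)),
    }


if __name__ == "__main__":
    # Hypothetical solubilities in mol/mol, for illustration only.
    measured = [0.010, 0.020, 0.040]
    predicted = [0.011, 0.018, 0.040]
    for name, value in regression_metrics(measured, predicted).items():
        print(f"{name}: {value:.6f}")
```

Because AARD normalizes each error by the measured value, it weights low-solubility points more heavily than MAE or RMSE do, which is one reason a model can show a very small MAE alongside an AARD of several percent.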