Optimizing CO₂ enhanced oil recovery (CO₂ EOR) requires numerous reservoir simulations, creating a computational burden that can be addressed with data-driven models (DDMs). However, existing machine learning (ML) DDMs typically rely on proprietary simulators, which limits their reproducibility. This study presents open-source ML DDMs for CO₂ water-alternating-gas (WAG) and surfactant-alternating-gas (SAG) injection by coupling an open-source simulator with Python tools. Datasets of 100 and 300 samples were generated using the SPE5 benchmark with random sampling and Latin hypercube sampling (LHS) over CO₂ injection rates of 1000-5000 standard cubic feet per day (1-5 Mscf/day) and water injection rates of 5000-15000 stock tank barrels per day (5-15 Mstb/day). K-nearest neighbors (KNN), decision trees (DT), random forests (RF), and gradient boosting regressors (GBR) were evaluated for predicting cumulative oil production, cumulative gas production, and CO₂ retention. Increasing the dataset from 100 to 300 samples improved accuracy. For CO₂ WAG, all ML models achieved coefficient of determination (R²) values exceeding 0.970 and root mean squared errors (RMSE) below 100 Mscf for CO₂ retention and below 0.2 MMstb for oil production. LHS outperformed random sampling with 100 samples, but this advantage diminished with 300 samples. RF and GBR were more robust to sparse sampling than KNN, and GBR achieved the best overall accuracy (validation R² ≥ 0.985). Hyperparameter tuning with grid search, random search, and particle swarm optimization provided only marginal gains, with grid and random search offering the best accuracy–efficiency trade-off. The optimized GBR DDMs achieved R² values exceeding 0.995 for CO₂ retention, 0.999 for oil production, and 0.990 for gas production across the WAG and SAG scenarios. Unlike studies that rely on commercial simulators, the proposed workflow is fully reproducible, openly licensed, and transferable between WAG and SAG; together, these properties constitute the novelty of this study.
All data and code are available to facilitate reuse and extension to other CO₂ EOR and storage applications.
• Open-source workflow integrates reservoir simulation and machine learning
• Gradient boosting predicts oil output and stored carbon with R² above 0.96
• Particle swarm optimization improves accuracy but increases runtime
• Latin hypercube sampling halves the error versus random sampling at small data sizes
• Framework accelerates the design of carbon dioxide–alternating-water and foam floods
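To make the sampling and scoring steps concrete, the following is a minimal Python sketch and not the authors' released code. It draws a Latin hypercube design and a plain random design over the two injection-rate ranges quoted in the abstract, and defines R² and RMSE in their standard forms. The function names, the seed handling, and the use of the standard library alone are assumptions made for illustration; the simulator runs and the KNN, DT, RF, and GBR models are not reproduced here.

```python
import math
import random

# Injection-rate ranges as quoted in the abstract.
CO2_RATE_RANGE = (1000.0, 5000.0)     # CO2 injection rate
WATER_RATE_RANGE = (5000.0, 15000.0)  # water injection rate

def latin_hypercube(n_samples, bounds, seed=0):
    """Hypothetical helper: each variable's range is split into n_samples
    equal strata, and every stratum receives exactly one point."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum of the unit interval.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across variables.
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def random_design(n_samples, bounds, seed=0):
    """Hypothetical helper: independent uniform draws, with no stratification."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for low, high in bounds)
            for _ in range(n_samples)]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

bounds = [CO2_RATE_RANGE, WATER_RATE_RANGE]
lhs_points = latin_hypercube(100, bounds, seed=1)
random_points = random_design(100, bounds, seed=1)
```

In a workflow like the one described, each sampled rate pair would be passed to the reservoir simulator, and the resulting outputs would serve as training targets for the regressors. With only 100 samples, the stratification guarantees coverage of every slice of each rate range, which is one plausible reason LHS can help more at small sample sizes.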
Jorge Rodrigo Lliguizaca-Davila
Freddy Paul Carrión-Maldonado
David Landa-Marbán
NORCE Research AS
Petroleum
University of Bergen
NORCE Research AS
Escuela Superior Politecnica del Litoral
Lliguizaca-Davila et al. studied this question.
synapsesocial.com/papers/69af944f70916d39fea4b63d — DOI: https://doi.org/10.1016/j.petlm.2026.02.003