What question did this study set out to answer?

This study aims to develop open-source machine learning data-driven models for CO2 enhanced oil recovery, focusing on WAG and SAG methods.

March 10, 2026
Open Access

CO2 WAG and SAG Machine Learning Data-Driven Models in an Open-Source Framework – A Theoretical Study

Key Points

This study aims to develop open-source machine learning data-driven models for CO2 enhanced oil recovery, focusing on WAG and SAG methods.
Coupled an open-source simulator with Python tools for modeling.
Generated datasets of 100 and 300 samples with random and Latin hypercube sampling over CO2 injection rates of 1-5 Mscf/day and water injection rates of 5-15 Mstb/day.
Evaluated machine learning models including KNN, DT, RF, and GBR for predictions.
Increasing sample size from 100 to 300 improved model accuracy.
For CO2 WAG, all models achieved R2 values exceeding 0.970 for CO2 retention and oil production.
Gradient boosting regressor (GBR) achieved the best overall accuracy, with validation R2 values of at least 0.985.
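
The difference between the two sampling schemes named in the key points can be illustrated with a minimal sketch. It uses only the Python standard library and is not the study's code: the function names, the seed, and the choice of 100 points are assumptions made for illustration, while the rate ranges are the ones quoted in the abstract.

```python
import random

# Injection-rate ranges quoted in the abstract (units as stated there).
CO2_RANGE = (1000.0, 5000.0)     # CO2 injection rate, scf/day
WATER_RANGE = (5000.0, 15000.0)  # water injection rate, stb/day


def random_samples(n, bounds, rng):
    """Plain random sampling: every point is drawn independently and uniformly."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def latin_hypercube_samples(n, bounds, rng):
    """Latin hypercube sampling: each variable's range is split into n equal
    strata, and every stratum is used exactly once per variable."""
    columns = []
    for lo, hi in bounds:
        # One uniform draw inside each of the n strata of [0, 1).
        unit = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(unit)  # decouple the pairing between variables
        columns.append([lo + u * (hi - lo) for u in unit])
    return list(zip(*columns))


rng = random.Random(0)  # fixed seed, chosen only for repeatability (assumption)
bounds = [CO2_RANGE, WATER_RANGE]
lhs_design = latin_hypercube_samples(100, bounds, rng)
rnd_design = random_samples(100, bounds, rng)
```

Each tuple in `lhs_design` or `rnd_design` would correspond to one simulation run. The stratification is what lets a Latin hypercube cover the rate ranges evenly with few samples, whereas independent random draws can leave gaps.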

Abstract

Optimizing CO2 enhanced oil recovery (CO2 EOR) requires numerous reservoir simulations, creating a computational burden that can be addressed using data-driven models (DDMs). However, machine learning (ML) DDMs commonly rely on proprietary simulators, which limits their reproducibility. This study presents open-source ML DDMs for CO2 water alternating gas (WAG) and surfactant alternating gas (SAG) by coupling an open-source simulator with Python tools. Datasets (100 and 300 samples) were generated using the SPE5 benchmark with random and Latin hypercube sampling (LHS) over CO2 injection rates of 1000-5000 standard cubic feet per day (1-5 Mscf/day) and water injection rates of 5000-15000 stock tank barrels per day (5-15 Mstb/day). K-nearest neighbors (KNN), decision trees (DT), random forests (RF), and gradient boosting regressors (GBR) were evaluated to predict cumulative oil and gas production and CO2 retention. Increasing the dataset from 100 to 300 samples improved accuracy. For CO2 WAG, all ML models achieved coefficient of determination (R2) values exceeding 0.970, and root mean squared errors (RMSE) below 100 Mscf for CO2 retention and 0.2 MMstb for oil production. LHS outperformed random sampling with 100 samples, but this advantage diminished with 300. RF and GBR were more robust to sparse sampling than KNN, and GBR achieved the best overall accuracy (validation R2 ≥ 0.985). Grid search, random search, and particle swarm optimization provided marginal gains, with grid and random search exhibiting the best accuracy–efficiency trade-off. For the optimized GBR DDMs, R2 exceeded 0.995 for CO2 retention, 0.999 for oil production, and 0.990 for gas production across WAG and SAG scenarios. Unlike studies that rely on commercial simulators, the proposed workflow is fully reproducible, openly licensed, and transferable between WAG and SAG, which collectively constitute the novelty of this study.
All data and code are available to facilitate reuse and extension to other CO2 EOR and storage applications.

• Open-source workflow integrates reservoir simulation and machine learning
• Gradient boosting predicts oil output and stored carbon with R2 above 0.96
• Particle swarm optimization improves accuracy but increases runtime
• Latin hypercube sampling halves the error versus random sampling at small data sizes
• Framework accelerates the design of carbon dioxide–alternating-water and foam floods
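
The two accuracy metrics reported in the abstract, R2 and RMSE, can be made concrete with a short sketch that scores one of the four model families (KNN) on a held-out validation set. It is a minimal illustration using only the Python standard library, not the study's implementation: `toy_response` is a hypothetical stand-in for a simulator output, and the 80/20 split, `k=3`, and the seed are assumptions.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def toy_response(co2, water):
    """Hypothetical smooth stand-in for a simulator output; not from the study."""
    return 2.0 * co2 + 0.5 * water - co2 * water


rng = random.Random(1)  # fixed seed, chosen only for repeatability (assumption)
# Inputs scaled to [0, 1] so both rates weigh equally in the distance.
points = [(rng.random(), rng.random()) for _ in range(300)]
targets = [toy_response(c, w) for c, w in points]

# Hold out the last 20% of the samples for validation.
split = int(0.8 * len(points))
train_x, train_y = points[:split], targets[:split]
val_x, val_y = points[split:], targets[split:]

val_pred = [knn_predict(train_x, train_y, q) for q in val_x]
print(f"validation R2 = {r2_score(val_y, val_pred):.3f}, "
      f"RMSE = {rmse(val_y, val_pred):.3f}")
```

R2 is dimensionless and equals 1 for a perfect fit, while RMSE carries the units of the predicted quantity, which is why the abstract quotes RMSE thresholds in Mscf and MMstb separately for each target.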

Authors

Jorge Rodrigo Lliguizaca-Davila

Freddy Paul Carrión-Maldonado

David Landa-Marbán

NORCE Research AS

Journals

Petroleum

Institutions

University of Bergen

NORCE Research AS

Escuela Superior Politecnica del Litoral