What question did this study set out to answer?

This research aims to predict porosity and permeability using machine learning on a legacy dataset of petrographic data.

March 18, 2026 · Open Access

Unlocking the potential of legacy data for future geoenergy and storage applications: Porosity and permeability prediction based on machine learning applied to petrographic data

Key Points

Analyzed 875 samples from 51 wells, compiled over 25 years by at least 21 petrographers
Used point-counting petrographic data paired with core measurements of porosity and permeability
Developed two Histogram-based Gradient Boosting Regression Tree models, one for porosity and one for permeability, trained on four major reservoir lithologies in Germany and the Netherlands
Conducted SHAP analyses to identify influential petrographic features
Porosity model achieved R² of 0.87, MAE of 1.77%, and RMSE of 2.23%
Permeability model, trained on log-transformed permeability to reflect its log-normal distribution, yielded R² of 0.82, MAE of 0.47, and RMSE of 0.64
Model performance remained robust under well-wise splits, indicating applicability to unseen wells
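The scores above use three standard regression metrics. A minimal sketch of how they are computed, in Python with the standard library only, applied to permeability on a log scale. The base-10 logarithm and the permeability values are assumptions for illustration; the summary states only that permeability was log-transformed.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


# Hypothetical permeability values in mD (illustrative, not from the study).
perm_true_md = [0.1, 1.0, 10.0, 100.0]
perm_pred_md = [0.2, 0.8, 20.0, 50.0]

# Assumption: base-10 log transform; metrics are computed on the log values.
log_true = [math.log10(k) for k in perm_true_md]
log_pred = [math.log10(k) for k in perm_pred_md]

print(f"MAE={mae(log_true, log_pred):.3f}")
print(f"RMSE={rmse(log_true, log_pred):.3f}")
print(f"R2={r2(log_true, log_pred):.3f}")
```

Because the permeability errors are computed on log values, they are multiplicative in linear permeability: if the transform is base 10, an MAE of 0.47 corresponds to a typical error factor of roughly 10^0.47 ≈ 3. The porosity metrics, by contrast, are reported directly in percent, the unit of porosity.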

Abstract

Machine learning techniques are increasingly applied in geological research and widely adopted in industry. However, one commonly available dataset remains underutilized: petrographic data from classical point-counting analyses. These data are routinely collected for reservoir lithologies worldwide and are often paired with core measurements such as porosity and permeability. They capture detrital and authigenic components, textural properties, and diagenetic effects that largely govern reservoir quality. Building on an initial proof of concept, we expand the scope to a legacy dataset comprising 875 samples from 51 wells, compiled over 25 years by at least 21 petrographers. This dataset demonstrates the feasibility of predicting porosity and permeability from point-counting data across diverse lithologies and sources. Despite potential operator bias and classification inconsistencies, predictive performance remains robust. We present the results of two Histogram-based Gradient Boosting Regression Tree models trained on four major reservoir lithologies in Germany and the Netherlands: Upper Carboniferous, Permian Rotliegendes, Triassic Buntsandstein, and Jurassic sandstones. The porosity model achieves R² = 0.87, MAE = 1.77%, and RMSE = 2.23%. The permeability model, trained on log-transformed permeability, yields R² = 0.82, MAE = 0.47, and RMSE = 0.64 on the log scale, consistent with the log-normal distribution of permeability. SHAP analyses highlight key petrographic features influencing predictions, offering insights into detrital and diagenetic reservoir quality controls. Model performance remains robust under well-wise splits, confirming applicability to unseen wells. Training on cored intervals may enable extension to cuttings, which are more continuously available along well sections. Leveraging such legacy datasets can enhance reservoir quality assessment in sample-limited projects and improve the understanding of global reservoir systems.
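A well-wise split keeps every sample from a given well on one side of the train/test boundary, so the test score reflects performance on wells the model has never seen. A minimal standard-library sketch of such a grouped k-fold split, in the spirit of scikit-learn's `GroupKFold`. The function name `well_wise_folds`, the greedy fold-balancing rule, and the well labels are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def well_wise_folds(
    well_ids: Sequence[str], n_folds: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which no well appears on both sides.

    Hypothetical helper: wells are sorted by sample count (largest first) and
    each is assigned whole to the fold that currently holds the fewest samples.
    """
    samples_by_well: Dict[str, List[int]] = defaultdict(list)
    for idx, well in enumerate(well_ids):
        samples_by_well[well].append(idx)
    if n_folds > len(samples_by_well):
        raise ValueError("need at least as many wells as folds")

    folds: List[List[int]] = [[] for _ in range(n_folds)]
    # Largest wells first; ties broken by well name for a deterministic result.
    for well in sorted(samples_by_well, key=lambda w: (-len(samples_by_well[w]), w)):
        lightest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[lightest].extend(samples_by_well[well])

    # Each fold serves once as the test set; all other folds form the training set.
    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(n_folds) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical sample-to-well labels (illustrative, not from the study).
wells = ["W1", "W1", "W1", "W2", "W2", "W3", "W4", "W4", "W5"]
for train, test in well_wise_folds(wells, n_folds=3):
    test_wells = {wells[i] for i in test}
    train_wells = {wells[i] for i in train}
    assert test_wells.isdisjoint(train_wells)  # no well leaks across the split
    print(sorted(test_wells), len(train), len(test))
```

A random sample-wise split would typically place neighbouring samples from the same well in both training and test sets, which can inflate scores; grouping by well avoids that leakage at the cost of fewer, coarser folds.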
