Machine learning techniques are increasingly applied in geological research and widely adopted in industry. However, one commonly available dataset remains underutilized: petrographic data from classical point-counting analyses. These data are routinely collected for reservoir lithologies worldwide and are often paired with core measurements such as porosity and permeability. They capture detrital and authigenic components, textural properties, and diagenetic effects that largely govern reservoir quality. Building on an initial proof of concept, we expand the scope to a legacy dataset of 875 samples from 51 wells, compiled over 25 years by at least 21 petrographers. This dataset demonstrates that porosity and permeability can be predicted from point-counting data across diverse lithologies and data sources. Despite potential operator bias and classification inconsistencies, predictive performance remains robust. We present results from two Histogram-based Gradient Boosting Regression Tree models trained on four major reservoir lithologies in Germany and the Netherlands: Upper Carboniferous, Permian Rotliegendes, Triassic Buntsandstein, and Jurassic sandstones. The porosity model achieves R² = 0.87, MAE = 1.77%, and RMSE = 2.23%. The permeability model, fitted to log-transformed values in keeping with the log-normal distribution of permeability, yields R² = 0.82, MAE = 0.47, and RMSE = 0.64 in log units. SHAP analyses highlight the petrographic features that most influence predictions, offering insights into detrital and diagenetic controls on reservoir quality. Performance holds under well-wise splits, supporting applicability to unseen wells. Training on cored intervals may enable extension to cuttings, which are more continuously available along well sections. Leveraging such legacy datasets can enhance reservoir quality assessment in sample-limited projects and improve the understanding of global reservoir systems.
Busch et al. (Thu,) studied this question.
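To make the evaluation terms in the abstract concrete, the following minimal sketch illustrates a well-wise split and the three reported metrics (R², MAE, RMSE) computed on log-transformed permeability. It is not the authors' pipeline: the function names `well_wise_split` and `regression_metrics` are hypothetical, the use of base-10 logarithms is an assumption, and the sample values are invented for illustration only. The stand-in "predictions" are simply offset true values, not model output.

```python
import math
import random


def well_wise_split(wells, test_fraction=0.2, seed=0):
    """Split sample indices so that no well contributes to both sets."""
    unique_wells = sorted(set(wells))
    rng = random.Random(seed)
    rng.shuffle(unique_wells)
    # Hold out whole wells, at least one.
    n_test = max(1, round(test_fraction * len(unique_wells)))
    test_wells = set(unique_wells[:n_test])
    train_idx = [i for i, w in enumerate(wells) if w not in test_wells]
    test_idx = [i for i, w in enumerate(wells) if w in test_wells]
    return train_idx, test_idx


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Invented toy data: well label and permeability (mD) per sample.
wells = ["W1", "W1", "W2", "W2", "W3", "W3", "W4", "W4", "W5", "W5"]
perm_md = [0.1, 1.0, 10.0, 100.0, 0.5, 5.0, 50.0, 2.0, 20.0, 200.0]

# Assumption: base-10 log transform of permeability.
log_perm = [math.log10(k) for k in perm_md]

train_idx, test_idx = well_wise_split(wells, test_fraction=0.2, seed=0)

# Stand-in predictions: true values shifted by 0.1 log units.
pred = [v + 0.1 for v in log_perm]
metrics = regression_metrics(log_perm, pred)
```

Because errors are measured on the log scale, an MAE of 0.1 here corresponds to a multiplicative error of about 10^0.1 (roughly 26%) in permeability, which is why log-scale metrics are not directly comparable to the percentage-point errors reported for porosity.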