Accurate identification of the coal body structure (CBS) is critical for efficient coalbed methane (CBM) development. However, the highly imbalanced class distribution and complex nonlinearity of well logging data severely hinder precise CBS identification. This study proposes a CBS recognition method for imbalanced logging data that integrates k-means clustering, adaptive synthetic sampling (ADASYN), and the random forest (RF) algorithm. First, a hybrid resampling approach combining k-means clustering and ADASYN is applied to rebalance the CBS logging data. Then, a CBS recognition model is constructed using the RF algorithm, and SHapley Additive Explanations (SHAP), an interpretable artificial intelligence technique, is used to identify the key influencing factors. Results show that the proposed k-means-ADASYN-RF model effectively alleviates class imbalance in the CBS data while preserving the original data distribution. On the test set, precision, recall, and F1 score are all above 0.90 and the macro-average F1 score is 0.95, outperforming the RF, ADASYN-RF, and XGBoost models. SHAP analysis indicates that the Depth, spontaneous potential (SP), and caliper (CAL) logs are the main features affecting CBS identification. This study provides technical support for accurate CBS identification and facilitates efficient CBM exploitation.
Zhou et al. studied this question.
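The abstract does not spell out how the resampling step works, so the following is only a minimal sketch of the general ADASYN idea (more synthetic samples for minority points that sit among majority points), written in plain Python. It does not reproduce the authors' k-means-ADASYN hybrid, whose details are not given here. The function name `adasyn_sketch`, the parameters `k` and `beta`, and the two-feature toy points are all assumptions made for illustration, not values from the study.

```python
import math
import random


def adasyn_sketch(minority, majority, k=3, beta=1.0, seed=0):
    """Return synthetic minority points generated in an ADASYN-like way.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    # Minority points come first, so index i is the same in both lists.
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]

    # Step 1: difficulty of each minority point = share of majority points
    # among its k nearest neighbours in the whole data set.
    ratios = []
    for i, x in enumerate(minority):
        others = [(math.dist(x, p), is_maj)
                  for j, (p, is_maj) in enumerate(labelled) if j != i]
        others.sort(key=lambda t: t[0])
        ratios.append(sum(is_maj for _, is_maj in others[:k]) / k)

    total = sum(ratios)
    if total == 0:
        return []  # no minority point has majority neighbours

    # Step 2: distribute the sampling budget in proportion to difficulty.
    budget = (len(majority) - len(minority)) * beta
    counts = [round(r / total * budget) for r in ratios]

    # Step 3: interpolate between each point and a random minority neighbour.
    synthetic = []
    for i, (x, n_new) in enumerate(zip(minority, counts)):
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        if not neighbours:
            continue
        for _ in range(n_new):
            nb = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy two-feature data (made up): 4 minority points, 10 majority points.
minority = [(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0)]
majority = [(1.0 + 0.05 * i, 0.9) for i in range(10)]
synthetic = adasyn_sketch(minority, majority)
print(len(synthetic), "synthetic minority points generated")
```

In this sketch, the minority points closest to the majority cluster receive a larger share of the sampling budget. Because every synthetic point is an interpolation between two existing minority points, the new samples stay inside the region the minority class already occupies.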