Deep learning has become a key tool for carbonate thin-section image analysis, but the lack of large public datasets limits reproducibility and fair model comparison. To address this, we present DeepCarbonate, a cleaned and standardized benchmark dataset. Samples were collected from the Ediacaran Dengying, Cambrian Longwangmiao, and Triassic Leikoupo and Jialingjiang Formations in the Sichuan Basin, China, and from the Cretaceous Mishrif Formation in the UAE. The dataset was curated by petroleum geology experts: invalid images (blurred, low-brightness, or corrupted) were removed through expert voting and 2σ filtering, and all images were reorganized in the ImageNet format. DeepCarbonate contains 22 lithological categories, organized hierarchically by optical mode (PPL, XPL, R) and split into training, validation, and test subsets to support standardized benchmarking and reproducible experiments. Using PyTorch with CUDA acceleration, we evaluated ResNet, VGG, DenseNet, MobileNet, and EfficientNet models in baseline, ablation, long-tailed distribution, and balanced Top-9 subset experiments. The results highlight the dataset's value as a robust benchmark for carbonate petrography research and applications.
Li et al. studied this question.
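The 2σ filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which image statistic is filtered or how it is computed, so this example assumes a per-image mean brightness on a 0-255 scale. The function name `two_sigma_filter`, the parameter `k`, and the sample values are hypothetical and are not taken from the DeepCarbonate pipeline.

```python
from statistics import mean, pstdev


def two_sigma_filter(brightness, k=2.0):
    """Split image names into (kept, rejected) using a k-sigma rule.

    `brightness` maps an image name to one scalar statistic. Mean
    brightness is an assumption here, not the documented criterion.
    """
    values = list(brightness.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    low, high = mu - k * sigma, mu + k * sigma
    kept = [name for name, v in brightness.items() if low <= v <= high]
    rejected = [name for name, v in brightness.items() if not (low <= v <= high)]
    return kept, rejected


# Hypothetical per-image mean brightness values (0-255 scale).
sample = {
    "a.jpg": 120.0, "b.jpg": 125.0, "c.jpg": 118.0, "d.jpg": 122.0,
    "e.jpg": 121.0, "f.jpg": 119.0, "g.jpg": 123.0, "h.jpg": 5.0,
}
kept, rejected = two_sigma_filter(sample)
print("kept:", kept)
print("rejected:", rejected)
```

In this toy input, the single very dark image falls more than two standard deviations below the mean and is flagged, while the other seven are retained. A real curation pass would presumably combine such a statistical screen with the expert voting the abstract describes.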