Accurate estimation of soil organic carbon (SOC) is essential for monitoring carbon sequestration and mitigating climate change across agricultural and natural ecosystems. Mid-infrared (MIR) diffuse reflectance spectroscopy is a cost-effective, high-throughput alternative to wet chemistry and dry combustion methods, yet reported model performance varies widely across studies and regions. This study assesses the effectiveness of MIR spectroscopy for SOC estimation using a global meta-analysis of 289 studies. Meta-analytic results indicate that fine grinding (≤53 µm) is associated with higher median R² (≈0.77 vs. 0.72) and lower RMSE than ≤2 mm sieving alone, and that air-dried samples tend to show lower RMSE than oven-dried samples. Broader spectral ranges and higher resolution are generally associated with lower RMSE in SOC estimation, although these patterns are not consistent across all instruments and soil types. Among preprocessing options, multiplicative scatter correction and Savitzky–Golay derivatives are frequently associated with improved model fit, but their benefits are dataset-dependent. Across chemometric methods, partial least squares regression (PLSR) remains the most widely used (n = 278), whereas Cubist shows high median R² in a very small number of studies (n = 6), with no statistically robust improvement over PLSR in this synthesis. These findings offer methodological insights for improving SOC monitoring and supporting global carbon accounting initiatives. We further provide reporting considerations and point to a companion paper (Part II) that prospectively evaluates these insights using a controlled modelling pipeline with global and national spectral libraries.
• First global meta-analysis of MIR spectroscopy for SOC estimation
• Sample preparation, spectral resolution, and scan count are associated with differences in model performance
• Advanced preprocessing methods (MSC, SGD) are frequently associated with higher SOC prediction accuracy
• Future work should explore hybrid preprocessing and chemometric models integrated with AI
Li et al. (Mon,) studied this question.