Abstract
Accurate prediction of sulphur content in crude oil is essential for optimizing refining efficiency, ensuring environmental compliance, and improving fuel quality. This study introduces an explainable machine‐learning framework to predict sulphur weight percentage (wt.%) using a comprehensive dataset of 664 crude oil samples characterized by 73 physicochemical properties. Five regression algorithms, namely support vector regression, k‐nearest neighbours, decision tree, random forest, and extreme gradient boosting (XGBoost), were trained and validated under identical preprocessing and cross‐validation protocols. XGBoost achieved the highest test‐set accuracy (R² = 0.89), substantially outperforming random forest (R² = 0.74), decision tree regression (R² = 0.72), and the remaining models. Cross‐validation confirmed the robustness of XGBoost (mean R² = 0.92), while Shapley additive explanations (SHAP) analysis identified Watson K, asphaltene content, and nitrogen content (wt.%) as the most influential features. The novelty of this study lies in combining a high‐dimensional dataset with explainable AI (SHAP) to uncover the physicochemical drivers of sulphur content, achieving both higher accuracy and better interpretability than existing models. This data‐driven approach provides a scalable and precise sulphur estimation tool that enables refiners to optimize blending strategies, reduce desulphurization costs, and comply with stringent environmental regulations.
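The evaluation protocol summarized above (identical preprocessing for every model, k-fold cross-validation, and scoring by R² = 1 − SS_res/SS_tot) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: it uses k-nearest neighbours regression (one of the five models named) on hypothetical synthetic data, and the function names, k = 3, five folds, and the data generator are all assumptions chosen for illustration. XGBoost and SHAP are deliberately left out.

```python
import random
from statistics import mean, pstdev


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_standardizer(rows):
    """Learn per-feature mean/std on training rows only; return a z-scoring function."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return lambda row: [(v - m) / s for v, m, s in zip(row, mus, sds)]


def knn_predict(train_X, train_y, x, k=3):
    """k-nearest-neighbours regression: average the targets of the k closest rows."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )[:k]
    return mean(target for _, target in nearest)


def k_fold_indices(n, n_folds=5, seed=0):
    """Shuffle 0..n-1 once and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_val_r2(X, y, n_folds=5, k=3, seed=0):
    """Return one held-out R^2 per fold, refitting the scaler on each training split."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        scale = fit_standardizer(train_X)  # test-fold statistics never reach the scaler
        train_Xs = [scale(row) for row in train_X]
        preds = [knn_predict(train_Xs, train_y, scale(X[i]), k) for i in fold]
        scores.append(r2_score([y[i] for i in fold], preds))
    return scores


# Hypothetical synthetic data (not the crude oil dataset): two features, noisy linear target.
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(100)]
y = [2.0 * a - b + rng.gauss(0, 0.5) for a, b in X]

scores = cross_val_r2(X, y)
print("per-fold R^2:", [round(s, 3) for s in scores])
print("mean R^2:", round(mean(scores), 3))
```

Refitting the scaler inside each training split is what keeps preprocessing "identical" across models without leaking test-fold information. In practice the same protocol is usually written with scikit-learn (`KFold` together with `cross_val_score(..., scoring="r2")`), with `xgboost.XGBRegressor` as the boosted model and the `shap` package's `TreeExplainer` for feature attributions; this stdlib version only makes the per-fold steps explicit.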
Pullanikkattil et al. studied this question.