This study proposes a pretraining-enhanced multicomponent directed message passing neural network (PEMC D-MPNN) for predicting the solubility of H2S in ionic liquids (ILs). Traditional feature engineering methods often treat an IL as a single entity, overlooking the distinct structural roles of its cation and anion. To address this, we introduce a multicomponent framework that encodes the cation and anion structures separately with a D-MPNN and explicitly models their interactions. Because experimental H2S solubility data are limited, we adopt a pretraining strategy: the CheMeleon foundation model, trained on one million molecules from PubChem to learn universal molecular representations, is fine-tuned for H2S solubility prediction. The proposed model integrates operational conditions (i.e., temperature and pressure) and uses interpretability tools, such as SHapley Additive exPlanations (SHAP) and principal component analysis (PCA), to validate feature importance. The evaluation results show that the proposed PEMC D-MPNN model outperforms existing models (i.e., GPR, RF, XGBoost, SVM, DBN, RNN, DJINN, GP, GMDH), achieving an R2 of 0.9922, MAE of 0.0080, and RMSE of 0.0136 on a dataset of 722 data points, and an R2 of 0.9964, AAPRE of 5.0506%, and RMSE of 0.0099 on a dataset of 1516 data points. External validation on three unseen ILs (139 data points in total), namely 1-ethyl-3-methylimidazolium trifluoromethanesulfonate, 1-ethyl-3-methylimidazolium tris(pentafluoroethyl)trifluorophosphate, and 1-butylpyridinium tetrafluoroborate, confirms strong generalization ability, highlighting the robustness of the proposed model and its practical utility for IL screening and design.
Zhang et al. (Thu,) studied this question.
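For readers who wish to reproduce the error statistics quoted in the abstract (R2, MAE, RMSE, and AAPRE), the following is a minimal sketch of how such metrics are commonly computed. It is not the authors' evaluation code. The function name `regression_metrics` is hypothetical, and AAPRE is assumed here to mean the average absolute percent relative error, (100/N) Σ |(ŷ − y)/y|, which requires nonzero reference values.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE, and AAPRE (%) for paired reference/predicted values.

    Hypothetical helper for illustration; AAPRE is assumed to be the
    average absolute percent relative error.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_pred - y_true

    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(residual)),
        "RMSE": np.sqrt(np.mean(residual ** 2)),
        # Relative error is undefined when a reference value is zero.
        "AAPRE": 100.0 * np.mean(np.abs(residual / y_true)),
    }
```

Because AAPRE divides by the reference value, it weights errors at low solubility more heavily than MAE or RMSE do, which may be why the two kinds of measure are reported side by side.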