Drug solubility in supercritical carbon dioxide (SC-CO2) plays a pivotal role in the development of particle engineering, drug loading, and solvent-free pharmaceutical formulations. However, experimental solubility determination in supercritical systems remains costly, time-consuming, and compound-specific. In this study, an interpretable data-driven framework is proposed to support pharmaceutical formulation scientists by accurately predicting drug solubility in SC-CO2 while elucidating the governing physicochemical factors. Multiple machine learning regressors, including Extreme Gradient Boosting and Support Vector Regression, were developed and further integrated into an ensemble strategy to enhance robustness and generalizability. Model performance was systematically optimized using bio-inspired metaheuristic algorithms, enabling efficient hyperparameter selection across complex, nonlinear search spaces. Beyond predictive accuracy, model interpretability was emphasized through sensitivity-based and amplitude-based feature analyses, revealing the dominant molecular descriptors and process conditions influencing solubility behavior. The results demonstrate that the proposed framework not only improves solubility prediction accuracy but also provides mechanistic insights relevant to drug selection, formulation feasibility, and supercritical processing design. This work establishes a practical computational tool for accelerating pharmaceutical development pipelines involving supercritical fluid technologies.
Khafagy et al. studied this question.
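The abstract mentions an ensemble of regressors and sensitivity-based feature analysis without giving details. As a rough illustration only, the sketch below averages two stand-in predictors and ranks inputs by a generic one-at-a-time perturbation sensitivity. It is not the authors' method: the toy models, their coefficients, the feature names `temperature_K` and `pressure_bar`, and the 5% step are all assumptions chosen for illustration, and nothing here is fitted to data.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Model = Callable[[Features], float]


def ensemble_predict(models: List[Model], x: Features) -> float:
    """Unweighted average of the base-model predictions."""
    return sum(m(x) for m in models) / len(models)


def oat_sensitivity(
    models: List[Model], x: Features, rel_step: float = 0.05
) -> Dict[str, float]:
    """One-at-a-time sensitivity: absolute change in the ensemble output
    when each feature is increased by rel_step, the others held fixed."""
    base = ensemble_predict(models, x)
    scores: Dict[str, float] = {}
    for name, value in x.items():
        perturbed = dict(x)  # copy so the original point is untouched
        perturbed[name] = value * (1.0 + rel_step)
        scores[name] = abs(ensemble_predict(models, perturbed) - base)
    return scores


# Hypothetical stand-ins for trained regressors (e.g. a boosted-tree model
# and an SVR). The coefficients are arbitrary and carry no physical meaning.
def toy_model_a(x: Features) -> float:
    return 0.02 * x["pressure_bar"] - 0.01 * x["temperature_K"]


def toy_model_b(x: Features) -> float:
    return 0.03 * x["pressure_bar"] - 0.02 * x["temperature_K"]


models = [toy_model_a, toy_model_b]
point = {"temperature_K": 318.0, "pressure_bar": 200.0}  # assumed operating point

scores = oat_sensitivity(models, point)
ranking = sorted(scores, key=scores.get, reverse=True)  # most influential first
print(ranking, scores)
```

In practice the toy functions would be replaced by the `predict` calls of fitted models, and the perturbation step would be chosen relative to each feature's measured range rather than as a fixed fraction.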