Background: The blood–brain barrier (BBB) restricts the brain penetration of most small molecules and almost all biologics, and it remains a major hurdle in developing drugs for the central nervous system (CNS). Reliable computational prediction of BBB permeability, typically expressed as log BB, can reduce the experimental load during early-stage screening.

Methods: We present a well-validated machine learning framework built solely on the B3DB experimental database, which includes 7807 compounds with BBB+/BBB− annotations and 1058 compounds with in vivo log BB values. A curated set of 40 two-dimensional molecular descriptors was calculated from SMILES strings using the Mordred library, without artificial data augmentation. Nine methods were benchmarked using stratified five-fold cross-validation.

Results: On a held-out test set (n = 212), gradient boosting gave the best regression performance, with R² = 0.6043, RMSE = 0.4740 log units, and MAE = 0.3326 log units, in line with the upper range reported for experimental BBB datasets. The corresponding classifier achieved an AUC-ROC of 0.9476 and a balanced accuracy of 0.8568 on an internal test set (n = 1562), and an AUC-ROC of 0.9137 on an independent external validation set (n = 175). SHAP analysis identified topological polar surface area as the primary factor influencing BBB permeability, with lipophilicity and ionization-related descriptors as the second and third most important factors, respectively. Partial dependence analysis confirmed nonlinear relationships consistent with accepted pharmacokinetic principles.

Conclusion: This study provides a reliable method for predicting BBB permeability in CNS drug discovery.
Tiwari et al. studied this question.
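
For readers less familiar with the metrics quoted in the Results, the following is a minimal, dependency-free sketch of how R², RMSE, MAE, balanced accuracy, and AUC-ROC are conventionally defined. It is not the authors' code: the function names (`regression_metrics`, `balanced_accuracy`, `auc_roc`) are hypothetical, the labels are assumed to be coded as 1 for BBB+ and 0 for BBB−, and the toy values in the usage lines are illustrative only and are not taken from B3DB or from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired observed/predicted values (e.g. log BB)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity; labels assumed 1 (BBB+) or 0 (BBB-)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation). This is O(n_pos * n_neg),
    which is fine for a sketch but not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative toy inputs only (not data from the study).
toy_regression = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 2.0])
toy_bal_acc = balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])
toy_auc = auc_roc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Balanced accuracy is reported alongside AUC-ROC because it averages per-class recall, so it is not inflated when one class (BBB+ or BBB−) dominates the test set. In practice these quantities would more likely be computed with an established library than with hand-written functions.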