The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.5c02291.

PDF: overall workflow for data processing, modeling, and experimental validation (Figure S1); ROC curves for Model 1 across GPCR targets and for the hERG inhibitor model, generated from the Tox21 10K training data sets (Figure S2); model-predicted probabilities and confusion matrices for GPCR targets and hERG inhibition from external validation on the LOPAC data sets (Figure S3); distribution of Tanimoto similarities between LOPAC compounds and training-set molecules (Figure S4); distribution of hERG activities (−log AC50) for LOPAC compounds across GPCR activity categories (Figure S5); AUCs of the RF and SVM models evaluated with random splits versus Bemis–Murcko scaffold splits (Table S2); positive predictive value (PPV) for GPCR targets after removal of duplicate LOPAC compounds and of compounds with a Tanimoto coefficient >0.7, compared with LOPAC hit rates (Table S3); and a performance comparison of the ML models in this study with CToxPred2 and CardioGenAI on the LOPAC data set for hERG liability prediction (Table S6).

XLSX: performance metrics for all 10 training iterations of 5-fold stratified cross-validation on the Tox21 10K data set (Table S1); virtual screening results of the NCATS 360K compound library for GPCR agonist/antagonist activity and hERG inhibition using Model 1 (Table S4); experimental validation of CHRM1-active compounds without significant hERG inhibition (Table S5, with Table S5A for CHRM1 agonists and Table S5B for CHRM1 antagonists); and structural features (ToxPrint chemotypes) significantly enriched in GPCR modulators and hERG inhibitors (Table S7).
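To illustrate the kind of filtering and metric summarized in Table S3, here is a minimal sketch. It is not the authors' code. It makes several assumptions. Fingerprints are represented as Python sets of on-bit indices, because the fingerprint type used in the study is not stated here. A LOPAC compound counts as "too similar" when its maximum Tanimoto coefficient to any training molecule exceeds 0.7. Duplicate removal is not shown. The function names (`tanimoto`, `filter_dissimilar`, `ppv`, `hit_rate`) and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def filter_dissimilar(
    external: Dict[str, Set[int]],
    training: List[Set[int]],
    cutoff: float = 0.7,
) -> Dict[str, Set[int]]:
    """Keep external compounds whose similarity to every training molecule is <= cutoff.

    Assumption: "Tanimoto coefficient > 0.7" is evaluated against each training
    molecule, so one close neighbor is enough to remove a compound.
    """
    return {
        name: fp
        for name, fp in external.items()
        if all(tanimoto(fp, t) <= cutoff for t in training)
    }


def ppv(predicted: List[bool], actual: List[bool]) -> float:
    """Positive predictive value = TP / (TP + FP); 0.0 if nothing is predicted active."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fp) if (tp + fp) else 0.0


def hit_rate(actual: List[bool]) -> float:
    """Fraction of experimentally active compounds, the baseline PPV is compared against."""
    return sum(actual) / len(actual) if actual else 0.0


# Toy example (hypothetical fingerprints and labels).
training_fps = [{1, 2, 3, 4}, {10, 11, 12}]
lopac_fps = {"cmpdA": {1, 2, 3, 4, 5}, "cmpdB": {20, 21, 22}, "cmpdC": {10, 11, 30, 31}}
kept = filter_dissimilar(lopac_fps, training_fps)  # cmpdA (0.8 to a training molecule) is dropped
print(sorted(kept))
print(ppv([True, True, False, True], [True, False, False, True]))
print(hit_rate([True, False, False, True]))
```

Comparing PPV with the hit rate on the same filtered set shows whether model-selected compounds are enriched for actives relative to screening the set blindly.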
Luo et al. studied this question.