April 10, 2026 | Open Access

Interpretable QSAR and Complementary Docking for PARP1 Inhibitor Prioritization: Reliability Stratification and Near-Domain Screening

Key Points

The study aimed to build an interpretable, reliability-stratified cheminformatics workflow for prioritizing PARP1 inhibitors by predicted potency, with structure-based follow-up.
Used a curated ChEMBL dataset of 3339 PARP1 inhibitors encoded with RDKit 2D descriptors and Avalon fingerprints.
Applied Random Forest-based feature selection, reducing 1143 initial features to 132.
Optimized five regression models, including a stacked ensemble.
Used permutation feature importance and SHAP for model interpretation.
Docked top scaffold-diverse candidates against PARP1 (PDB: 4R6E) with AutoDock Vina and SwissDock.
The stacked ensemble achieved test R² = 0.723 and RMSE = 0.610 pIC50 units.
A PubChem similarity expansion (Tanimoto > 0.90) retrieved approximately 32,450 analogs, of which 3349 were predicted to have IC50 ≤ 10 nM.
Among 366 compounds with retrievable experimental PARP1 activity values, predicted and experimental pIC50 were positively associated (R² = 0.124; Pearson r = 0.479).
Three ligands (CID 168873053, CID 175154210, and CID 172894737) showed the strongest docking support relative to niraparib.
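
The near-domain screening step combines two thresholds: a Tanimoto similarity above 0.90 to a potent seed and a predicted IC50 of 10 nM or lower, which corresponds to pIC50 ≥ 8. The following is a minimal sketch of that filter and not the authors' implementation. The function names, the representation of fingerprints as sets of on-bit indices, and the toy data are all assumptions made for illustration; the fingerprint type used for the PubChem search is not stated in this summary.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def near_domain_hits(seed_bits, candidates, predicted_pic50,
                     sim_cutoff=0.90, ic50_cutoff_nm=10.0):
    """Return sorted names of candidates that are similar to the seed
    (Tanimoto > sim_cutoff) and predicted to be potent (IC50 <= cutoff)."""
    pic50_cutoff = pic50_from_ic50_nm(ic50_cutoff_nm)
    hits = []
    for name, bits in candidates.items():
        similar = tanimoto(seed_bits, bits) > sim_cutoff
        potent = predicted_pic50[name] >= pic50_cutoff
        if similar and potent:
            hits.append(name)
    return sorted(hits)


# Toy data (hypothetical, not taken from the study)
seed = set(range(20))
candidates = {"A": set(range(19)), "B": set(range(10)), "C": set(range(20))}
predicted = {"A": 8.3, "B": 9.0, "C": 7.5}
print(near_domain_hits(seed, candidates, predicted))  # ['A']
```

In this toy case, B is predicted potent but falls outside the similarity window, and C is identical to the seed but predicted too weak, so only A passes both thresholds.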

Abstract

Background/Objectives: Poly(ADP-ribose) polymerase 1 (PARP1) is an important therapeutic target in DNA repair-deficient cancers, but discovery of new inhibitors remains constrained by scaffold convergence, tolerability limits, and acquired resistance. This study aimed to develop an interpretable, reliability-stratified cheminformatics workflow for PARP1 potency prioritization and structure-based follow-up.
Methods: A curated ChEMBL dataset of 3339 PARP1 inhibitors was encoded using RDKit 2D descriptors and Avalon fingerprints (1143 initial features), then reduced to 132 informative variables by Random Forest-based feature selection. Five regression models were optimized, including a stacked ensemble. Model interpretation was performed using permutation feature importance and SHAP. External near-domain corroboration was assessed using a stringent PubChem similarity expansion (Tanimoto > 0.90) around sub-10 nM seed compounds, followed by comparison with retrievable experimental PARP1 activity values. Top scaffold-diverse candidates were further evaluated by complementary docking against PARP1 (PDB: 4R6E) using AutoDock Vina and cavity-guided docking through the SwissDock platform.
Results: The stacked ensemble achieved the best held-out performance (test R² = 0.723; RMSE = 0.610 pIC50 units), with 83.7% of test predictions within 0.75 pIC50 units of the experimental value and only 2.7% off by more than 1.5 pIC50 units. PubChem similarity expansion retrieved approximately 32,450 analogs, of which 3349 were predicted to have IC50 ≤ 10 nM. Among 366 compounds with retrievable experimental PARP1 activity values, predicted versus experimental pIC50 showed a positive association (R² = 0.124; Pearson r = 0.479), with RMSE = 0.491 and MAE = 0.330 pIC50 units. Three ligands (CID 168873053, CID 175154210, and CID 172894737) showed the strongest complementary docking support and pocket-consistent poses relative to niraparib.
Conclusions: This workflow provides a transparent and practically useful framework for near-domain PARP1 inhibitor prioritization. The combined QSAR, explainability, external corroboration, and docking strategy supports shortlist generation for experimental follow-up.
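
The reported metrics can be reproduced from paired experimental and predicted pIC50 values. The sketch below is illustrative only: the function name `regression_summary` and the toy values are assumptions, and it takes R² to be the coefficient of determination (1 − SS_res/SS_tot), which is not confirmed by this summary. Under that definition R² can fall below the squared Pearson correlation when predictions are offset or mis-scaled, which would be consistent with the external set showing R² = 0.124 alongside r = 0.479 (r² ≈ 0.229).

```python
import math


def regression_summary(y_true, y_pred, tight=0.75, loose=1.5):
    """Summarize agreement between experimental and predicted pIC50 values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)    # total sum of squares
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "r2": 1.0 - ss_res / ss_tot,                   # coefficient of determination
        "pearson_r": cov / math.sqrt(ss_tot * ss_pred),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs_err) / n,
        # Reliability strata: share of predictions inside / outside error bands
        "frac_within_tight": sum(e <= tight for e in abs_err) / n,
        "frac_beyond_loose": sum(e > loose for e in abs_err) / n,
    }


# Toy example (hypothetical values): every prediction is 0.5 units too high
summary = regression_summary([6.0, 7.0, 8.0, 9.0], [6.5, 7.5, 8.5, 9.5])
print(summary)
```

In the toy example the predictions are perfectly correlated with the experimental values (r = 1) yet R² is only 0.8, because the constant 0.5-unit offset contributes to the residual sum of squares.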
