Ligand-based virtual screening with ROCS (Rapid Overlay of Chemical Structures) enables rapid exploration of billion-member chemical libraries but requires a known active compound to serve as the reference query, limiting application to targets with established chemical matter and biasing results toward close analogs. Structure-based docking is an alternative less constrained by prior chemical matter, but its substantial computational cost limits large-scale deployment. Here we present Struct2Query, a workflow that bridges these approaches by converting protein pockets into composite-molecule ROCS queries. Our method leverages OpenEye SiteHopper to efficiently search a curated database of over 78,000 crystallographic protein-ligand complexes from the RCSB, identifying structurally analogous pockets, then transplants ligands from these related pockets to generate an ensemble of binding hypotheses. Rather than consolidating this ligand ensemble into a consensus pharmacophore, we retain all constituent shape and color features in a composite-molecule ROCS query, allowing densely populated regions to emerge as natural hotspots. Benchmarks of early enrichment on the DEKOIS 2.0 (81 targets) and DUDE-Z (43 targets) data sets show performance matching or exceeding that of popular structure-based methods such as Glide and HYBRID docking, while maintaining the throughput of ligand-centric approaches compatible with GPU-accelerated FastROCS. Scaffold diversity analysis of virtual screening hit lists reveals improved chemotype coverage compared to single-ligand ROCS for three of four purchasable compound libraries tested, with specific considerations for combinatorial libraries discussed. Struct2Query thus provides a structure-informed virtual screening method that operates at the scale and throughput of ligand-based methods.
Shmilovich et al. (Sun,) studied this question.