Deep learning has accelerated drug discovery by enabling large-scale virtual screening, but current models often act as "black boxes" and provide no formal guarantees about prediction reliability. This limitation is particularly critical for compound-protein interaction (CPI) prediction, where datasets are highly imbalanced and erroneous predictions can lead to costly failures. Here we introduce ConfBiXtCPI, an integrated framework that unifies accurate prediction, interpretability, and statistically rigorous uncertainty quantification. At its core is a bidirectional cross-attention transformer that captures molecular recognition patterns from sequence-level inputs, achieving state-of-the-art accuracy across multiple benchmarks. To address class imbalance and quantify uncertainty, we incorporate Mondrian (class-conditional) conformal prediction, which guarantees valid coverage separately for the majority and minority classes. Building on this, a conformal selection procedure enables principled control of the false discovery rate, allowing users to specify risk thresholds while maintaining discovery power. Beyond accuracy, ConfBiXtCPI provides mechanistic interpretability through attention maps that localize to biophysically relevant binding sites, and its uncertainty estimates support efficient active learning strategies. Together, these advances establish ConfBiXtCPI as a trustworthy and practical tool for guiding experimental validation and accelerating therapeutic discovery.
Yuan et al. (Fri,) studied this question.
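
The class-conditional calibration behind Mondrian conformal prediction can be illustrated with a minimal sketch. This is not the ConfBiXtCPI implementation: the function names, the array layout, and the choice of nonconformity score (for example, one minus the model's predicted probability for a label) are assumptions made for illustration. The point it shows is that each candidate label is calibrated only against held-out examples of that same class, which is what keeps coverage valid for a rare class.

```python
import numpy as np


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional (Mondrian) conformal p-values.

    cal_scores  : (n_cal,) nonconformity of each calibration example
                  with respect to its own true label.
    cal_labels  : (n_cal,) integer true labels in {0, ..., n_classes - 1}.
    test_scores : (n_test, n_classes) nonconformity of each test example
                  with respect to each candidate label.
    Assumes every class has at least one calibration example.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_scores = np.asarray(test_scores, dtype=float)

    n_test, n_classes = test_scores.shape
    p = np.zeros((n_test, n_classes))
    for c in range(n_classes):
        # Calibrate label c only against calibration examples of class c.
        ref = cal_scores[cal_labels == c]
        # Count calibration scores at least as nonconforming as the test score.
        n_ge = (ref[None, :] >= test_scores[:, [c]]).sum(axis=1)
        # The +1 terms account for the test example itself.
        p[:, c] = (n_ge + 1) / (len(ref) + 1)
    return p


def prediction_sets(p_values, alpha):
    """Boolean mask of labels kept in each prediction set at error level alpha."""
    return p_values > alpha
```

Under the usual exchangeability assumption, keeping every label whose p-value exceeds `alpha` gives prediction sets that miss the true label with probability at most `alpha` within each class. The same per-class p-values are the natural input to a selection step with false discovery rate control, which is not sketched here.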