Accurate prediction of a compound's sites of metabolism (SoMs) mediated by cytochrome P450 (CYP450) enzymes is valuable in the early stages of drug discovery. However, existing computational methods often struggle to explicitly capture the microscopic electronic evolution associated with bond cleavage. Conventional graph neural networks also suffer from information attenuation and oversmoothing during message passing, which restricts their ability to model long-range spatial dependencies and thereby limits their prediction accuracy and generalization ability. To address these limitations, we propose CypGEM, a deep learning framework for SoM prediction based on a geometry-aware, edge-enhanced graph transformer. Through gated edge fusion and dynamic edge update mechanisms, CypGEM captures features of the microscopic electronic evolution that accompanies changes in chemical bonds during metabolic reactions. In addition, by integrating a global geometry-aware layer that encodes graph-wide topology and three-dimensional (3D) spatial information, CypGEM reconstructs long-range intramolecular steric constraints. Built on a high-quality benchmark data set constructed for this purpose, CypGEM outperforms existing models. Notably, in case studies of drugs approved by the FDA in recent years, the model maintains robust predictive performance on novel scaffolds not seen in the training set, demonstrating strong generalization ability. Furthermore, interpretability analysis confirms that the model has captured the synergistic rules governing electronic effects and steric hindrance, providing medicinal chemists with structural optimization guidance grounded in physicochemical intuition. CypGEM is freely available at https://lmmd.ecust.edu.cn/CypGEM.
Zhang et al. (Mon,) studied this question.
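
To make the two architectural ideas named in the abstract more concrete, the following is a minimal NumPy sketch of what a gated edge update and a distance-biased global attention step could look like in general. It is not the CypGEM implementation: the abstract does not specify the layer equations, so the function names (`gated_edge_update`, `distance_biased_attention`), the weight matrices `W_gate` and `W_msg`, the convex-combination gate, and the linear distance penalty `scale` are all assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_edge_update(h, e, edge_index, W_gate, W_msg):
    """Generic gated edge update (illustrative only, not the CypGEM layer).

    h: (N, d) atom features; e: (E, d) bond features.
    edge_index: (E, 2) integer array of (source, target) atom indices.
    W_gate: (3d, d) and W_msg: (2d, d) are hypothetical weight matrices.
    """
    src, dst = edge_index[:, 0], edge_index[:, 1]
    pair = np.concatenate([h[src], h[dst]], axis=1)              # (E, 2d)
    gate = sigmoid(np.concatenate([pair, e], axis=1) @ W_gate)   # (E, d), values in (0, 1)
    candidate = np.tanh(pair @ W_msg)                            # (E, d), proposal from endpoint atoms
    # Elementwise blend: the gate decides how much of the old bond feature to keep.
    return gate * e + (1.0 - gate) * candidate


def distance_biased_attention(h, coords, scale=1.0):
    """Self-attention over all atom pairs with an assumed 3D distance penalty.

    coords: (N, 3) atom positions. Larger `scale` makes attention more local.
    """
    d = h.shape[1]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # (N, N)
    logits = (h @ h.T) / np.sqrt(d) - scale * dist
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)        # each row sums to 1
    return attn @ h, attn


# Toy molecule: 4 atoms in a chain, 3 bonds, random features (placeholder data).
rng = np.random.default_rng(0)
N, d = 4, 8
h = rng.normal(size=(N, d))
edge_index = np.array([[0, 1], [1, 2], [2, 3]])
e = rng.normal(size=(len(edge_index), d))
coords = rng.normal(size=(N, 3))
W_gate = 0.1 * rng.normal(size=(3 * d, d))
W_msg = 0.1 * rng.normal(size=(2 * d, d))

e_new = gated_edge_update(h, e, edge_index, W_gate, W_msg)
h_new, attn = distance_biased_attention(h, coords)
print(e_new.shape, h_new.shape, attn.shape)
```

In this sketch, every atom attends to every other atom in a single step, so information does not have to travel through many rounds of neighbor-to-neighbor message passing; that is one common way to sidestep the attenuation and oversmoothing issues the abstract mentions, though the actual mechanism used by CypGEM may differ.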