Predicting molecular properties is vital for drug discovery, but experimental measurement is costly and labeled data are scarce. Self-supervised molecular pretraining can exploit large unlabeled datasets, reducing dependence on extensive annotation. However, most methods struggle to preserve domain-specific chemical knowledge, especially clinically relevant substructures such as motifs. Random masking and generic graph augmentations often destroy critical chemical information and harm interpretability. Many approaches also operate at a single scale, either atoms or motifs, and so miss opportunities for cross-scale integration. We propose A2M-Mol, a multi-perspective molecular pretraining framework that combines atom-level and motif-level views through four parallel graph constructions. This design enables cross-view alignment and multiscale fusion while explicitly encoding chemical knowledge. A2M-Mol employs a suite of self-supervised tasks, including cross-view correspondence, atomic reconstruction, global topology modeling, and property constraint enforcement, all coordinated via tailored contrastive learning. Extensive experiments across benchmarks and backbone architectures show consistent improvements over state-of-the-art methods, and ablation studies confirm strong synergies among the tasks. A2M-Mol maintains robust predictive accuracy across data scales, demonstrating its effectiveness for real-world molecular property prediction and its potential to accelerate drug discovery.
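The abstract does not specify the form of its "tailored contrastive learning". To illustrate what cross-view correspondence between an atom-level and a motif-level view can look like in general, the sketch below assumes a standard symmetric InfoNCE objective (a generic technique, not A2M-Mol's actual loss). Row i of each view is assumed to embed the same molecule, so diagonal pairs are positives and every other molecule in the batch serves as a negative. The function name `cross_view_info_nce`, the `temperature` default, and the toy embeddings are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # L2-normalize so that dot products become cosine similarities
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _mean_diag_cross_entropy(matrix: List[List[float]]) -> float:
    # Mean cross-entropy per row, treating the diagonal entry as the target
    total = 0.0
    for i, row in enumerate(matrix):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_sum_exp - row[i]
    return total / len(matrix)


def cross_view_info_nce(
    atom_view: List[Vector], motif_view: List[Vector], temperature: float = 0.1
) -> float:
    """Hypothetical symmetric InfoNCE between two views of one batch of molecules."""
    if not atom_view or len(atom_view) != len(motif_view):
        raise ValueError("views must be non-empty and have the same batch size")
    a = [_normalize(v) for v in atom_view]
    m = [_normalize(v) for v in motif_view]
    n = len(a)
    # sim[i][j]: temperature-scaled similarity of molecule i (atom view) to molecule j (motif view)
    sim = [[_dot(a[i], m[j]) / temperature for j in range(n)] for i in range(n)]
    sim_t = [list(col) for col in zip(*sim)]  # motif-to-atom direction
    return 0.5 * (_mean_diag_cross_entropy(sim) + _mean_diag_cross_entropy(sim_t))


if __name__ == "__main__":
    atoms = [[1.0, 0.0], [0.0, 1.0]]
    aligned = cross_view_info_nce(atoms, [[0.9, 0.1], [0.1, 0.9]])
    swapped = cross_view_info_nce(atoms, [[0.1, 0.9], [0.9, 0.1]])
    print(aligned < swapped)  # True: matched views give the lower loss
```

Minimizing such a loss pulls the two views of each molecule together while pushing apart views of different molecules. Averaging both directions keeps the objective symmetric in the atom and motif views.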
Wang et al. studied this question.