Artificial intelligence (AI)-driven molecular property prediction holds significant potential to accelerate drug discovery, yet the development of robust models is hindered by the scarcity of high-quality data and the diversity of prediction tasks. Although self-supervised learning (SSL), especially contrastive learning, has gained traction for molecular representation learning (MRL), the intrinsic structural integrity of molecules presents a unique challenge: because even small structural perturbations can alter a molecule's chemical identity, it is difficult to construct meaningful contrastive pairs in a straightforward way. This often leads to suboptimal pretrained representations and, consequently, weaker downstream performance. To overcome this limitation, we introduce a novel contrastive pair construction strategy based on molecular fragment contributions. Our method uses information bottleneck theory to evaluate how much each individual fragment contributes to molecular properties, without relying on external prior knowledge, and thereby learns a higher-quality embedding space. We implement a contrastive learning framework with an improved quadruplet loss that better captures fine-grained similarities between molecules. Empirical evaluations show that our approach achieves strong performance on the MoleculeNet benchmark and delivers promising results in predicting diverse pharmacokinetic (PK) and critical toxicity properties, highlighting its potential for real-world drug discovery applications.
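The abstract does not define its "improved" quadruplet loss, so the sketch below shows only the commonly used baseline form that such variants build on. It contains two hinge terms: one pushes the anchor-positive distance below the anchor-negative distance, and the other pushes it below the distance between two distinct negatives. The function names and the margins `margin1` and `margin2` are hypothetical. How quadruplets would be drawn from fragment-contribution scores is not modeled here.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_distance(x: Vector, y: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def quadruplet_loss(
    anchor: Vector,
    positive: Vector,
    negative1: Vector,
    negative2: Vector,
    margin1: float = 1.0,  # hypothetical margin for the anchor-negative term
    margin2: float = 0.5,  # hypothetical margin for the negative-negative term
) -> float:
    """Baseline quadruplet loss (not the paper's improved variant)."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative1)
    d_nn = squared_distance(negative1, negative2)
    # Term 1: the positive must be closer to the anchor than the negative is.
    term1 = max(0.0, d_ap - d_an + margin1)
    # Term 2: the anchor-positive distance must also be smaller than the
    # distance between two different negatives.
    term2 = max(0.0, d_ap - d_nn + margin2)
    return term1 + term2


def batch_quadruplet_loss(quads, margin1: float = 1.0, margin2: float = 0.5) -> float:
    """Mean loss over a list of (anchor, positive, negative1, negative2) tuples."""
    return sum(quadruplet_loss(*q, margin1, margin2) for q in quads) / len(quads)


if __name__ == "__main__":
    easy = ([0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [0.0, 2.0])
    hard = ([0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [0.5, 0.5])
    print(quadruplet_loss(*easy))  # well-separated quadruplet, zero loss
    print(quadruplet_loss(*hard))  # negatives closer than the positive
    print(batch_quadruplet_loss([easy, hard]))
```

In the well-separated quadruplet, both hinge terms are inactive. In the hard quadruplet, both terms are active, with losses of 1.75 and 1.25. The second term is what distinguishes this loss from a triplet loss, because it also constrains distances between samples that do not involve the anchor.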
Ru et al. studied this question.