December 2, 2025 · Open Access

Molecular Motif Learning as a pretraining objective for molecular property prediction

Key Points

Molecular Motif Learning (MotiL) predicts molecular properties relevant to drug discovery more accurately than existing methods.
Evaluation across 16 benchmarks shows that MotiL outperforms state-of-the-art contrastive and predictive methods.
MotiL's representations group small molecules that share a scaffold, as well as proteins with similar structures and functions.
These findings suggest that MotiL could improve molecular property prediction in biopharmaceutical research.
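To make scaffold grouping concrete, here is a minimal pure-Python sketch. This summary does not say how MotiL defines or computes scaffolds, so everything below is an assumption: side chains are stripped by repeatedly pruning terminal atoms (a topology-only approximation of a Bemis-Murcko scaffold that ignores bond orders), and scaffolds are compared with a Weisfeiler-Lehman label-refinement key. The function names (`build_adjacency`, `scaffold_atoms`, `scaffold_key`, `group_by_scaffold`) and the toy molecules are hypothetical; a real pipeline would normally use a cheminformatics toolkit such as RDKit's `MurckoScaffold` module.

```python
from collections import defaultdict


def build_adjacency(num_atoms, bonds):
    """Undirected heavy-atom graph: atom index -> set of bonded neighbours."""
    adjacency = {atom: set() for atom in range(num_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def scaffold_atoms(adjacency):
    """Hypothetical helper: repeatedly prune terminal atoms (degree <= 1).

    What survives is the ring systems plus the linkers between them, a
    topology-only approximation of a Bemis-Murcko scaffold (bond orders ignored).
    Acyclic molecules prune down to an empty scaffold.
    """
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}  # copy; caller's graph untouched
    changed = True
    while changed:
        changed = False
        for atom in list(graph):
            if len(graph[atom]) <= 1:  # side-chain atom: remove it and its bond
                for nbr in graph[atom]:
                    graph[nbr].discard(atom)
                del graph[atom]
                changed = True
    return graph  # adjacency restricted to the remaining scaffold atoms


def scaffold_key(elements, scaffold, rounds=3):
    """Numbering-independent key via Weisfeiler-Lehman label refinement.

    Labels start as element symbols and absorb sorted neighbour labels each round.
    WL keys are not a full canonical form: some distinct graphs can collide.
    """
    labels = {atom: elements[atom] for atom in scaffold}
    for _ in range(rounds):
        labels = {
            atom: labels[atom] + "(" + ",".join(sorted(labels[n] for n in scaffold[atom])) + ")"
            for atom in scaffold
        }
    return tuple(sorted(labels.values()))


def group_by_scaffold(molecules):
    """molecules: name -> (element list, bond list). Returns scaffold key -> sorted names."""
    groups = defaultdict(list)
    for name, (elements, bonds) in molecules.items():
        scaffold = scaffold_atoms(build_adjacency(len(elements), bonds))
        groups[scaffold_key(elements, scaffold)].append(name)
    return {key: sorted(names) for key, names in groups.items()}


# Toy heavy-atom graphs (hypothetical examples; hydrogens and bond orders omitted).
ring = [(i, (i + 1) % 6) for i in range(6)]
molecules = {
    "toluene": (["C"] * 7, ring + [(0, 6)]),
    "ethylbenzene": (["C"] * 8, ring + [(0, 6), (6, 7)]),
    "pyridine": (["N"] + ["C"] * 5, ring),
    "hexane": (["C"] * 6, [(i, i + 1) for i in range(5)]),
}
groups = group_by_scaffold(molecules)
for names in groups.values():
    print(names)  # ['ethylbenzene', 'toluene'], then ['pyridine'], then ['hexane']
```

Toluene and ethylbenzene land in the same group because both reduce to the same six-carbon ring, while pyridine's nitrogen gives it a different key and acyclic hexane has an empty scaffold. Because bond orders are ignored and WL keys can collide, this is only a rough stand-in for a toolkit scaffold.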

Abstract

Molecular property prediction is crucial for drug discovery in biopharmaceuticals because it helps identify promising compounds and thereby makes the development of new therapies more effective. Despite its importance, existing deep learning-based methods for this task are often inconsistent with fundamental chemical properties. Here we show that an unsupervised pretraining approach, Molecular Motif Learning (MotiL), learns molecular representations that preserve both whole-molecule structure and motif-level information directly from native molecular graphs. MotiL produces representations that group small molecules sharing a common core structure (i.e., scaffold) and proteins with related three-dimensional structures and functions. We evaluated MotiL on at least 16 molecular benchmarks and found that it produces similar graph representations not only for small molecules with the same scaffold but also for protein macromolecules with similar structures and overlapping chemical functions, such as tRNA binding. These informative representations enable MotiL to surpass the accuracy of state-of-the-art contrastive or predictive methods in predicting molecular properties such as blood-brain barrier permeability.
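The claim that MotiL places molecules with the same scaffold close together in representation space can be probed with a simple nearest-neighbour test. The sketch below is a hypothetical evaluation, not the metric used in the paper: for each molecule it finds the most cosine-similar other embedding and reports the fraction whose neighbour shares its scaffold label. The function name `nn_scaffold_agreement` and the 3-dimensional vectors are illustrative placeholders, not MotiL outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nn_scaffold_agreement(embeddings, scaffolds):
    """Hypothetical metric: fraction of molecules whose nearest neighbour
    (highest cosine similarity, excluding the molecule itself) has the same scaffold label."""
    hits = 0
    for i, emb in enumerate(embeddings):
        best = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(emb, embeddings[j]),
        )
        hits += scaffolds[best] == scaffolds[i]  # True counts as 1
    return hits / len(embeddings)


# Made-up embeddings for four molecules; labels name each molecule's scaffold.
embeddings = [
    [0.9, 0.1, 0.0],  # scaffold A
    [0.8, 0.2, 0.1],  # scaffold A
    [0.1, 0.9, 0.2],  # scaffold B
    [0.0, 0.8, 0.3],  # scaffold B
]
scaffolds = ["A", "A", "B", "B"]
print(nn_scaffold_agreement(embeddings, scaffolds))  # 1.0
```

A score near 1.0 means same-scaffold molecules are each other's closest neighbours; shuffling the labels (for example `["A", "B", "A", "B"]`) drops the score to 0.0 for these vectors. Other choices, such as clustering purity or k > 1 neighbours, would be equally reasonable.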
