What question did this study set out to answer?

The study asks how multiple molecular views, such as SMILES sequences and two-dimensional graphs, can be integrated effectively to learn better molecular representations.

January 17, 2026 · Open Access

MMPCS: multi-view molecular pretraining based on consistency information and specific information

Key Points

Developed a multi-view molecular pretraining method called MMPCS.
Utilized the Graph Isomorphism Network and the RoBERTa model to encode two-dimensional molecular graphs and SMILES sequences, respectively.
Factorized each molecular representation into a shared consistency component and a view-specific component.
Employed an autoencoder to align consistency information across different views.
Evaluated performance against 16 existing molecular pretraining methods.
Achieved the highest average performance across classification and regression tasks for molecular property prediction.
Demonstrated robust predictions for drug-target binding affinity and cancer drug response.
Facilitated drug repurposing efforts, as shown in a case study on the SARS-CoV-2 Omicron variant.

Abstract

Motivation: The goal of molecular representation learning is to automate the extraction of molecular features, a critical task in cheminformatics and drug discovery. While pretraining models that use multiple views, such as SMILES, two-dimensional graphs, and three-dimensional conformations, have advanced the field, integrating these views effectively to produce superior representations remains a challenge.

Results: To bridge this gap, we propose a novel multi-view molecular pretraining method termed MMPCS, which explicitly factorizes representations into consistency and specific information. Our approach utilizes the Graph Isomorphism Network and the RoBERTa model to encode two-dimensional molecular topological graphs and SMILES sequences, respectively. Each resulting molecular embedding is decomposed into a shared consistency component and a view-specific remainder. An autoencoder then aligns the consistency information across views. The combined consistency and view-specific representations serve as input for downstream tasks, enabling precise and task-aware predictions. When benchmarked against 16 state-of-the-art molecular pretraining methods, MMPCS achieved the highest average performance across both classification and regression tasks for molecular property prediction. It also delivered outstanding results in predicting drug-target binding affinity and cancer drug response, demonstrating its robustness and broad applicability. Additionally, a case study on the SARS-CoV-2 Omicron variant highlights the potential of MMPCS in facilitating drug repurposing efforts.

Availability and implementation: The source code and datasets supporting this study are publicly available at GitHub (https://github.com/xmubiocode/MMPCS) and Zenodo (https://doi.org/10.5281/zenodo.18182748).
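To make the factorization idea in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each view (graph and SMILES) already has a fixed-length embedding, uses random linear projections as stand-ins for learned layers, and swaps in a simple mean-squared-error penalty where the paper uses an autoencoder to align consistency information. All function names, dimensions, and the averaging of the two consistency parts are hypothetical choices made for illustration.

```python
import random


def random_projection(in_dim, out_dim, rng):
    """Random weight matrix standing in for a learned projection (hypothetical)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, vec):
    """Apply a linear map (list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def factorize(embedding, w_consistency, w_specific):
    """Split one view's embedding into a consistency part and a view-specific part."""
    return project(w_consistency, embedding), project(w_specific, embedding)


def alignment_loss(cons_a, cons_b):
    """Mean squared error between two consistency parts.

    Assumption: a stand-in for the autoencoder-based alignment named in the abstract.
    """
    return sum((a - b) ** 2 for a, b in zip(cons_a, cons_b)) / len(cons_a)


def downstream_input(cons_graph, cons_smiles, spec_graph, spec_smiles):
    """Combine shared and view-specific parts into one feature vector.

    Assumption: the two consistency parts are averaged, then concatenated
    with both view-specific parts.
    """
    shared = [(g + s) / 2 for g, s in zip(cons_graph, cons_smiles)]
    return shared + spec_graph + spec_smiles


if __name__ == "__main__":
    rng = random.Random(0)
    embed_dim, part_dim = 8, 4

    # Placeholder embeddings; in practice these would come from the two encoders.
    graph_embedding = [rng.gauss(0, 1) for _ in range(embed_dim)]
    smiles_embedding = [rng.gauss(0, 1) for _ in range(embed_dim)]

    # Separate projections per view (hypothetical parameterization).
    cons_g, spec_g = factorize(graph_embedding,
                               random_projection(embed_dim, part_dim, rng),
                               random_projection(embed_dim, part_dim, rng))
    cons_s, spec_s = factorize(smiles_embedding,
                               random_projection(embed_dim, part_dim, rng),
                               random_projection(embed_dim, part_dim, rng))

    print("alignment loss:", alignment_loss(cons_g, cons_s))
    print("downstream feature length:",
          len(downstream_input(cons_g, cons_s, spec_g, spec_s)))
```

In a trained model the alignment term would be minimized during pretraining so that the consistency parts of the two views agree, while the view-specific parts remain free to carry information unique to each view. The released code at the GitHub and Zenodo links above is the reference for how MMPCS actually does this.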
