To address the persistent challenges of integrating multimodal molecular data for property prediction, we propose the Multimodal Graph Neural Network (MMGNN). The framework couples dual heterogeneous graph encoders, designed to capture local atomic interactions and global topological semantics, with a bidirectional cross-view attention module. This module dynamically aligns the continuous structural latent space with discrete fingerprint features, and an adaptive gated fusion mechanism then integrates the resulting multiscale representations. In addition, contrastive pre-training with the normalized temperature-scaled cross-entropy (NT-Xent) loss encourages robust, invariant feature learning. Extensive empirical evaluations demonstrate MMGNN's superior performance on molecular property prediction, supporting its use in computational drug discovery.
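To make the contrastive objective concrete, the following is a minimal NumPy sketch of the standard NT-Xent loss, not the MMGNN implementation itself. It assumes that row i of `z_a` and row i of `z_b` are embeddings of two views of the same molecule and that all other rows in the batch act as negatives. The function name `nt_xent_loss`, the default temperature of 0.5, and the pairing convention are illustrative assumptions that do not come from the text above.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent loss for N positive pairs (z_a[i], z_b[i]).

    Assumption: z_a and z_b have shape (N, d) and hold embeddings of two
    views of the same N molecules. The temperature default is illustrative.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)  # (2N, d)

    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z = z / np.maximum(norms, 1e-12)

    # Temperature-scaled similarity matrix, shape (2N, 2N).
    sim = (z @ z.T) / temperature

    # A sample is never its own negative: mask the diagonal.
    np.fill_diagonal(sim, -np.inf)

    # Row i is paired with row i + N, and row i + N with row i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Mean negative log-probability assigned to the positive partner.
    return -log_prob[np.arange(2 * n), pos].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))
    view_b = view_a + 0.01 * rng.normal(size=(8, 16))  # nearly identical views
    print("aligned pairs:   ", nt_xent_loss(view_a, view_b))
    print("mismatched pairs:", nt_xent_loss(view_a, view_b[::-1]))
```

In this sketch, correctly paired views yield a lower loss than mismatched ones, which is the behavior the pre-training stage relies on. Lower temperatures sharpen the softmax and weight hard negatives more heavily.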
Liu et al. studied this question.