Abstract

Molecular representation is fundamental to the field of cheminformatics, facilitating accurate prediction and exploration of molecular properties. Since the nineteenth century, methods for representing molecules have evolved significantly, with recent advances in deep learning offering state-of-the-art performance across various tasks. Among these, contrastive learning (CL) has emerged as one of the most powerful techniques for training deep learning models. CL aims to optimize the representation of similar molecules by reducing the distance between their vector embeddings, while simultaneously increasing the distance between dissimilar ones. Driven by the growing success of CL in enhancing representation learning, this paper presents the first comprehensive review dedicated to CL methods for molecular representation. We begin by surveying existing literature in the field, providing context for the evolution of molecular representation. Next, we introduce the core principles of the CL framework and examine its application to molecular representation learning tasks. Finally, we highlight the key challenges faced by CL-based approaches and discuss potential future directions for advancing molecular representation with these methods.
Forooghi et al. (Thu,) studied this question.
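
The contrastive objective described in the abstract (pulling embeddings of similar molecules together while pushing dissimilar ones apart) can be illustrated with a minimal sketch. The abstract does not name a specific loss, so the InfoNCE-style formulation below is an assumption chosen because it is a common CL objective. The function names, the temperature value, and the two-dimensional toy vectors are hypothetical and stand in for real molecular embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (an assumed, illustrative objective).

    The loss is low when the anchor is more similar to its positive
    than to any of the negatives, and high otherwise.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical): an anchor molecule, a positive that lies
# close to it, a "positive" that lies far from it, and two negatives.
anchor = [1.0, 0.0]
close_view = [0.9, 0.1]
far_view = [-1.0, 0.2]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce(anchor, close_view, negatives)  # positive near the anchor
loss_bad = info_nce(anchor, far_view, negatives)     # positive far from the anchor

print(f"loss with nearby positive:  {loss_good:.6f}")
print(f"loss with distant positive: {loss_bad:.6f}")
```

In this sketch the loss is smaller when the positive embedding sits near the anchor. Minimizing it over a model's parameters would therefore tend to reduce the distance between similar pairs relative to dissimilar ones.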