Generating molecules with desired chemical properties is a crucial and promising area of research in drug discovery, as it could accelerate the identification of novel therapeutic compounds. Recent diffusion models have shown remarkable generative capabilities on continuous data modalities such as images and audio. However, they face significant challenges when generating discrete data, particularly molecular representations such as SMILES strings and molecular graphs, and especially in few-shot scenarios where only a limited number of samples are available. In this paper, we explore diffusion models for generating a continuous representation of molecules: molecular images. Specifically, we propose ProtoDiff, a diffusion-based method that incorporates few-shot learning into molecular image generation. We frame molecular image generation as a few-shot controllable generation problem: ProtoDiff extracts prototypes from a limited set of molecules to guide the generation process, and it adds a novel sparsity regularization to the diffusion objective to emphasize the meaningful pixels of molecules, i.e., the few pixels that depict chemical bonds. We train and evaluate ProtoDiff on the ChEMBL dataset, where it achieves new state-of-the-art results on the majority of molecular generation tasks.
Peidong Liu studied this question.
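The abstract does not spell out ProtoDiff's exact objective, so the following is only a minimal sketch of its two stated ingredients under loudly labeled assumptions. It assumes (1) a prototype is the element-wise mean of support-set embeddings, as in prototypical networks, and (2) the training loss is a standard epsilon-prediction diffusion loss plus an L1 penalty on the predicted clean image, with images scaled so that background pixels are 0 and bond pixels are nonzero. The names `prototype`, `sparse_diffusion_loss`, `denoiser`, `alpha_bar`, and `lam` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def prototype(support: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical prototype: element-wise mean of support-set embeddings
    (prototypical-network style; an assumption, not ProtoDiff's published encoder)."""
    n = len(support)
    return [sum(col) / n for col in zip(*support)]


def sparse_diffusion_loss(
    x0: Sequence[float],
    proto: Sequence[float],
    alpha_bar: float,
    denoiser: Callable[[Vector, Vector], Vector],
    lam: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Assumed objective: epsilon-prediction MSE plus lam * mean |x0_hat|.

    x0 is a flattened image with background pixels = 0 (assumption), so the
    L1 term pushes the predicted clean image toward few nonzero (bond) pixels.
    """
    if not 0.0 < alpha_bar < 1.0:
        raise ValueError("alpha_bar must lie strictly between 0 and 1")
    if rng is None:
        rng = random.Random(0)
    # Forward noising: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * x + b * e for x, e in zip(x0, eps)]
    # The (hypothetical) denoiser predicts the noise, conditioned on the prototype
    eps_hat = denoiser(x_t, list(proto))
    mse = sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(x0)
    # Recover the predicted clean image and penalize its L1 norm (sparsity)
    x0_hat = [(xt - b * p) / a for xt, p in zip(x_t, eps_hat)]
    l1 = sum(abs(v) for v in x0_hat) / len(x0)
    return mse + lam * l1
```

Under these assumptions, the L1 term is what encodes the "few meaningful pixels" idea: a denoiser whose implied clean image has spurious nonzero background pixels pays an extra penalty proportional to `lam`, while the MSE term alone would not distinguish bond pixels from background.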