Many of today's drug discoveries require expert knowledge and very expensive biological experiments to identify the properties of chemical molecules. Despite growing interest in using supervised machine learning algorithms to identify those properties automatically, performance and accuracy have advanced little because of the limited amount of training data. In this paper, we propose a novel unsupervised molecular embedding method that provides a continuous feature vector for each molecule for use in further tasks, e.g., solubility classification. In the proposed method, a multi-layered Gated Recurrent Unit (GRU) network maps the input molecule into a continuous feature vector of fixed dimensionality, and another deep GRU network then decodes that vector back to the original molecule. As a result, the continuous encoding vector is expected to contain enough information to recover the original molecule and predict its chemical properties. Being unsupervised, the proposed embedding method can use almost unlimited molecule data in the training phase. With sufficient information encoded in the vector, the proposed method is also robust and task-insensitive. The performance and robustness are confirmed and interpreted in our extensive experiments.
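The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes molecules are given as SMILES-like character strings (the abstract does not state the input format), uses a single GRU layer instead of a multi-layered network, omits bias terms and the decoder, and leaves the weights random and untrained. The names `GRUCell`, `encode`, and `VOCAB` are hypothetical. The only point it shows is that a recurrent encoder turns strings of different lengths into vectors of one fixed size.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Single GRU cell without biases (hypothetical, untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


# Assumed character vocabulary; a real one would be built from the data.
VOCAB = sorted(set("CNOcno()=#123[]+-Hl"))


def one_hot(ch):
    return [1.0 if ch == v else 0.0 for v in VOCAB]


def encode(smiles, cell):
    """Run the cell over the string; the last hidden state is the embedding."""
    h = [0.0] * cell.hidden_size
    for ch in smiles:
        h = cell.step(one_hot(ch), h)
    return h


rng = random.Random(0)
cell = GRUCell(input_size=len(VOCAB), hidden_size=8, rng=rng)
ethanol = encode("CCO", cell)        # 3 characters
benzene = encode("c1ccccc1", cell)   # 8 characters
print(len(ethanol), len(benzene))    # both vectors have 8 entries
```

In the method the abstract describes, a second GRU network would decode this vector back into the original molecule, and training both networks on that reconstruction objective is what would make the vector informative. With the random weights used here, the vectors carry no chemical meaning.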
Zheng Xu
Xinjiang University
Sheng Wang
Ningbo University
Feiyun Zhu
Zhejiang Center for Disease Control and Prevention
The University of Texas at Arlington
Xu et al. (Sun,) studied this question.
synapsesocial.com/papers/69ded5a741e0955b99b0bd6b — DOI: https://doi.org/10.1145/3107411.3107424