August 20, 2017

Seq2seq Fingerprint

Abstract

Many of today's drug discoveries require expert knowledge and extremely expensive biological experiments to identify the chemical properties of molecules. Despite growing interest in using supervised machine learning algorithms to identify these properties automatically, performance and accuracy have advanced little because of the limited amount of training data. In this paper, we propose a novel unsupervised molecular embedding method that provides a continuous feature vector for each molecule for use in further tasks, e.g., solubility classification. In the proposed method, a multi-layered Gated Recurrent Unit (GRU) network maps the input molecule into a continuous feature vector of fixed dimensionality, and another deep GRU network then decodes the continuous vector back to the original molecule. As a result, the continuous encoding vector is expected to contain enough information to recover the original molecule and predict its chemical properties. Because it is unsupervised, the proposed embedding method can use almost unlimited molecule data in the training phase. With sufficient information encoded in the vector, the proposed method is also robust and task-insensitive. The performance and robustness are confirmed and interpreted in our extensive experiments.
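The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes molecules are written as SMILES strings, uses a hypothetical toy alphabet (`VOCAB`), and runs a small stack of randomly initialised, untrained GRU cells in pure Python. The decoder network and the training loop that would teach the encoder to support reconstruction are omitted, and taking the top layer's final state as the fingerprint is a simplifying assumption. The names `GRUCell`, `build_encoder`, and `encode` are introduced here for illustration only.

```python
import math
import random

# Hypothetical toy alphabet; a real SMILES vocabulary is much larger.
VOCAB = "CNOcno()=#12"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def one_hot(ch):
    vec = [0.0] * len(VOCAB)
    vec[CHAR_TO_INDEX[ch]] = 1.0
    return vec


class GRUCell:
    """Single GRU cell without bias terms, randomly initialised (untrained)."""

    def __init__(self, input_size, hidden_size, rng):
        def mat(cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(hidden_size)]

        self.hidden_size = hidden_size
        self.Wz, self.Uz = mat(input_size), mat(hidden_size)  # update gate
        self.Wr, self.Ur = mat(input_size), mat(hidden_size)  # reset gate
        self.Wh, self.Uh = mat(input_size), mat(hidden_size)  # candidate state

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, gated))]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def build_encoder(hidden_size=8, num_layers=2, seed=0):
    # Stack of GRU cells; layer i feeds its hidden state to layer i + 1.
    rng = random.Random(seed)
    sizes = [len(VOCAB)] + [hidden_size] * num_layers
    return [GRUCell(sizes[i], sizes[i + 1], rng) for i in range(num_layers)]


def encode(smiles, layers):
    # Read the string one character at a time and return the top layer's
    # final hidden state, which has the same length for every input.
    states = [[0.0] * cell.hidden_size for cell in layers]
    for ch in smiles:
        x = one_hot(ch)
        for i, cell in enumerate(layers):
            states[i] = cell.step(x, states[i])
            x = states[i]
    return states[-1]


encoder = build_encoder()
for smiles in ["CCO", "c1ccccc1O"]:
    vector = encode(smiles, encoder)
    print(smiles, len(vector))
```

Both strings map to vectors of the same length even though the inputs differ in length, which is the property that lets a downstream classifier consume the embedding. With untrained weights the vectors carry no chemical meaning; in the method described above, that meaning would come from training the encoder jointly with a decoder to reproduce the input molecule.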

Authors

Zheng Xu

Xinjiang University

Sheng Wang

Ningbo University

Feiyun Zhu

Zhejiang Center for Disease Control and Prevention

Institutions

The University of Texas at Arlington
