The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, accessed through PySpark, commodity scalable hardware can be used for cheminformatics calculations and query operations, requiring only basic Python programming skills and an understanding of resilient distributed datasets (RDDs). Three use cases of cheminformatics computing in Spark on the Hadoop cluster are presented: querying substructures, calculating fingerprint similarity, and calculating molecular descriptors. The source code for the PySpark-RDKit implementation is provided. The use cases showed that Spark scales reasonably well, to a degree that depends on the use case, and can be a suitable choice for datasets too big to be processed on current low-end workstations.
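To make the three use cases more concrete, the following is a minimal sketch of how per-molecule RDKit work might be mapped over an RDD. It is not the authors' published code. The input layout (one "SMILES ID" pair per line), the function names, the benzene query pattern, the ethanol reference molecule, and the choice of Morgan fingerprints with a Tanimoto coefficient are all assumptions made for illustration.

```python
def parse_record(line):
    """Split a 'SMILES ID' line (hypothetical input layout) into (id, smiles)."""
    smiles, mol_id = line.split()[:2]
    return mol_id, smiles


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def analyse(record, query_smarts, reference_smiles):
    """Per-molecule work for the three use cases; intended to run on executors."""
    # Imported inside the function so RDKit is loaded on each worker
    # and no RDKit object has to be pickled by the driver.
    from rdkit import Chem
    from rdkit.Chem import AllChem, Descriptors

    mol_id, smiles = record
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable SMILES: flag it and move on
        return mol_id, None

    def on_bits(m):
        # Morgan fingerprint (radius 2, 2048 bits) is an assumed choice.
        fp = AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048)
        return set(fp.GetOnBits())

    # For brevity the query and reference are re-parsed per record;
    # a real job would likely prepare them once per partition.
    pattern = Chem.MolFromSmarts(query_smarts)
    reference = Chem.MolFromSmiles(reference_smiles)
    return mol_id, {
        "substructure_hit": mol.HasSubstructMatch(pattern),
        "similarity": tanimoto(on_bits(mol), on_bits(reference)),
        "mol_wt": Descriptors.MolWt(mol),
    }


def main(path):
    """Run the sketch on a text file of SMILES; `path` is supplied by the caller."""
    from pyspark import SparkContext

    sc = SparkContext(appName="rdkit-sketch")
    results = (
        sc.textFile(path)
        .map(parse_record)
        .map(lambda rec: analyse(rec, "c1ccccc1", "CCO"))
        .filter(lambda kv: kv[1] is not None)
    )
    for mol_id, props in results.take(5):
        print(mol_id, props)
    sc.stop()
```

In a script submitted with spark-submit, `main` would be called with the location of the SMILES file. The set-based `tanimoto` helper is kept in plain Python so the similarity arithmetic is visible; with RDKit bit vectors, `DataStructs.TanimotoSimilarity` would be the usual call.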
Lovrić et al. studied this question.
synapsesocial.com/papers/69d9209bea2783c07da3c354 — DOI: https://doi.org/10.1002/minf.201800082
Mario Lovrić
University of Copenhagen
José Manuel Molero
Know Center Research GmbH (Austria)
Roman Kern
Graz University of Technology
Molecular Informatics
Know Center Research GmbH (Austria)
Children's Hospital Srebrnjak