Computational approaches involving complex data structures (e.g. machine learning, knowledge graphs) have become increasingly prominent in biological studies over the last two decades. As modern omics techniques generate ever larger amounts of data, there is a need for methods that can process such data quickly and thoroughly. In addition, these methods can be applied to extrapolate results from a limited number of observations. Rare disease research benefits particularly from these computational approaches, because each rare disease affects only a small percentage of the population. Nevertheless, finding effective treatments would benefit a large number of people in absolute terms: taken together, rare diseases affect 10% of the world population. In the context of rare diseases, drug repurposing (i.e. testing existing approved drugs against other diseases) stands as a viable alternative to traditional drug discovery, reducing costs compared to developing novel drugs. We introduce a novel approach for the initial selection of candidate drugs, based on a knowledge graph of biological associations between genes involved in the disease and drugs from experimental and clinical databases. Additionally, our approach generates semantically valid negative samples to further improve the selection of candidate drugs. We tested it on Huntington’s disease, a model condition for rare disease research. Our main contribution is that the approach does not require human-curated datasets, resulting in a scalable drug repurposing workflow that leverages information on known and missing associations between genes and drugs to predict candidate repurposed drugs, while implementing strategies that limit hardware resource consumption and hence reduce computing time.
Bianchi et al. (Wed,) studied this question.