What question did this study set out to answer?

This research aims to develop methods for drug repurposing prediction in rare disease contexts using computational techniques.

February 7, 2026 · Open Access

Design and evaluation of semantically-valid negative samples integration techniques for scalable semi-automated drug repurposing prediction pipelines in rare disease research

Key Points

Developed techniques for integrating semantically valid negative samples into drug repurposing prediction
Applied knowledge graphs to identify biological associations between genes and drugs
Tested the approach primarily on Huntington’s disease
Introduced scalable drug repurposing workflows that do not require human-curated datasets
Demonstrated improved candidate drug selection efficiency
Reduced computing time and hardware resource consumption

Abstract

Computational approaches involving complex data structures (e.g. machine learning, knowledge graphs) have become increasingly prominent in biological studies over the last two decades. As modern omics techniques produce ever larger amounts of data, there is a need for methods that can process such data quickly and thoroughly. These techniques can also be applied to extrapolate results from a limited number of observations. Rare disease research benefits particularly from these computational approaches, because each rare disease affects only a small percentage of the population. Nevertheless, finding effective treatments benefits a large number of people in absolute terms: taken together, rare diseases affect 10% of the world population. In the context of rare diseases, drug repurposing (i.e. testing existing approved drugs against other diseases) is a viable alternative to traditional drug discovery and reduces costs compared with developing novel drugs. We introduce a novel approach for initial candidate drug selection, based on a knowledge graph of biological associations between genes involved in the disease and drugs from experimental and clinical databases. In addition, our approach generates semantically valid negative samples to further improve the selection of candidate drugs. We tested it on Huntington’s disease, a model condition for rare disease research. Our main contribution is that the approach does not require human-curated datasets. The result is a scalable drug repurposing workflow that leverages information on known and missing associations between genes and drugs to predict candidate repurposed drugs, while implementing strategies that limit hardware resource consumption and hence reduce computing time.
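To make the idea of semantically valid negative samples concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common strategy: type-constrained corruption of knowledge graph triples, in which the tail of a known association is swapped only for an entity of the type the relation allows, and any triple already known to be true is skipped. All entity names, relation names, and the `sample_negatives` helper are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
POSITIVES = {
    ("drug_A", "targets", "gene_1"),
    ("drug_A", "targets", "gene_2"),
    ("drug_B", "targets", "gene_2"),
    ("gene_1", "associated_with", "disease_X"),
}

# Hypothetical entity types and the tail type each relation accepts.
ENTITY_TYPES = {
    "drug_A": "drug", "drug_B": "drug", "drug_C": "drug",
    "gene_1": "gene", "gene_2": "gene", "gene_3": "gene",
    "disease_X": "disease",
}
RELATION_RANGE = {"targets": "gene", "associated_with": "disease"}


def sample_negatives(positives, entity_types, relation_range,
                     per_positive=1, seed=0):
    """Corrupt the tail of each positive triple with a type-valid entity.

    A candidate tail is kept only if it has the type the relation expects
    and the resulting triple is not a known positive.
    """
    rng = random.Random(seed)
    negatives = set()
    for head, relation, _tail in sorted(positives):
        allowed_type = relation_range[relation]
        candidates = sorted(
            entity for entity, etype in entity_types.items()
            if etype == allowed_type
            and (head, relation, entity) not in positives
        )
        rng.shuffle(candidates)
        for new_tail in candidates[:per_positive]:
            negatives.add((head, relation, new_tail))
    return negatives


NEGATIVES = sample_negatives(POSITIVES, ENTITY_TYPES, RELATION_RANGE)
```

In this sketch a triple such as (drug_A, targets, disease_X) can never be produced, because a disease is not a valid tail for the hypothetical `targets` relation. Note that a missing association is not necessarily a true negative, so the filtering against known positives is only a first approximation.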
