What question did this study set out to answer?

This research evaluates large language models' effectiveness in identifying known and novel drug combinations for Alzheimer's disease.

January 22, 2026

Comparative evaluation of large language models in retrieving known and predicting novel drug combinations

Key Points

This research evaluates large language models' effectiveness in identifying known and novel drug combinations for Alzheimer's disease.
Developed prompts for LLMs to retrieve drug combinations.
Evaluated performance against FDA-approved and PubMed-identified combinations.
Conducted pathway enrichment analyses with domain experts to explore potential mechanisms.
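GPT-5 achieved an accuracy of 0.95 and a balanced F1 score of 0.95 in identifying FDA-approved drug combinations.
The top 10 combinations suggested by GPT-5 included one FDA-approved pair (donepezil and memantine), three tested in AD clinical trials, and three with supporting evidence in the literature.
Identified 10 off-label drug combinations, several of which target key AD-related biological pathways.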
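
To make the reported metrics concrete, the sketch below shows how accuracy and F1 could be computed if the evaluation is framed as binary classification, with each candidate pair labeled as an approved combination (1) or not (0). That framing is an assumption, as is reading "balanced F1" as the standard F1 score that weights precision and recall equally. The function name and the toy labels are hypothetical and are not data from the study.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 marks a real combination.

    Assumption: standard F1 (harmonic mean of precision and recall).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Reporting both numbers is useful because accuracy alone can look strong when negatives dominate the test set, while F1 only rewards correct positives.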

Abstract

Background: Large language models (LLMs) are increasingly used in the biomedical field for information retrieval, information extraction, and knowledge discovery. However, their potential for retrieving and discovering drug combinations for diseases remains underexplored.

Objective: This study aims to evaluate the effectiveness of LLMs in retrieving known drug combinations and to identify novel drug combinations for treating Alzheimer's disease (AD).

Methods: We developed a series of prompts to guide LLMs in retrieving drug combinations. Model performance was evaluated using both FDA-approved combinations and combinations identified through PubMed literature mining. We then assessed the feasibility of identifying novel drug combination candidates for AD. In collaboration with domain experts, we performed pathway enrichment analyses to evaluate the candidates' potential mechanisms of action within the context of AD.

Results: In a comparative evaluation of multiple LLMs, GPT-5 demonstrated the strongest overall performance, achieving an accuracy of 0.95 and a balanced F1 score of 0.95 in identifying FDA-approved drug combinations. Among the top 10 drug-combination candidates for AD treatment suggested by GPT-5, the combination of donepezil and memantine is already FDA-approved. Three other combinations have been tested in AD clinical trials, and three have supporting evidence in the literature. We also identified 10 off-label drug combinations, with pathway enrichment analyses indicating that several target key AD-related biological pathways.

Conclusions: LLMs are effective in retrieving drug combinations for a given disease, although performance varies among models, with GPT-5 performing best. However, suggestions from LLMs require further validation before they can be considered reliable.
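
The abstract does not say which enrichment method was used. As a rough illustration of what a pathway enrichment analysis typically tests, the sketch below computes a one-sided hypergeometric (over-representation) p-value: the probability of seeing at least the observed overlap between a drug combination's target genes and a pathway's genes by chance. The function name, parameter names, and all numbers are hypothetical and are not taken from the study.

```python
from math import comb


def enrichment_p_value(background, pathway, targets, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the background set
    pathway:    number of background genes annotated to the pathway
    targets:    number of background genes targeted by the drug combination
    overlap:    number of targeted genes that fall in the pathway
    """
    if not (0 <= pathway <= background and 0 <= targets <= background):
        raise ValueError("pathway and targets must lie within the background")
    max_overlap = min(pathway, targets)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap is out of range")

    total = comb(background, targets)  # all ways to draw the target set
    tail = sum(
        comb(pathway, k) * comb(background - pathway, targets - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# 12 target genes, 3 of which fall in the pathway.
p = enrichment_p_value(background=20000, pathway=100, targets=12, overlap=3)
print(f"enrichment p-value: {p:.3g}")
```

In practice, a p-value like this would be computed for many pathways and then corrected for multiple testing, and any enriched pathway would still need expert review of the kind the authors describe.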
